I'm not sure what is up, but remember that post_ids in the script is a list,
not a set, so += keeps any duplicates. You might be growing it without bounds.
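
To illustrate the list-versus-set point, here is a minimal plain-Ruby sketch (it does not talk to Elasticsearch). The helper names append_ids and merge_ids and the sample ids are hypothetical; it assumes the same string gets updated several times with overlapping post ids.

```ruby
# Hypothetical helpers: they mimic, in plain Ruby, what happens to the
# post_ids array of one document across repeated updates.

# Mirrors `ctx._source.post_ids += additional_post_ids`: a plain list append.
def append_ids(existing, additional)
  existing + additional
end

# Set-like alternative: only ids that are not already stored get added.
def merge_ids(existing, additional)
  existing | additional
end

appended = [1, 2, 3]
merged   = [1, 2, 3]

# The same string is updated three more times with an overlapping batch.
3.times do
  appended = append_ids(appended, [2, 3, 4])
  merged   = merge_ids(merged, [2, 3, 4])
end

puts appended.size  # 12: every update adds all three ids again
puts merged.size    # 4: only the distinct ids are kept
```

If the real documents behave like the first case, each update has to rewrite an ever larger _source, which could explain the slowdown. How to express the set-like version inside the update script depends on the scripting language configured on the cluster.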
On Dec 8, 2014 2:49 PM, "Christophe Verbinnen" <[email protected]> wrote:
> Hello,
>
> We have a small cluster with 3 nodes running 1.3.6.
>
> I have an index setup with only two fields.
>
> {
>   index: index_name,
>   body: {
>     settings: {
>       number_of_shards: 3,
>       store: {
>         type: :mmapfs
>       }
>     },
>     mappings: {
>       mapping_name => {
>         properties: {
>           :value => {type: 'string', analyzer: 'keyword'},
>           :post_ids => {type: 'long', index: 'not_analyzed'}
>         }
>       }
>     }
>   }
> }
>
>
> We are basically storing strings and all the posts they are related to.
>
> The problem is that this data is not stored this way in the database so I
> don't have an id to represent each string nor do I have all the post_ids
> from the start.
>
> So I use the SHA1 of the string value as the id, and I use a script to
> append to the post_ids.
>
> Here is the code I use to index through the bulk API endpoint.
>
> require 'digest' # for Digest::SHA1
>
> def index!
>   post_ids = Post.where...
>   bulk_data = []
>   strings.uniq.each do |string|
>     # The SHA1 of the string serves as the document id.
>     string_id = Digest::SHA1.hexdigest string
>     bulk_data <<
>       {
>         update:
>           {
>             _index: 'post_strings',
>             _type: 'post_string',
>             _id: string_id,
>             data: {
>               # Appends the new ids to the existing list.
>               script: "ctx._source.post_ids += additional_post_ids",
>               params: {
>                 additional_post_ids: post_ids
>               },
>               # Used when the document does not exist yet.
>               upsert: {
>                 value: string,
>                 post_ids: post_ids
>               }
>             }
>           }
>       }
>     # Flush in batches of 100 actions.
>     if bulk_data.count == 100
>       $elasticsearch.bulk :body => bulk_data
>       bulk_data = []
>     end
>   end
>   # Send whatever is left over.
>   $elasticsearch.bulk :body => bulk_data if bulk_data.any?
> end
>
> This worked fine for the first 75 million strings, but it kept getting
> slower until the indexing rate dropped to only 50 docs per second.
>
> After that the cluster just killed itself because the nodes couldn't talk
> to each other.
>
> I'm guessing all the threads were blocked trying to index and the nodes had
> no available threads to respond.
>
> At first I thought it was related to the SHA1 ids not being very
> efficient, but my test with sequential ids did not improve things.
>
> I'm out of ideas right now. Any help would be greatly appreciated.
>
> Cheers.
>
>
> --
> You received this message because you are subscribed to the Google Groups
> "elasticsearch" group.
> To unsubscribe from this group and stop receiving emails from it, send an
> email to [email protected].
> To view this discussion on the web visit
> https://groups.google.com/d/msgid/elasticsearch/82c27f2c-bf56-4064-80bc-b348203edcb5%40googlegroups.com
> <https://groups.google.com/d/msgid/elasticsearch/82c27f2c-bf56-4064-80bc-b348203edcb5%40googlegroups.com?utm_medium=email&utm_source=footer>
> .
> For more options, visit https://groups.google.com/d/optout.
>