I see what you mean, but the way my records are structured, it cannot happen unless I reindex.
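For reference, here is a sketch of what a de-duplicating version of the update entry might look like if I ever do reindex. The helper name `dedup_update_entry` is hypothetical, and the list `-` operator in the script assumes a Groovy-style scripting language, so it would need checking against the cluster's configured script language:

```ruby
require 'digest'

# Sketch only (hypothetical helper, not the code from this thread): the same
# bulk update entry as before, but the script appends only the ids that are
# not already in the stored list, so replayed updates cannot grow post_ids
# without bounds. The list '-' operator assumes a Groovy-style script language.
def dedup_update_entry(string, post_ids)
  {
    update: {
      _index: 'post_strings',
      _type: 'post_string',
      _id: Digest::SHA1.hexdigest(string),
      data: {
        script: "ctx._source.post_ids += (additional_post_ids - ctx._source.post_ids)",
        params: { additional_post_ids: post_ids },
        upsert: { value: string, post_ids: post_ids }
      }
    }
  }
end
```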
On Monday, December 8, 2014 at 12:05:13 PM UTC-8, Nikolas Everett wrote:
>
> I'm not sure what is up, but remember that post_ids in the script is a
> list, not a set. You might be growing it without bounds.
>
> On Dec 8, 2014 2:49 PM, "Christophe Verbinnen" <[email protected]> wrote:
>
>> Hello,
>>
>> We have a small cluster with 3 nodes running 1.3.6.
>>
>> I have an index set up with only two fields:
>>
>> {
>>   index: index_name,
>>   body: {
>>     settings: {
>>       number_of_shards: 3,
>>       store: {
>>         type: :mmapfs
>>       }
>>     },
>>     mappings: {
>>       mapping_name => {
>>         properties: {
>>           :value => {type: 'string', analyzer: 'keyword'},
>>           :post_ids => {type: 'long', index: 'not_analyzed'}
>>         }
>>       }
>>     }
>>   }
>> }
>>
>> We are basically storing strings and all the posts they are related to.
>>
>> The problem is that this data is not stored this way in the database, so
>> I don't have an id to represent each string, nor do I have all the
>> post_ids from the start.
>>
>> So I use the SHA1 of the string value as the id, and I use a script to
>> append to post_ids.
>>
>> Here is the code I use to index through the bulk API endpoint:
>>
>> def index!
>>   post_ids = Post.where...
>>   bulk_data = []
>>   strings.uniq.each do |string|
>>     string_id = Digest::SHA1.hexdigest(string)
>>     bulk_data << {
>>       update: {
>>         _index: 'post_strings',
>>         _type: 'post_string',
>>         _id: string_id,
>>         data: {
>>           script: "ctx._source.post_ids += additional_post_ids",
>>           params: {
>>             additional_post_ids: post_ids
>>           },
>>           upsert: {
>>             value: string,
>>             post_ids: post_ids
>>           }
>>         }
>>       }
>>     }
>>     if bulk_data.count == 100
>>       $elasticsearch.bulk :body => bulk_data
>>       bulk_data = []
>>     end
>>   end
>>   $elasticsearch.bulk :body => bulk_data if bulk_data.any?
>> end
>>
>> This worked fine for the first 75 million strings, but it kept getting
>> slower until it reached an indexing rate of only 50 docs per second.
>>
>> After that, the cluster just killed itself because the nodes couldn't
>> talk to each other.
>>
>> I'm guessing all the threads were blocked trying to index, so the nodes
>> had no threads available to respond.
>>
>> At first I thought it might be related to the SHA1 ids being inefficient,
>> but in my tests with sequential ids it was not getting any better.
>>
>> I'm out of ideas right now. Any help would be greatly appreciated.
>>
>> Cheers.
>>
>> --
>> You received this message because you are subscribed to the Google Groups
>> "elasticsearch" group.
>> To unsubscribe from this group and stop receiving emails from it, send an
>> email to [email protected].
>> To view this discussion on the web visit
>> https://groups.google.com/d/msgid/elasticsearch/82c27f2c-bf56-4064-80bc-b348203edcb5%40googlegroups.com.
>> For more options, visit https://groups.google.com/d/optout.
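Since the same string can show up for many different posts, one thing that might cut down the number of script executions is accumulating all the post ids per string client-side first, so each string id gets a single upsert per run instead of one per post batch. A sketch, with a hypothetical `build_bulk_entries` helper:

```ruby
require 'digest'

# Sketch only (hypothetical helper): given a map of string => post ids
# accumulated client-side, build one bulk update entry per string, with
# the ids de-duplicated before they are sent. Same entry shape as the
# index! method in the thread.
def build_bulk_entries(string_to_post_ids)
  string_to_post_ids.map do |string, ids|
    {
      update: {
        _index: 'post_strings',
        _type: 'post_string',
        _id: Digest::SHA1.hexdigest(string),
        data: {
          script: "ctx._source.post_ids += additional_post_ids",
          params: { additional_post_ids: ids.uniq },
          upsert: { value: string, post_ids: ids.uniq }
        }
      }
    }
  end
end

entries = build_bulk_entries("foo" => [1, 2, 2], "bar" => [3])
# "foo" is sent once with [1, 2] instead of carrying the duplicate id
```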
