Re: Shards/routing documents imbalance problem

Han JU Thu, 27 Mar 2014 03:32:49 -0700

Thanks but can you explain some detail?
Say I have the class in `MyHashFunction.java`, how could I put it in ES? I 
need to modify the code of ES or ?


在 2014年3月27日星期四UTC+1上午11时27分24秒，Kevin Wang写道：
>
> You can add a class that implements HashFunction and set the setting 
> "cluster.routing.operation.hash.type“ to that class.
>
>
> Regards,
> Kevin 
>
> On Thursday, March 27, 2014 9:11:39 PM UTC+11, Han JU wrote:
>>
>> Do you guys know how to plug in a custom hash function for routing 
>> parameter?
>>
>> 在 2014年3月26日星期三UTC+1下午12时51分24秒，Han JU写道：
>>>
>>> Thanks a lot Kevin.
>>>
>>> That DJB_HASH result makes it clear for us. I think we'll just use the 
>>> id value as hash.
>>> Do you guys know how to plugin a custom hash function?
>>>
>>>
>>> 在 2014年3月26日星期三UTC+1上午11时58分36秒，Kevin Wang写道：
>>>>
>>>> There are two hash functions 
>>>> implementation 
>>>> org.elasticsearch.cluster.routing.operation.hash.djb.DjbHashFunction 
>>>> and 
>>>> org.elasticsearch.cluster.routing.operation.hash.simple.SimpleHashFunction,
>>>>  
>>>> default is DjbHashFunction. You can try get the hash by 
>>>> using DjbHashFunction.DJB_HASH(you id)
>>>>
>>>>
>>>>
>>>>
>>>> On Wednesday, March 26, 2014 9:49:10 PM UTC+11, Han JU wrote:
>>>>>
>>>>> Thanks for your reply.
>>>>>
>>>>> As far as I know, in Java, basic hash value of positive int/long value 
>>>>> is just themselves (our ids are small values like 1125, 345 etc).
>>>>> So I calculated some_id % 128, and I got 116 distinct values. But in 
>>>>> reality there's a lot less shards in use. 
>>>>>
>>>>> Does ElasticSearch use some special hash function?
>>>>>
>>>>> 在 2014年3月26日星期三UTC+1上午11时39分15秒，Kevin Wang写道：
>>>>>>
>>>>>> ES will get the shard id by hash(routing)%num of shards, in your 
>>>>>> case, there are only 167 distinct values but have 128 shards, I think 
>>>>>> it's 
>>>>>> highly possible there is less than 128 distinct hash values. So some of 
>>>>>> the 
>>>>>> shard will not have any data.
>>>>>>
>>>>>>
>>>>>> Kevin
>>>>>>
>>>>>> On Wednesday, March 26, 2014 9:30:36 PM UTC+11, Han JU wrote:
>>>>>>>
>>>>>>> Hi,
>>>>>>>
>>>>>>> We've indexed 25M documents into a single index of 128 shards with 1 
>>>>>>> replica. 
>>>>>>> The `routing` parameter is set to a path in the document, which is 
>>>>>>> an int value:
>>>>>>>
>>>>>>> _routing: {
>>>>>>>   path: "some_id"
>>>>>>>   required: true
>>>>>>> }
>>>>>>>
>>>>>>>
>>>>>>> In out 25M documents, there's 167 distinct values of this "some_id" 
>>>>>>> and in our expectation, ElasticSearch will route these documents evenly 
>>>>>>> across all shards.
>>>>>>> But we've found out that, out of 128 shards, there are 53 empty 
>>>>>>> shards (with 0 document inside), or, 40% of the shards are not used at 
>>>>>>> all.
>>>>>>>
>>>>>>> My question: 
>>>>>>>
>>>>>>> - is this normal? Do we miss something in configuring routing? 
>>>>>>> - does this imbalanced shard utilization affect indexing speed?
>>>>>>>
>>>>>>> We can confirm that all documents are correctly indexed and routing 
>>>>>>> works (when searching with routing only 1 shard responds with the 
>>>>>>> correct 
>>>>>>> answer).
>>>>>>> ElasticSearch version is v1.0.1.
>>>>>>>
>>>>>>>  
>>>>>>> Thanks!
>>>>>>>
>>>>>>

-- 
You received this message because you are subscribed to the Google Groups 
"elasticsearch" group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to [email protected].
To view this discussion on the web visit 
https://groups.google.com/d/msgid/elasticsearch/c8f96c79-76c1-42eb-8380-ecc044b44a7f%40googlegroups.com.
For more options, visit https://groups.google.com/d/optout.

Re: Shards/routing documents imbalance problem

Reply via email to