Re: Shards/routing documents imbalance problem

Kevin Wang Thu, 27 Mar 2014 03:27:45 -0700

You can add a class that implements HashFunction and point the 
"cluster.routing.operation.hash.type" setting at that class.
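
A minimal sketch of what such a class might look like. It assumes the 1.x HashFunction interface has the two methods hash(String routing) and hash(String type, String id); that is from memory, so check the source of your version. The interface is redeclared locally only so the snippet compiles on its own, and NumericIdHashFunction is a hypothetical name. In a real deployment the class would implement org.elasticsearch.cluster.routing.operation.hash.HashFunction instead.

```java
public class NumericIdHashDemo {

    // Local stand-in for org.elasticsearch.cluster.routing.operation.hash.HashFunction.
    // Assumption: the 1.x interface declares these two methods; verify for your version.
    interface HashFunction {
        int hash(String routing);
        int hash(String type, String id);
    }

    // Hypothetical implementation: use the numeric routing value itself as the hash.
    static class NumericIdHashFunction implements HashFunction {
        @Override
        public int hash(String routing) {
            try {
                return Integer.parseInt(routing.trim());
            } catch (NumberFormatException e) {
                // Non-numeric (or out-of-range) routing values fall back to String.hashCode().
                return routing.hashCode();
            }
        }

        @Override
        public int hash(String type, String id) {
            // Called when no explicit routing value is given; the type is ignored here.
            return hash(id);
        }
    }

    public static void main(String[] args) {
        HashFunction f = new NumericIdHashFunction();
        int shards = 128;
        for (String id : new String[] {"1125", "345"}) {
            System.out.println(id + " -> shard " + Math.abs(f.hash(id) % shards));
        }
    }
}
```

The setting would then presumably take the fully qualified class name, and the class would need to be on the classpath of every node (both are assumptions on my part). Keep in mind that documents already indexed were placed using the old hash, so switching the hash function on an existing index means reindexing.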



Regards,
Kevin 

On Thursday, March 27, 2014 9:11:39 PM UTC+11, Han JU wrote:
>
> Do you guys know how to plug in a custom hash function for the routing 
> parameter?
>
> On Wednesday, March 26, 2014 12:51:24 PM UTC+1, Han JU wrote:
>>
>> Thanks a lot Kevin.
>>
> >> That DJB_HASH result makes it clear for us. I think we'll just use the id 
> >> value as the hash.
> >> Do you guys know how to plug in a custom hash function?
>>
>>
> >> On Wednesday, March 26, 2014 11:58:36 AM UTC+1, Kevin Wang wrote:
>>>
> >>> There are two hash function implementations, 
> >>> org.elasticsearch.cluster.routing.operation.hash.djb.DjbHashFunction 
> >>> and 
> >>> org.elasticsearch.cluster.routing.operation.hash.simple.SimpleHashFunction; 
> >>> the default is DjbHashFunction. You can check the hash of a given value by 
> >>> calling DjbHashFunction.DJB_HASH(your id).
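
To see why small numeric ids do not spread the way some_id % 128 suggests, here is a rough standalone re-implementation of the DJB hash applied to the string form of the routing value. It is written from memory of the 1.x source (start at 5381, then hash * 33 + char for each character, then Math.abs(hash % number of shards)), so treat those details as assumptions and compare against DjbHashFunction.DJB_HASH on your own cluster. The ids 1..167 are made up; the real 167 values will give a different count.

```java
import java.util.Set;
import java.util.TreeSet;

public class DjbRoutingDemo {

    // DJB2-style hash over the characters of the routing string.
    // Assumption: mirrors DjbHashFunction.DJB_HASH in ES 1.x (seed 5381, hash * 33 + c).
    static int djbHash(String value) {
        long hash = 5381;
        for (int i = 0; i < value.length(); i++) {
            hash = ((hash << 5) + hash) + value.charAt(i);
        }
        return (int) hash;
    }

    // Assumption: the shard is chosen as |hash % numberOfShards|.
    static int shardFor(String routing, int numberOfShards) {
        return Math.abs(djbHash(routing) % numberOfShards);
    }

    // Collects the distinct shards hit by the ids firstId..lastId (inclusive).
    static Set<Integer> shardsUsed(int firstId, int lastId, int numberOfShards) {
        Set<Integer> used = new TreeSet<>();
        for (int id = firstId; id <= lastId; id++) {
            used.add(shardFor(Integer.toString(id), numberOfShards));
        }
        return used;
    }

    public static void main(String[] args) {
        int shards = 128;
        Set<Integer> used = shardsUsed(1, 167, shards);
        System.out.println("distinct ids: 167");
        System.out.println("shards hit:   " + used.size() + " of " + shards);
        System.out.println("empty shards: " + (shards - used.size()));
    }
}
```

The point is that the hash is computed over the characters of the routing string, not over the integer value, so the result has little to do with some_id % 128. Also, even a perfectly uniform hash would leave roughly 128 * (127/128)^167, i.e. about 35, shards empty on average with only 167 distinct routing values, so some empty shards are expected whichever hash function is used.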
>>>
>>>
>>>
>>>
>>> On Wednesday, March 26, 2014 9:49:10 PM UTC+11, Han JU wrote:
>>>>
>>>> Thanks for your reply.
>>>>
> >>>> As far as I know, in Java the basic hash value of a positive int/long 
> >>>> is just the value itself (our ids are small values like 1125, 345 etc.).
> >>>> So I calculated some_id % 128 and got 116 distinct values. But in 
> >>>> reality a lot fewer shards are in use. 
>>>>
>>>> Does ElasticSearch use some special hash function?
>>>>
> >>>> On Wednesday, March 26, 2014 11:39:15 AM UTC+1, Kevin Wang wrote:
>>>>>
> >>>>> ES picks the shard id as hash(routing) % number of shards. In your case 
> >>>>> there are only 167 distinct values but 128 shards, so I think it's 
> >>>>> highly likely that the hashes map to fewer than 128 distinct shards, 
> >>>>> and some of the shards will not have any data.
>>>>>
>>>>>
>>>>> Kevin
>>>>>
>>>>> On Wednesday, March 26, 2014 9:30:36 PM UTC+11, Han JU wrote:
>>>>>>
>>>>>> Hi,
>>>>>>
>>>>>> We've indexed 25M documents into a single index of 128 shards with 1 
>>>>>> replica. 
>>>>>> The `routing` parameter is set to a path in the document, which is an 
>>>>>> int value:
>>>>>>
> >>>>>> _routing: {
> >>>>>>   path: "some_id",
> >>>>>>   required: true
> >>>>>> }
>>>>>>
>>>>>>
> >>>>>> In our 25M documents there are 167 distinct values of this "some_id", 
> >>>>>> and we expected ElasticSearch to route these documents evenly 
> >>>>>> across all shards.
> >>>>>> But we've found that, out of 128 shards, 53 are empty 
> >>>>>> (0 documents inside), i.e. about 40% of the shards are not used at 
> >>>>>> all.
>>>>>>
>>>>>> My question: 
>>>>>>
> >>>>>> - Is this normal? Are we missing something in configuring routing? 
> >>>>>> - Does this imbalanced shard utilization affect indexing speed?
>>>>>>
>>>>>> We can confirm that all documents are correctly indexed and routing 
>>>>>> works (when searching with routing only 1 shard responds with the 
>>>>>> correct 
>>>>>> answer).
>>>>>> ElasticSearch version is v1.0.1.
>>>>>>
>>>>>>  
>>>>>> Thanks!
>>>>>>
>>>>>
