Do you guys know how to plug in a custom hash function for the routing 
parameter?

On Wednesday, March 26, 2014 12:51:24 PM UTC+1, Han JU wrote:
>
> Thanks a lot Kevin.
>
> That DJB_HASH result makes it clear for us. I think we'll just use the id 
> value as hash.
> Do you guys know how to plug in a custom hash function?
>
>
> On Wednesday, March 26, 2014 11:58:36 AM UTC+1, Kevin Wang wrote:
>>
>> There are two hash function implementations, 
>> org.elasticsearch.cluster.routing.operation.hash.djb.DjbHashFunction 
>> and 
>> org.elasticsearch.cluster.routing.operation.hash.simple.SimpleHashFunction. 
>> The default is DjbHashFunction. You can get the hash of an id by 
>> calling DjbHashFunction.DJB_HASH(your id).
>>
>>
>>
>>
>> On Wednesday, March 26, 2014 9:49:10 PM UTC+11, Han JU wrote:
>>>
>>> Thanks for your reply.
>>>
>>> As far as I know, in Java the basic hash value of a small positive 
>>> int/long is just the value itself (our ids are small values like 1125, 
>>> 345, etc.). So I calculated some_id % 128 and got 116 distinct values. 
>>> But in reality far fewer shards are in use. 
>>>
>>> Does ElasticSearch use some special hash function?
>>>
>>> On Wednesday, March 26, 2014 11:39:15 AM UTC+1, Kevin Wang wrote:
>>>>
>>>> ES computes the shard id as hash(routing) % number of shards. In your 
>>>> case there are only 167 distinct routing values for 128 shards, so it 
>>>> is very likely that they map to fewer than 128 distinct shard ids. 
>>>> Some of the shards will therefore not hold any data.
>>>>
>>>>
>>>> Kevin
>>>>
>>>> On Wednesday, March 26, 2014 9:30:36 PM UTC+11, Han JU wrote:
>>>>>
>>>>> Hi,
>>>>>
>>>>> We've indexed 25M documents into a single index of 128 shards with 1 
>>>>> replica. 
>>>>> The `routing` parameter is set to a path in the document, which is an 
>>>>> int value:
>>>>>
>>>>> _routing: {
>>>>>   path: "some_id",
>>>>>   required: true
>>>>> }
>>>>>
>>>>>
>>>>> In our 25M documents there are 167 distinct values of this "some_id", 
>>>>> and we expected ElasticSearch to route these documents evenly 
>>>>> across all shards.
>>>>> But we found that, out of 128 shards, 53 are empty (0 documents 
>>>>> inside), so about 40% of the shards are not used at all.
>>>>>
>>>>> My question: 
>>>>>
>>>>> - Is this normal? Are we missing something in the routing configuration? 
>>>>> - Does this imbalanced shard utilization affect indexing speed?
>>>>>
>>>>> We can confirm that all documents are correctly indexed and routing 
>>>>> works (when searching with routing only 1 shard responds with the correct 
>>>>> answer).
>>>>> ElasticSearch version is v1.0.1.
>>>>>
>>>>>  
>>>>> Thanks!
>>>>>
>>>>
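
To make the routing rule discussed in this thread concrete, here is a minimal Java sketch of hash(routing) % number of shards, assuming a DJB2-style string hash. It is an illustration, not the Elasticsearch source: the class name RoutingSketch, the methods djbHash, shardFor and expectedEmptyShards, and the ids 1..167 are all hypothetical, and the real implementation may treat negative hash values differently from Math.floorMod. The last helper estimates how many shards stay empty on average if each distinct routing value landed on a uniformly random shard, using shards * (1 - 1/shards)^keys.

```java
import java.util.HashSet;
import java.util.Set;

public class RoutingSketch {

    // DJB2-style string hash (assumed form): start at 5381, then for each
    // character compute hash * 33 + char, and truncate to int at the end.
    static int djbHash(String value) {
        long hash = 5381;
        for (int i = 0; i < value.length(); i++) {
            hash = ((hash << 5) + hash) + value.charAt(i);
        }
        return (int) hash;
    }

    // Shard id = hash(routing) mod number of shards. floorMod keeps the
    // result in [0, numShards) even when the hash is negative; this choice
    // is an assumption of the sketch.
    static int shardFor(String routing, int numShards) {
        return Math.floorMod(djbHash(routing), numShards);
    }

    // Expected number of empty shards if each distinct key were assigned
    // to a shard uniformly at random.
    static double expectedEmptyShards(int distinctKeys, int numShards) {
        return numShards * Math.pow(1.0 - 1.0 / numShards, distinctKeys);
    }

    public static void main(String[] args) {
        int numShards = 128;
        Set<Integer> used = new HashSet<>();
        // Hypothetical routing values; the real ids are not given in the thread.
        for (int id = 1; id <= 167; id++) {
            used.add(shardFor(Integer.toString(id), numShards));
        }
        System.out.println("shards used: " + used.size() + " of " + numShards);
        System.out.printf("expected empty with uniform hashing: %.1f%n",
                expectedEmptyShards(167, numShards));
    }
}
```

With 167 distinct routing values and 128 shards, the uniform estimate already leaves roughly 35 shards empty, so a sizeable number of empty shards is expected even with a well-behaved hash. Note also that routing values are hashed as strings here, which is why some_id % 128 on the numeric value does not predict the shard.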
