You can package it as a plugin; see
http://www.elasticsearch.org/guide/en/elasticsearch/reference/current/modules-plugins.html
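
In case it helps, here is a minimal sketch of what such a class could look like for the "use the id value as hash" idea. The HashFunction interface below is a local stand-in so the example compiles on its own; I am assuming the real org.elasticsearch.cluster.routing.operation.hash.HashFunction has the same two methods, so please check it against your ES version. The name MyHashFunction and the fallback to String.hashCode() are just illustrative choices.

```java
// Stand-in for org.elasticsearch.cluster.routing.operation.hash.HashFunction.
// ASSUMPTION: the real interface exposes these two methods; verify before use.
interface HashFunction {
    int hash(String routing);
    int hash(String type, String id);
}

// Hypothetical implementation that uses a numeric routing value as its own hash.
// For real use, declare it public in its own MyHashFunction.java.
class MyHashFunction implements HashFunction {
    @Override
    public int hash(String routing) {
        try {
            // Numeric ids such as "1125" hash to themselves (truncated to int).
            return (int) Long.parseLong(routing.trim());
        } catch (NumberFormatException e) {
            // Non-numeric routing values fall back to String.hashCode().
            return routing.hashCode();
        }
    }

    @Override
    public int hash(String type, String id) {
        // Called when no explicit routing is given; hashing only the id
        // is a simplification made for this sketch.
        return hash(id);
    }
}
```

With the class on the classpath, "cluster.routing.operation.hash.type" would be set to its fully qualified name, as described below. Keep in mind that an identity hash only spreads documents as evenly as the ids themselves spread modulo the shard count; by your own calculation some_id % 128 gives 116 distinct values, so some shards would still stay empty.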
On Thursday, March 27, 2014 9:32:28 PM UTC+11, Han JU wrote:
>
> Thanks, but could you explain in more detail?
> Say I have the class in `MyHashFunction.java`: how do I get it into ES? Do I
> need to modify the ES source code, or is there another way?
>
> On Thursday, March 27, 2014 11:27:24 AM UTC+1, Kevin Wang wrote:
>>
>> You can add a class that implements HashFunction and set the setting
>> "cluster.routing.operation.hash.type" to that class.
>>
>>
>> Regards,
>> Kevin
>>
>> On Thursday, March 27, 2014 9:11:39 PM UTC+11, Han JU wrote:
>>>
>>> Do you know how to plug in a custom hash function for the routing
>>> parameter?
>>>
>>> On Wednesday, March 26, 2014 12:51:24 PM UTC+1, Han JU wrote:
>>>>
>>>> Thanks a lot Kevin.
>>>>
>>>> That DJB_HASH result makes it clear for us. I think we'll just use the
>>>> id value as the hash.
>>>> Do you know how to plug in a custom hash function?
>>>>
>>>>
>>>> On Wednesday, March 26, 2014 11:58:36 AM UTC+1, Kevin Wang wrote:
>>>>>
>>>>> There are two hash function implementations:
>>>>> org.elasticsearch.cluster.routing.operation.hash.djb.DjbHashFunction
>>>>> and
>>>>> org.elasticsearch.cluster.routing.operation.hash.simple.SimpleHashFunction.
>>>>>
>>>>> The default is DjbHashFunction. You can compute the hash yourself by
>>>>> calling DjbHashFunction.DJB_HASH(your id).
>>>>>
>>>>>
>>>>>
>>>>>
>>>>> On Wednesday, March 26, 2014 9:49:10 PM UTC+11, Han JU wrote:
>>>>>>
>>>>>> Thanks for your reply.
>>>>>>
>>>>>> As far as I know, in Java the basic hash value of a positive int/long
>>>>>> is just the value itself (our ids are small values like 1125, 345, etc.).
>>>>>> So I calculated some_id % 128 and got 116 distinct values. But in
>>>>>> reality a lot fewer shards are in use.
>>>>>>
>>>>>> Does ElasticSearch use some special hash function?
>>>>>>
>>>>>> On Wednesday, March 26, 2014 11:39:15 AM UTC+1, Kevin Wang wrote:
>>>>>>>
>>>>>>> ES picks the shard id as hash(routing) % number of shards. In your
>>>>>>> case there are only 167 distinct routing values for 128 shards, so it is
>>>>>>> quite likely that they map to fewer than 128 distinct shard ids. Some of
>>>>>>> the shards will therefore not have any data.
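
A quick way to check this offline is to replay the routing values through the same arithmetic. The sketch below is only an approximation: djbHash is assumed to mirror DjbHashFunction.DJB_HASH (the classic "times 33" string hash seeded with 5381), the Math.abs handling of negative hashes is an assumption, and the ids 1..167 are made-up placeholders for the real some_id values.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Set;
import java.util.TreeSet;

class ShardSpread {
    // DJB-style string hash: hash = hash * 33 + c, seeded with 5381.
    // ASSUMPTION: mirrors DjbHashFunction.DJB_HASH; check your ES source.
    static int djbHash(String value) {
        long hash = 5381;
        for (int i = 0; i < value.length(); i++) {
            hash = ((hash << 5) + hash) + value.charAt(i);
        }
        return (int) hash;
    }

    // Shard selection as described above: hash(routing) % number of shards.
    // Math.abs keeps the result non-negative (assumed sign handling).
    static int shardFor(String routing, int numShards) {
        return Math.abs(djbHash(routing) % numShards);
    }

    // Count how many distinct shards a collection of routing values lands on.
    static int distinctShards(Iterable<String> routings, int numShards) {
        Set<Integer> used = new TreeSet<>();
        for (String r : routings) {
            used.add(shardFor(r, numShards));
        }
        return used.size();
    }

    public static void main(String[] args) {
        List<String> ids = new ArrayList<>();
        for (int id = 1; id <= 167; id++) {
            ids.add(Integer.toString(id)); // placeholder ids, not the real data
        }
        System.out.println(distinctShards(ids, 128) + " of 128 shards used");
    }
}
```

Feeding in the real 167 routing values instead of the placeholders should show how many of the 128 shards can receive documents at all.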
>>>>>>>
>>>>>>>
>>>>>>> Kevin
>>>>>>>
>>>>>>> On Wednesday, March 26, 2014 9:30:36 PM UTC+11, Han JU wrote:
>>>>>>>>
>>>>>>>> Hi,
>>>>>>>>
>>>>>>>> We've indexed 25M documents into a single index of 128 shards with
>>>>>>>> 1 replica.
>>>>>>>> The `routing` parameter is set to a path in the document, which is
>>>>>>>> an int value:
>>>>>>>>
>>>>>>>> "_routing": {
>>>>>>>>   "path": "some_id",
>>>>>>>>   "required": true
>>>>>>>> }
>>>>>>>>
>>>>>>>>
>>>>>>>> In our 25M documents there are 167 distinct values of this "some_id",
>>>>>>>> and we expected ElasticSearch to route these documents evenly
>>>>>>>> across all shards.
>>>>>>>> But we found that, out of 128 shards, 53 are empty
>>>>>>>> (0 documents), i.e. about 40% of the shards are not used at
>>>>>>>> all.
>>>>>>>>
>>>>>>>> My questions:
>>>>>>>>
>>>>>>>> - Is this normal? Are we missing something in the routing configuration?
>>>>>>>> - Does this imbalanced shard utilization affect indexing speed?
>>>>>>>>
>>>>>>>> We can confirm that all documents are correctly indexed and routing
>>>>>>>> works (when searching with routing only 1 shard responds with the
>>>>>>>> correct
>>>>>>>> answer).
>>>>>>>> ElasticSearch version is v1.0.1.
>>>>>>>>
>>>>>>>>
>>>>>>>> Thanks!
>>>>>>>>
>>>>>>>