Thanks a lot Kevin.
That DJB_HASH result makes it clear for us. I think we'll just use the id
value as hash.
Do you guys know how to plugin a custom hash function?
在 2014年3月26日星期三UTC+1上午11时58分36秒,Kevin Wang写道:
>
> There are two hash functions
> implementation
> org.elasticsearch.cluster.routing.operation.hash.djb.DjbHashFunction
> and
> org.elasticsearch.cluster.routing.operation.hash.simple.SimpleHashFunction,
> default is DjbHashFunction. You can try get the hash by
> using DjbHashFunction.DJB_HASH(you id)
>
>
>
>
> On Wednesday, March 26, 2014 9:49:10 PM UTC+11, Han JU wrote:
>>
>> Thanks for your reply.
>>
>> As far as I know, in Java, basic hash value of positive int/long value is
>> just themselves (our ids are small values like 1125, 345 etc).
>> So I calculated some_id % 128, and I got 116 distinct values. But in
>> reality there's a lot less shards in use.
>>
>> Does ElasticSearch use some special hash function?
>>
>> 在 2014年3月26日星期三UTC+1上午11时39分15秒,Kevin Wang写道:
>>>
>>> ES will get the shard id by hash(routing)%num of shards, in your case,
>>> there are only 167 distinct values but have 128 shards, I think it's highly
>>> possible there is less than 128 distinct hash values. So some of the shard
>>> will not have any data.
>>>
>>>
>>> Kevin
>>>
>>> On Wednesday, March 26, 2014 9:30:36 PM UTC+11, Han JU wrote:
>>>>
>>>> Hi,
>>>>
>>>> We've indexed 25M documents into a single index of 128 shards with 1
>>>> replica.
>>>> The `routing` parameter is set to a path in the document, which is an
>>>> int value:
>>>>
>>>> _routing: {
>>>> path: "some_id"
>>>> required: true
>>>> }
>>>>
>>>>
>>>> In out 25M documents, there's 167 distinct values of this "some_id" and
>>>> in our expectation, ElasticSearch will route these documents evenly across
>>>> all shards.
>>>> But we've found out that, out of 128 shards, there are 53 empty shards
>>>> (with 0 document inside), or, 40% of the shards are not used at all.
>>>>
>>>> My question:
>>>>
>>>> - is this normal? Do we miss something in configuring routing?
>>>> - does this imbalanced shard utilization affect indexing speed?
>>>>
>>>> We can confirm that all documents are correctly indexed and routing
>>>> works (when searching with routing only 1 shard responds with the correct
>>>> answer).
>>>> ElasticSearch version is v1.0.1.
>>>>
>>>>
>>>> Thanks!
>>>>
>>>
--
You received this message because you are subscribed to the Google Groups
"elasticsearch" group.
To unsubscribe from this group and stop receiving emails from it, send an email
to [email protected].
To view this discussion on the web visit
https://groups.google.com/d/msgid/elasticsearch/29da8ba5-ccab-40b0-9174-b6522408dd51%40googlegroups.com.
For more options, visit https://groups.google.com/d/optout.