Thanks for your reply.

As far as I know, in Java the hash value of a small positive int/long is 
just the value itself (our ids are small values like 1125, 345, etc.).
So I calculated some_id % 128 and got 116 distinct values. But in 
reality far fewer shards are in use: only 75 of the 128 (128 - 53) hold data.
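
For comparison, here is a rough sketch of what an idealized, perfectly uniform hash would give for 167 distinct routing values over 128 shards. This is only a balls-into-bins estimate under my own assumption of uniform, independent hashing. It does not reproduce whatever hash function ElasticSearch actually applies, and the function names are made up for illustration.

```python
# Sketch: expected shard occupancy if routing values were hashed uniformly.
# Assumption: an idealized uniform, independent hash. This is NOT
# ElasticSearch's actual routing hash, only a baseline for comparison.
import random

NUM_SHARDS = 128          # shards in the index (from the thread)
NUM_ROUTING_VALUES = 167  # distinct some_id values (from the thread)


def expected_empty_shards(num_values: int, num_shards: int) -> float:
    """Expected number of shards that receive none of the routing values."""
    # One routing value misses a given shard with probability 1 - 1/num_shards;
    # all num_values miss it with that probability raised to num_values.
    return num_shards * (1.0 - 1.0 / num_shards) ** num_values


def simulate_empty_shards(num_values: int, num_shards: int,
                          trials: int = 2000, seed: int = 0) -> float:
    """Monte Carlo check: average number of empty shards over random assignments."""
    rng = random.Random(seed)
    total_empty = 0
    for _ in range(trials):
        used = {rng.randrange(num_shards) for _ in range(num_values)}
        total_empty += num_shards - len(used)
    return total_empty / trials


if __name__ == "__main__":
    exact = expected_empty_shards(NUM_ROUTING_VALUES, NUM_SHARDS)
    simulated = simulate_empty_shards(NUM_ROUTING_VALUES, NUM_SHARDS)
    print(f"expected empty shards (formula):    {exact:.1f}")
    print(f"expected empty shards (simulation): {simulated:.1f}")
```

By this estimate, even a perfectly uniform hash would leave roughly 34 or 35 of the 128 shards empty, so some empty shards are unavoidable with only 167 routing values. The 53 empty shards we observe are still noticeably more than that baseline.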

Does ElasticSearch use some special hash function?

On Wednesday, March 26, 2014 11:39:15 AM UTC+1, Kevin Wang wrote:
>
> ES computes the shard id as hash(routing) % number_of_shards. In your case 
> there are only 167 distinct routing values for 128 shards, so it is quite 
> likely that they map to fewer than 128 distinct shard ids, and some of the 
> shards will not have any data.
>
>
> Kevin
>
> On Wednesday, March 26, 2014 9:30:36 PM UTC+11, Han JU wrote:
>>
>> Hi,
>>
>> We've indexed 25M documents into a single index of 128 shards with 1 
>> replica. 
>> The `routing` parameter is set to a path in the document, which is an int 
>> value:
>>
>> _routing: {
>>   path: "some_id",
>>   required: true
>> }
>>
>>
>> In our 25M documents, there are 167 distinct values of this "some_id", and 
>> we expected ElasticSearch to route these documents evenly across 
>> all shards.
>> But we've found that, out of 128 shards, 53 are empty 
>> (0 documents inside), i.e. about 40% of the shards are not used at all.
>>
>> My questions: 
>>
>> - Is this normal? Are we missing something in the routing configuration? 
>> - Does this imbalanced shard utilization affect indexing speed?
>>
>> We can confirm that all documents are correctly indexed and routing works 
>> (when searching with routing only 1 shard responds with the correct answer).
>> ElasticSearch version is v1.0.1.
>>
>>  
>> Thanks!
>>
>
