ES picks the shard with hash(routing) % number_of_shards. In your case there 
are only 167 distinct routing values for 128 shards, so it is very likely that 
they map to fewer than 128 distinct shard ids. Some shards will therefore 
never receive any data.
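
To put a rough number on this, here is a minimal sketch that assumes an ideal 
hash, one that sends each distinct routing value to a uniformly random shard. 
It does not model the hash function ES actually uses, and the function names 
are made up for illustration.

```python
import random


def expected_empty_shards(num_keys: int, num_shards: int) -> float:
    """Expected number of shards that receive no key, assuming each key
    lands on a uniformly random shard (an idealized hash, NOT ES's own)."""
    # A given shard is missed by one key with probability (1 - 1/num_shards),
    # and by all keys with that probability raised to num_keys.
    return num_shards * (1 - 1 / num_shards) ** num_keys


def simulate_empty_shards(num_keys: int, num_shards: int,
                          trials: int = 2000, seed: int = 0) -> float:
    """Monte Carlo estimate of the same quantity, averaged over `trials` runs."""
    rng = random.Random(seed)
    total_empty = 0
    for _ in range(trials):
        # Set of shards hit by at least one routing value in this run.
        used = {rng.randrange(num_shards) for _ in range(num_keys)}
        total_empty += num_shards - len(used)
    return total_empty / trials


if __name__ == "__main__":
    # Numbers from this thread: 167 distinct routing values, 128 shards.
    print(expected_empty_shards(167, 128))
    print(simulate_empty_shards(167, 128))
```

Under that assumption roughly 35 of the 128 shards would be expected to stay 
empty, so a large number of empty shards is normal with so few routing values. 
The 53 you observed is higher than this idealized estimate, which may mean the 
real hash does not spread these particular ids uniformly.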


Kevin

On Wednesday, March 26, 2014 9:30:36 PM UTC+11, Han JU wrote:
>
> Hi,
>
> We've indexed 25M documents into a single index of 128 shards with 1 
> replica. 
> The `routing` parameter is set to a path in the document, which is an int 
> value:
>
> "_routing": {
>   "path": "some_id",
>   "required": true
> }
>
>
> In our 25M documents there are 167 distinct values of this "some_id", and we 
> expected ElasticSearch to route these documents evenly across all shards.
> But we've found that, out of 128 shards, 53 are empty (0 documents), 
> i.e. about 41% of the shards are not used at all.
>
> My questions: 
>
> - Is this normal? Are we missing something in the routing configuration? 
> - Does this imbalanced shard utilization affect indexing speed?
>
> We can confirm that all documents are correctly indexed and routing works 
> (when searching with routing only 1 shard responds with the correct answer).
> ElasticSearch version is v1.0.1.
>
>  
> Thanks!
>
