ES picks the shard id as hash(routing) % number_of_shards. In your case
there are only 167 distinct routing values for 128 shards, so it is very
likely that they map to fewer than 128 distinct shard ids. Some of the
shards will therefore not receive any data.
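
To get a feel for the numbers, here is a rough sketch that assumes an ideal hash, i.e. each distinct routing value lands on a uniformly random shard. That is an assumption for illustration, not a description of the hash function ES actually uses, and the function names are made up for this example. It computes the expected number of empty shards in closed form and checks it with a small simulation.

```python
import random


def expected_empty_shards(num_shards, num_keys):
    """Expected empty shards when num_keys distinct routing values are
    assigned to shards uniformly at random (idealised hash, an assumption)."""
    # A given shard is missed by one key with probability (1 - 1/num_shards),
    # and by all keys with that probability raised to num_keys.
    return num_shards * (1 - 1 / num_shards) ** num_keys


def simulate_empty_shards(num_shards, num_keys, trials=2000, seed=0):
    """Monte Carlo estimate of the same quantity."""
    rng = random.Random(seed)
    total_empty = 0
    for _ in range(trials):
        # Set of shards hit by at least one routing value in this trial.
        used = {rng.randrange(num_shards) for _ in range(num_keys)}
        total_empty += num_shards - len(used)
    return total_empty / trials


if __name__ == "__main__":
    print("closed form:", round(expected_empty_shards(128, 167), 1))
    print("simulation: ", round(simulate_empty_shards(128, 167), 1))
```

Under that assumption roughly a quarter of the 128 shards stay empty on average, so a large number of empty shards is expected even before considering how evenly the real hash spreads these particular values.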
Kevin
On Wednesday, March 26, 2014 9:30:36 PM UTC+11, Han JU wrote:
>
> Hi,
>
> We've indexed 25M documents into a single index of 128 shards with 1
> replica.
> The `routing` parameter is set to a path in the document, which is an int
> value:
>
> _routing: {
> path: "some_id",
> required: true
> }
>
>
> In our 25M documents, there are 167 distinct values of this "some_id", and
> we expected ElasticSearch to route these documents evenly across all
> shards.
> But we've found that, out of 128 shards, 53 are empty (with 0 documents
> inside), i.e. about 41% of the shards are not used at all.
>
> My questions:
>
> - Is this normal? Are we missing something in the routing configuration?
> - Does this imbalanced shard utilization affect indexing speed?
>
> We can confirm that all documents are correctly indexed and routing works
> (when searching with routing only 1 shard responds with the correct answer).
> ElasticSearch version is v1.0.1.
>
>
> Thanks!
>