Hi Matt,

Thanks for your quick response. However, neither option worked for us. In our case, we set shard_size to 50K (option 1), and the results were still missing documents. The cluster became unstable when we tried to increase it further. We cannot use shard_min_doc_count (option 2), because even a bucket with a single hit can have a value used for bucket ordering that is still large enough to be collected.

What we really need is a "weighted" collect. As a workaround, we have to make multiple round trips. A "weighted" collect may carry some performance penalty, but it would be a better option than multiple round trips or a very large shard_size. I am wondering whether an ES plugin could achieve this.
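To make the idea concrete, here is a minimal Python sketch of what I mean. It is only a toy simulation, not Elasticsearch code and not an existing ES option: the names top_terms and weighted_shard_sizes are hypothetical, the shard data is made up, and buckets are ordered by document count for simplicity (our real ordering uses a different value). It compares a fixed per-shard size with a per-shard size proportional to the total hits on each shard.

```python
from collections import Counter


def top_terms(shards, size, shard_sizes):
    """Merge each shard's local top-k buckets and return the global top `size` terms."""
    merged = Counter()
    for counts, k in zip(shards, shard_sizes):
        # Each shard reports only its k largest buckets; everything else is dropped.
        for term, count in Counter(counts).most_common(k):
            merged[term] += count
    return [term for term, _ in merged.most_common(size)]


def weighted_shard_sizes(shards, budget):
    """Split a bucket budget across shards roughly in proportion to their hits.

    Hypothetical weighting: every shard gets at least one bucket, so the
    sizes can add up to slightly more than `budget`.
    """
    hits = [sum(counts.values()) for counts in shards]
    total = sum(hits)
    return [max(1, round(budget * h / total)) for h in hits]


# Made-up data: term -> document count on each shard.
shards = [
    {"a": 100, "b": 90, "c": 80, "d": 70, "e": 60},  # shard with many hits
    {"x": 3, "y": 2, "z": 1},                        # shard with few hits
]

# Fixed size: both shards return 2 buckets, so "c" is never collected.
fixed = top_terms(shards, size=3, shard_sizes=[2, 2])

# Weighted size: the same budget of about 4 buckets, skewed toward the busy shard.
weighted = top_terms(shards, size=3, shard_sizes=weighted_shard_sizes(shards, budget=4))

print(fixed)     # ['a', 'b', 'x']
print(weighted)  # ['a', 'b', 'c']
```

With the fixed size, a slot is spent on a bucket from the small shard that should not have been collected, while "c" on the large shard is missed. With the weighted size, the true top 3 come back for about the same total number of collected buckets.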
Thanks.

On Tuesday, September 16, 2014 4:20:55 PM UTC-4, Matt Weber wrote:
> Hi Yifan,
>
> Nothing dynamic, but you can increase the number of terms collected on
> each shard to increase the accuracy [1]. Might also want to play with the
> shard_min_doc_count value if you know certain shards have a low hit count
> and are throwing off the aggregations [2].
>
> [1] http://www.elasticsearch.org/guide/en/elasticsearch/reference/current/search-aggregations-bucket-terms-aggregation.html#_shard_size
> [2] http://www.elasticsearch.org/guide/en/elasticsearch/reference/current/search-aggregations-bucket-terms-aggregation.html#_minimum_document_count
>
> Thanks,
> Matt Weber
>
> On Tue, Sep 16, 2014 at 12:36 PM, Yifan Wang <[email protected]> wrote:
>
>> It seems to be a common problem that the top N results returned from an
>> aggregation query is inaccurate due to uneven distribution of matching
>> documents on different shards, because ES will collect top N buckets from
>> each shard no matter actually how many hits are on each shard. It is very
>> often we collect buckets that should have not been collected on some
>> shards, but we missed buckets that should have collected on some others.
>>
>> Is there a way we can collect buckets based on a dynamic "weight", for
>> example "total hits", on that shard?
>>
>> Thanks in advance.
