Hi Matt,

Thanks for your quick response. However neither worked for us. In our case, 
we set shard_size to 50K (option1 ), it is still missing documents. The 
cluster became unstable if we try to further increase it. We cannot use 
shard_min_doc_count_value, because even it is one hit, its value used for 
bucket ordering can still be large enough to be collected. What we really 
need is "weighted" collect. As a workaround we have to do multiple trips. 
"Weighted collect" may have some performance penalty, but it would be 
better option than multiple trips or setting large shard_size. I am 
wondering if ES plugin can achieve this goal.

Thanks.

On Tuesday, September 16, 2014 4:20:55 PM UTC-4, Matt Weber wrote:
>
> Hi Yifan,
>
> Nothing dynamic, but you can increase the number of terms collected on 
> each shard to increase the accuracy [1].  Might also want to play with the 
> shard_min_doc_count value if you know certain shards have a low hit count 
> and are throwing off the aggregations [2].
>
> [1] 
> http://www.elasticsearch.org/guide/en/elasticsearch/reference/current/search-aggregations-bucket-terms-aggregation.html#_shard_size
> [2] 
> http://www.elasticsearch.org/guide/en/elasticsearch/reference/current/search-aggregations-bucket-terms-aggregation.html#_minimum_document_count
>
> Thanks,
> Matt Weber
>
>
> On Tue, Sep 16, 2014 at 12:36 PM, Yifan Wang <[email protected] 
> <javascript:>> wrote:
>
>> It seems to be a common problem that the top N results returned from an 
>> aggregation query is inaccurate due to uneven distribution of matching 
>> documents on different shards, because ES will collect top N buckets from 
>> each shard no matter actually how many hits are on each shard. It is very 
>> often we collect buckets that should have not been collected on some 
>> shards, but we missed buckets that should have collected on some others. 
>>
>> Is there a way we can collect buckets based on a dynamic "weight", for 
>> example "total hits", on that shard?
>>
>> Thanks in advance.
>>
>>  -- 
>> You received this message because you are subscribed to the Google Groups 
>> "elasticsearch" group.
>> To unsubscribe from this group and stop receiving emails from it, send an 
>> email to [email protected] <javascript:>.
>> To view this discussion on the web visit 
>> https://groups.google.com/d/msgid/elasticsearch/e78571f9-d3e3-4d7c-a60e-d1a2052db397%40googlegroups.com
>>  
>> <https://groups.google.com/d/msgid/elasticsearch/e78571f9-d3e3-4d7c-a60e-d1a2052db397%40googlegroups.com?utm_medium=email&utm_source=footer>
>> .
>> For more options, visit https://groups.google.com/d/optout.
>>
>
>

-- 
You received this message because you are subscribed to the Google Groups 
"elasticsearch" group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to [email protected].
To view this discussion on the web visit 
https://groups.google.com/d/msgid/elasticsearch/ff23136d-eea3-4863-bec1-3caa8edf4777%40googlegroups.com.
For more options, visit https://groups.google.com/d/optout.

Reply via email to