If you introduce an extra reduction phase (for multiple shards on the same 
node), you introduce further potential for inaccuracies in the final results.
Consider the role of 'size' and 'shard_size' in the "terms" aggregation [1] 
and the effect they have on accuracy. You'd arguably also need a 'node_size' 
setting to control the size of this new intermediate collection. Any stage 
that reduces the volume of data processed is an approximation, and can 
introduce inaccuracies in the later merge stages.
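
To make the accuracy point concrete, here is a toy simulation in Python. It is 
only a sketch of the general "truncate, then merge" pattern, not 
Elasticsearch's actual code: the helper names (top_n, merge, terms_agg), the 
sample counts, and above all the 'node_size' parameter are hypothetical and 
exist purely for illustration.

```python
from collections import Counter

def top_n(counts, n):
    """Keep the n highest-count terms (ties broken by term name)."""
    ranked = sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))
    return Counter(dict(ranked[:n]))

def merge(partials):
    """Sum the per-term counts of several partial results."""
    total = Counter()
    for partial in partials:
        total.update(partial)
    return total

def terms_agg(nodes, size, shard_size, node_size=None):
    """Simulate a top-terms request over nodes, each a list of shard counts.

    node_size is a hypothetical setting: when given, each node merges its
    own shard results and truncates them before replying.
    """
    replies = []
    for shards in nodes:
        shard_tops = [top_n(shard, shard_size) for shard in shards]
        if node_size is None:
            replies.extend(shard_tops)  # every shard replies directly
        else:
            replies.append(top_n(merge(shard_tops), node_size))
    return top_n(merge(replies), size)

# Two nodes with two shards each; term "b" is never first on any shard
# but has the highest count overall (a=20, b=28, c=22).
nodes = [
    [Counter(a=10, b=6), Counter(a=10, b=6)],
    [Counter(c=11, b=8), Counter(c=11, b=8)],
]

exact = top_n(merge(shard for shards in nodes for shard in shards), 1)
print(dict(exact))                                                  # {'b': 28}
print(dict(terms_agg(nodes, size=1, shard_size=2)))                 # {'b': 28}
print(dict(terms_agg(nodes, size=1, shard_size=1)))                 # {'c': 22}
print(dict(terms_agg(nodes, size=1, shard_size=2, node_size=1)))    # {'c': 22}
```

With a generous 'shard_size' the result matches the exact answer, but adding 
a node-level truncation in this toy model brings the error back even though 
'shard_size' is unchanged, which is why the extra phase would need its own 
tuning knob.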


[1] 
http://www.elasticsearch.org/guide/en/elasticsearch/reference/current/search-aggregations-bucket-terms-aggregation.html#_shard_size

On Wednesday, January 14, 2015 at 5:44:47 PM UTC, Elliott Bradshaw wrote:
>
> Adrien,
>
> I get the feeling that you're a pretty heavy contributor to the 
> aggregation module.  In your experience, would a shard per cpu core 
> strategy be an effective performance solution in a pure aggregation use 
> case?    If this could proportionally reduce the aggregation time, would a 
> node local reduce (in which all shard aggregations on a given node are 
> reduced prior to being sent to the client node) be a good follow on 
> strategy for further enhancement?
>
> On Wednesday, January 14, 2015 at 10:56:03 AM UTC-5, Adrien Grand wrote:
>>
>>
>>
>> On Wed, Jan 14, 2015 at 4:16 PM, Elliott Bradshaw <[email protected]> 
>> wrote:
>>
>>> Just out of curiosity, are aggregations on multiple shards on a single 
>>> node executed serially or in parallel?  In my experience, it appears that 
>>> they're executed serially (my CPU usage did not change when going from 1 
>>> shard to 2 shards per node, but I didn't test this extensively).  I'm 
>>> interested in maximizing the parallelism of an aggregation without creating 
>>> a massive number of nodes.
>>>
>>>
>> Requests are processed serially per shard, but several shards can be 
>> processed at the same time. So if you have an index that consists of N 
>> primaries, this would run on N processors of your cluster in parallel.
>>
>>
>> -- 
>> Adrien Grand
>>  
>
