I'm benchmarking aggregations on a dataset of about 50,000,000 documents. 
It's quite a complex aggregation, nesting 5 levels deep: a terms 
aggregation at the top level, with a combination of terms, stats and 
percentiles aggregations at the deeper levels. With the number of shards 
set to 15 and 20GB of heap on a three-node cluster in EC2 (m3.2xlarge with 
IOPS-provisioned EBS), the aggregation takes about 90 seconds, with 
considerable memory consumption and roughly 50% CPU usage. This experiment 
leads me to some questions:
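For context, the shape of the query is roughly like the following (a sketch only — the field and aggregation names here are made up, and the real query nests a few levels deeper than shown):

```
POST /my_index/_search
{
  "size": 0,
  "aggs": {
    "by_customer": {
      "terms": { "field": "customer_id" },
      "aggs": {
        "by_endpoint": {
          "terms": { "field": "endpoint" },
          "aggs": {
            "latency_stats": { "stats": { "field": "response_time" } },
            "latency_pcts":  { "percentiles": { "field": "response_time" } }
          }
        }
      }
    }
  }
}
```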

- Will setting routing according to the terms in the top-level aggregation 
have any impact? I don't know how aggregations work internally, but my 
assumption is that this would ensure all data for the sub-aggregations 
lives on the same node.
- Does the aggregation operate on the entire document, or does it load 
only the fields being aggregated?
- Moving from a single-node local dev environment to a three-node cluster 
didn't yield as much improvement as I expected. Am I hitting any "limits" 
with regard to aggregations of this complexity?
- Are there any other optimizations that could improve performance?
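To make the routing idea in the first question concrete, this is the kind of setup I have in mind (a sketch, assuming a hypothetical index `my_index` routed on the same `customer_id` field used by the top-level terms aggregation):

```
# Index each document with an explicit routing value, so that all documents
# sharing the same top-level terms value land on the same shard:
curl -XPUT 'localhost:9200/my_index/event/1?routing=customer_42' -d '{
  "customer_id": "customer_42",
  "response_time": 123
}'

# A search restricted to one routing value would then hit only one shard:
curl -XGET 'localhost:9200/my_index/event/_search?routing=customer_42' -d '{
  "size": 0,
  "aggs": {
    "by_customer": { "terms": { "field": "customer_id" } }
  }
}'
```

My uncertainty is whether this colocation actually helps an aggregation that spans all routing values, since every shard still has to be queried in that case.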


-- 
You received this message because you are subscribed to the Google Groups 
"elasticsearch" group.
To view this discussion on the web visit 
https://groups.google.com/d/msgid/elasticsearch/cc5eb156-d2f7-4a39-af77-83b9c0a42c7a%40googlegroups.com.