The terms aggregation relies on field data returning unique values for
each document in order to run efficiently. When you provide a script, the
script's values are wrapped by default in a layer that deduplicates them,
so that the result is the same as if the data had been stored in the
index.

You can tell Elasticsearch to assume that the values are already unique by
setting `script_values_unique` to `true` on the terms aggregation. Could
you check whether that makes the aggregation faster?
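
For reference, here is a minimal sketch of the request body with that flag
set, reusing the query from your message. The helper name
`terms_script_body` is made up for illustration; only
`script_values_unique` is an actual option of the terms aggregation.

```python
import json


def terms_script_body(script, values_unique=False):
    """Build a search body with a scripted terms aggregation.

    Hypothetical helper for illustration; it only assembles a dict.
    """
    terms = {"script": script}
    if values_unique:
        # Asks Elasticsearch to assume the script's values are already
        # unique, so the deduplicating wrapper can be skipped.
        terms["script_values_unique"] = True
    return {
        "size": 0,
        "query": {"match_all": {}},
        "aggregations": {"test_script": {"terms": terms}},
    }


body = terms_script_body("1", values_unique=True)
print(json.dumps(body, indent=2, sort_keys=True))
```

Passing `body` to the same `app.search.search` call as before and comparing
`took` with and without the flag should show whether the deduplication
step is what makes the request slow.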


On Wed, Apr 9, 2014 at 9:36 PM, Thomas S. <[email protected]> wrote:

> Hi,
>
> I am currently exploring the use of scripts with aggregations, and I
> noticed that scripts in terms aggregations run much more slowly than in
> other aggregations, even when the script doesn't access any fields. This
> also happens with native Java scripts. I'm running Elasticsearch 1.1.0.
>
> For example, on my data set the simple script "1" takes around 400ms for
> the sum and histogram aggregations, but takes around 25s to run on a terms
> aggregation, even on repeated runs. What is going on here? Terms
> aggregations without a script are very fast, and histogram/sum aggregations
> with scripts that access the document are also very fast: I had to
> transform a script aggregation that should have been a terms aggregation
> into a histogram and convert the numeric values back into terms on the
> client so the aggregation would be executed in reasonable time.
>
>
> In [2]: app.search.search({'size': 0, 'query': { 'match_all': {} },
> 'aggregations': { 'test_script': { 'terms': { 'script': '1' } } }})
> Out[2]:
> {u'_shards': {u'failed': 0, u'successful': 246, u'total': 246},
>  u'aggregations': {u'test_script': {u'buckets': [{u'doc_count': 4231327,
>      u'key': u'1'}]}},
>  u'hits': {u'hits': [], u'max_score': 0.0, u'total': 4231327},
>  u'timed_out': False,
>  u'took': 24986}
>
>
> In [10]: app.search.search({'size': 0, 'query': { 'match_all': {} },
> 'aggregations': { 'test_script': { 'sum': { 'script': '1' } } }})
> Out[10]:
> {u'_shards': {u'failed': 0, u'successful': 246, u'total': 246},
>  u'aggregations': {u'test_script': {u'value': 4231327.0}},
>  u'hits': {u'hits': [], u'max_score': 0.0, u'total': 4231327},
>  u'timed_out': False,
>  u'took': 363}
>
>
> In [8]: app.search.search({'size': 0, 'query': { 'match_all': {} },
> 'aggregations': { 'test_script': { 'histogram': { 'script': '1',
> 'interval': 1 } } }})
> Out[8]:
> {u'_shards': {u'failed': 0, u'successful': 246, u'total': 246},
>  u'aggregations': {u'test_script': {u'buckets': [{u'doc_count': 4231327,
>      u'key': 1}]}},
>  u'hits': {u'hits': [], u'max_score': 0.0, u'total': 4231327},
>  u'timed_out': False,
>  u'took': 421}
>
>
> Thomas
>
> --
> You received this message because you are subscribed to the Google Groups
> "elasticsearch" group.
> To unsubscribe from this group and stop receiving emails from it, send an
> email to [email protected].
> To view this discussion on the web visit
> https://groups.google.com/d/msgid/elasticsearch/4af8942c-db46-47fa-9d38-370051a15c5c%40googlegroups.com.
> For more options, visit https://groups.google.com/d/optout.
>



-- 
Adrien Grand
