The terms aggregation relies on field data producing unique values in order to run efficiently. When you provide a script, by default a wrapper deduplicates the values the script returns, to make sure the result is the same as if the data were stored in the index.
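
To make the cost of that wrapper concrete, here is a toy model in Python. It is only a sketch of the idea, not Elasticsearch's actual implementation; `terms_counts`, `values_unique` and the sample documents are hypothetical names made up for illustration.

```python
from collections import Counter


def terms_counts(docs, script, values_unique=False):
    """Count how many documents fall into each term bucket.

    `script` returns a list of values for one document. Unless the caller
    promises the values are already unique, they are deduplicated first so
    that a document is counted at most once per term.
    """
    counts = Counter()
    for doc in docs:
        values = script(doc)
        if not values_unique:
            # Extra per-document work: sort and drop repeated values.
            values = sorted(set(values))
        for value in values:
            counts[value] += 1
    return counts


# Hypothetical sample data: the first document repeats the value "a".
docs = [{"tags": ["a", "a", "b"]}, {"tags": ["b"]}]
script = lambda doc: doc["tags"]

deduplicated = terms_counts(docs, script)
assumed_unique = terms_counts(docs, script, values_unique=True)
```

With deduplication the first document counts once for "a"; without it, the repeated value is counted twice. When a script only ever returns one value per document, both paths give the same counts, so the deduplication step is pure overhead.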
You can tell Elasticsearch to assume that values are already unique by passing `script_values_unique`: `true` to the terms aggregation. Can you check if it makes the aggregation faster?

On Wed, Apr 9, 2014 at 9:36 PM, Thomas S. <[email protected]> wrote:
> Hi,
>
> I am currently exploring the option of using scripts with aggregations and
> I noticed that for some reason scripts for terms aggregations are executed
> much slower than for other aggregations, even if the script doesn't access
> any fields yet. This also happens for native Java scripts. I'm running
> Elasticsearch 1.1.0.
>
> For example, on my data set the simple script "1" takes around 400ms for
> the sum and histogram aggregations, but takes around 25s to run on a terms
> aggregation, even on repeated runs. What is going on here? Terms
> aggregations without a script are very fast, and histogram/sum aggregations
> with scripts that access the document are also very fast: I had to
> transform a script aggregation that should have been a terms aggregation
> into a histogram and convert the numeric values back into terms on the
> client so the aggregation would be executed in reasonable time.
>
>
> In [2]: app.search.search({'size': 0, 'query': { 'match_all': {} },
> 'aggregations': { 'test_script': { 'terms': { 'script': '1' } } }})
> Out[2]:
> {u'_shards': {u'failed': 0, u'successful': 246, u'total': 246},
> u'aggregations': {u'test_script': {u'buckets': [{u'doc_count': 4231327,
> u'key': u'1'}]}},
> u'hits': {u'hits': [], u'max_score': 0.0, u'total': 4231327},
> u'timed_out': False,
> u'took': 24986}
>
>
> In [10]: app.search.search({'size': 0, 'query': { 'match_all': {} },
> 'aggregations': { 'test_script': { 'sum': { 'script': '1' } } }})
> Out[10]:
> {u'_shards': {u'failed': 0, u'successful': 246, u'total': 246},
> u'aggregations': {u'test_script': {u'value': 4231327.0}},
> u'hits': {u'hits': [], u'max_score': 0.0, u'total': 4231327},
> u'timed_out': False,
> u'took': 363}
>
>
> In [8]: app.search.search({'size': 0, 'query': { 'match_all': {} },
> 'aggregations': { 'test_script': { 'histogram': { 'script': '1',
> 'interval': 1 } } }})
> Out[8]:
> {u'_shards': {u'failed': 0, u'successful': 246, u'total': 246},
> u'aggregations': {u'test_script': {u'buckets': [{u'doc_count': 4231327,
> u'key': 1}]}},
> u'hits': {u'hits': [], u'max_score': 0.0, u'total': 4231327},
> u'timed_out': False,
> u'took': 421}
>
>
> Thomas

--
Adrien Grand
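
For anyone trying the suggestion from this reply, the request body might look like the sketch below. It assumes the same dict-based search call used in the quoted session; the helper name `terms_script_request` is hypothetical, and the only option taken from the thread is `script_values_unique`. It should only be set when the script really does return unique values for each document.

```python
def terms_script_request(script, values_unique=True):
    """Build a search body with a scripted terms aggregation."""
    return {
        'size': 0,
        'query': {'match_all': {}},
        'aggregations': {
            'test_script': {
                'terms': {
                    'script': script,
                    # Option named in the reply: tells Elasticsearch to
                    # assume the script's values are already unique.
                    'script_values_unique': values_unique,
                }
            }
        },
    }


# Same trivial script as in the quoted session.
body = terms_script_request('1')
```

Passing `body` to the search call from the quoted session and comparing the `took` value against the original 24986 ms would show whether the deduplication wrapper accounts for the difference.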
