Very likely this problem is not related to nested documents but to fielddata loading for the "integer" field. Field data is a column-oriented view of the data that is, by default, lazily loaded from the inverted index the first time it is needed, and then cached until the end of life of the segment it belongs to. So only the first request that needs it is supposed to be slow.
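To make the lazy-loading behaviour concrete, here is a toy Python sketch. It is not Elasticsearch code: `Segment`, `fielddata` and `sum_field` are made-up names, and the real data structures are far more involved. It only illustrates why the first aggregation pays the full loading cost even when zero documents match, and why later requests on the same segment do not.

```python
class Segment:
    """Toy segment holding an inverted index for one numeric field (hypothetical)."""

    def __init__(self, inverted):
        # inverted: field value -> list of doc ids that contain that value
        self.inverted = inverted
        self.max_doc = 1 + max(d for docs in inverted.values() for d in docs)
        self._fielddata = None   # column-oriented view, built on first use
        self.load_count = 0      # how many times the column was built

    def fielddata(self):
        """Return the doc id -> value column, building and caching it lazily."""
        if self._fielddata is None:
            column = [None] * self.max_doc
            # The cost is proportional to the whole segment,
            # not to the number of documents matched by the query.
            for value, docs in self.inverted.items():
                for doc_id in docs:
                    column[doc_id] = value
            self._fielddata = column
            self.load_count += 1
        return self._fielddata


def sum_field(segment, matching_doc_ids):
    """Sum the field over the matching docs; needs the column even if none match."""
    column = segment.fielddata()
    return sum(column[d] for d in matching_doc_ids if column[d] is not None)


segment = Segment({10: [0, 2], 25: [1]})
first = sum_field(segment, [])          # zero hits, but the column is built here
second = sum_field(segment, [0, 1, 2])  # served from the cached column
```

In this sketch, eager loading would amount to calling `segment.fielddata()` when the segment is created rather than on the first request.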
It is possible to load field data eagerly[1] to make sure that field data loading never impacts response times. This way you should not get such slow response times on the first queries.

Another option would be to use doc values[2], which store field data on disk instead of loading it from the inverted index. Since the data will already be stored in a column-oriented way, there will be no need to uninvert data from the inverted index (which is costly and probably the reason for your slow queries).

[1] http://www.elasticsearch.org/guide/en/elasticsearch/reference/current/fielddata-formats.html#_fielddata_loading
[2] http://www.elasticsearch.org/guide/en/elasticsearch/reference/current/fielddata-formats.html#_numeric_field_data_types

On Thu, Feb 13, 2014 at 7:34 PM, Luke Scott <[email protected]> wrote:

> I have an index that uses 1 level of nested documents. When I run a query
> on it the result comes back in about 20-200 milliseconds. When I add a
> facet or an aggregation involving the nested documents the uncached
> response always takes 2-3 seconds, regardless of how many documents have
> been selected, even zero.
>
> My map looks like this:
>
> {
>   "document": {
>     "dynamic": "strict",
>     "properties": {
>       "account_id": {
>         "type": "long"
>       },
>       "data": {
>         "type": "nested",
>         "properties": {
>           "key": {
>             "type": "string",
>             "index": "not_analyzed"
>           },
>           "string": {
>             "type": "string",
>             "index": "not_analyzed",
>             "fields": {
>               "token": {
>                 "type": "string"
>               }
>             }
>           },
>           "integer": {
>             "type": "long"
>           },
>           "date": {
>             "type": "date",
>             "format": "dateOptionalTime"
>           }
>         }
>       }
>     }
>   }
> }
>
> There are 3.6 million documents in this index.
> My query looks like this:
>
> {
>   "query": {
>     "bool":{
>       "must":[
>         {"term":{"account_id": 1}},
>         {
>           "nested":{
>             "path":"data",
>             "query":{"term":{"key":"amount"}}
>           }
>         }
>       ]
>     }
>   }
> }
>
> The result to the above query is 0 documents because account_id 1 doesn't
> have any documents with a key of "amount". Uncached this returns in about
> 10-150ms:
>
> {
>   "took": 9,
>   "timed_out": false,
>   "_shards": {
>     "total": 5,
>     "successful": 5,
>     "failed": 0
>   },
>   "hits": {
>     "total": 0,
>     "max_score": null,
>     "hits": []
>   }
> }
>
> When I add an aggregation to the query:
>
> {
>   ...
>   "aggs" : {
>     "report" : {
>       "nested" : {
>         "path" : "data"
>       },
>       "aggs" : {
>         "amount" : {
>           "filter" : {
>             "query": {"term": {"key":"amount"}}
>           },
>           "aggs": {
>             "sum": {
>               "sum" : { "field" : "integer" }
>             }
>           }
>         }
>       }
>     }
>   }
> }
>
> Uncached the query returns in about 2-3 seconds:
>
> {
>   "took": 2770,
>   "timed_out": false,
>   "_shards": {
>     "total": 5,
>     "successful": 5,
>     "failed": 0
>   },
>   "hits": {
>     "total": 0,
>     "max_score": null,
>     "hits": []
>   },
>   "aggregations": {
>     "report": {
>       "doc_count": 0,
>       "amount": {
>         "doc_count": 0,
>         "sum": {
>           "value": 0
>         }
>       }
>     }
>   }
> }
>
> If I run the same thing a second time (cached) it runs in 26 milliseconds.
> If I clear the cache and run it again it takes 2 seconds.
>
> Why is this aggregation always taking 2-3 seconds, even though the query
> is returning 0 documents? The same thing happens with a statistical facet.
>
> -
> Luke
>
> --
> You received this message because you are subscribed to the Google Groups
> "elasticsearch" group.
> To unsubscribe from this group and stop receiving emails from it, send an
> email to [email protected].
> To view this discussion on the web visit
> https://groups.google.com/d/msgid/elasticsearch/a82323a6-9a81-436b-a2d2-cc26e918cb7c%40googlegroups.com
> .
> For more options, visit https://groups.google.com/groups/opt_out.

--
Adrien Grand
