Below is an example aggregation I perform. Are there any optimizations I can make, for example disabling features I do not need?
curl -XPOST "http://localhost:9200/logs-idx.20140613/event/_search?search_type=count" -d '
{
  "aggs": {
    "f1": {
      "filter": {
        "or": [
          {
            "and": [
              {
                "has_parent": {
                  "type": "request",
                  "filter": {
                    "and": {
                      "filters": [
                        { "term": { "country": "US" } },
                        { "term": { "city": "NY" } },
                        { "term": { "code": 12 } }
                      ]
                    }
                  }
                }
              },
              {
                "range": {
                  "event_time": {
                    "gte": "2014-06-13T10:00:00",
                    "lt": "2014-06-13T11:00:00"
                  }
                }
              }
            ]
          },
          {
            "and": [
              {
                "has_parent": {
                  "type": "request",
                  "filter": {
                    "and": {
                      "filters": [
                        { "term": { "country": "US" } },
                        { "term": { "city": "NY" } },
                        { "term": { "code": 12 } },
                        {
                          "range": {
                            "request_time": {
                              "gte": "2014-06-13T10:00:00",
                              "lt": "2014-06-13T11:00:00"
                            }
                          }
                        }
                      ]
                    }
                  }
                }
              },
              {
                "range": {
                  "event_time": { "lt": "2014-06-13T10:00:00" }
                }
              }
            ]
          }
        ]
      },
      "aggs": {
        "per_interval": {
          "date_histogram": { "field": "event_time", "interval": "minute" },
          "aggs": {
            "metrics": {
              "terms": { "field": "event", "size": 10 }
            }
          }
        }
      }
    }
  }
}'

On Friday, 13 June 2014 10:09:46 UTC+3, Thomas wrote:
>
> Hi,
>
> I'm facing a performance issue with some aggregations I perform, and I
> need your help if possible:
>
> I have two documents, the *request* and the *event*. The request is the
> parent of the event. Below is a (sample) mapping:
>
> "event" : {
>   "dynamic" : "strict",
>   "_parent" : {
>     "type" : "request"
>   },
>   "properties" : {
>     "event_time" : {
>       "format" : "dateOptionalTime",
>       "type" : "date"
>     },
>     "count" : {
>       "type" : "integer"
>     },
>     "event" : {
>       "index" : "not_analyzed",
>       "type" : "string"
>     }
>   }
> }
>
> "request" : {
>   "dynamic" : "strict",
>   "_id" : {
>     "path" : "uniqueId"
>   },
>   "properties" : {
>     "uniqueId" : {
>       "index" : "not_analyzed",
>       "type" : "string"
>     },
>     "user" : {
>       "index" : "not_analyzed",
>       "type" : "string"
>     },
>     "code" : {
>       "type" : "integer"
>     },
>     "country" : {
>       "index" : "not_analyzed",
>       "type" : "string"
>     },
>     "city" : {
>       "index" : "not_analyzed",
>       "type" : "string"
>     }
>     ....
>   }
> }
>
> My cluster is becoming really big (almost 2 TB of data with billions of
> documents). I maintain one index per day and occasionally delete old
> indices. My daily index is about 20 GB. The version of Elasticsearch
> that I use is 1.1.1.
>
> My problems start when I want to aggregate events using criteria that
> apply to the parent request document. For example, count the events of
> type *click* for country = US and code = 12. Initially I generated a
> scriptFilter for the request document (in Groovy) and added multiple
> aggregations to one search request. This ended up being very slow, so I
> removed the scripting logic and implemented it in Java code instead.
>
> This seemed to solve the problem on my local machine, but back on the
> cluster nothing has changed: my app still performs very poorly. A search
> with ~10 sub-aggregations takes more than 10 seconds.
>
> What seems strange is that the cluster looks fine with regard to load
> average, CPU, etc.
>
> Any hints on where to look in order to identify the bottleneck?
>
> *Please ask for any additional information*; I didn't want to make this
> post too long to read.
>
> Thank you
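To make the structure of the curl request at the top of this message easier to follow, here is a minimal sketch in Python that builds the same request body. The helper names (`parent_filter`, `build_body`) are hypothetical and do not belong to any Elasticsearch client; the code only constructs the JSON and does not send it. It assumes the two `or` branches differ only in their time ranges, as in the example: events inside the window, and earlier events whose parent request falls inside the window.

```python
import json


def parent_filter(extra=None):
    """has_parent filter on the request fields used in the curl example."""
    filters = [
        {"term": {"country": "US"}},
        {"term": {"city": "NY"}},
        {"term": {"code": 12}},
    ]
    if extra is not None:
        filters.append(extra)
    return {
        "has_parent": {
            "type": "request",
            "filter": {"and": {"filters": filters}},
        }
    }


def build_body(start, end):
    """Build the aggregation body for the window [start, end)."""
    # Branch 1: events whose own event_time lies inside the window.
    events_in_window = {
        "and": [
            parent_filter(),
            {"range": {"event_time": {"gte": start, "lt": end}}},
        ]
    }
    # Branch 2: events before the window whose parent request
    # has a request_time inside the window.
    earlier_events = {
        "and": [
            parent_filter({"range": {"request_time": {"gte": start, "lt": end}}}),
            {"range": {"event_time": {"lt": start}}},
        ]
    }
    return {
        "aggs": {
            "f1": {
                "filter": {"or": [events_in_window, earlier_events]},
                "aggs": {
                    "per_interval": {
                        "date_histogram": {"field": "event_time", "interval": "minute"},
                        "aggs": {
                            "metrics": {"terms": {"field": "event", "size": 10}}
                        },
                    }
                },
            }
        }
    }


if __name__ == "__main__":
    body = build_body("2014-06-13T10:00:00", "2014-06-13T11:00:00")
    print(json.dumps(body, indent=2))
```

The printed JSON can be passed as the `-d` payload of the same `_search?search_type=count` request, which makes it easier to vary the window while testing where the time goes.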
