Re: Need help, multiple aggregations with filters extremely slow, where to look for optimizations?

Thomas Fri, 13 Jun 2014 00:41:58 -0700

Below is an example aggregation I perform. Are there any optimizations I can 
make, for example disabling some features I do not need?


curl -XPOST "http://localhost:9200/logs-idx.20140613/event/_search?search_type=count" -d '
{
  "aggs": {
    "f1": {
      "filter": {
        "or": [
          {
            "and": [
              {
                "has_parent": {
                  "type": "request",
                  "filter": {
                    "and": {
                      "filters": [
                        {
                          "term": {
                            "country": "US"
                          }
                        },
                        {
                          "term": {
                            "city": "NY"
                          }
                        },
                        {
                          "term": {
                            "code": 12
                          }
                        }
                      ]
                    }
                  }
                }
              },
              {
                "range": {
                  "event_time": {
                    "gte": "2014-06-13T10:00:00",
                    "lt": "2014-06-13T11:00:00"
                  }
                }
              }
            ]
          },
          {
            "and": [
              {
                "has_parent": {
                  "type": "request",
                  "filter": {
                    "and": {
                      "filters": [
                        {
                          "term": {
                            "country": "US"
                          }
                        },
                        {
                          "term": {
                            "city": "NY"
                          }
                        },
                        {
                          "term": {
                            "code": 12
                          }
                        },
                        {
                          "range": {
                            "request_time": {
                              "gte": "2014-06-13T10:00:00",
                              "lt": "2014-06-13T11:00:00"
                            }
                          }
                        }
                      ]
                    }
                  }
                }
              },
              {
                "range": {
                  "event_time": {
                    "lt": "2014-06-13T10:00:00"
                  }
                }
              }
            ]
          }
        ]
      },
      "aggs": {
        "per_interval": {
          "date_histogram": {
            "field": "event_time",
            "interval": "minute"
          },
          "aggs": {
            "metrics": {
              "terms": {
                "field": "event",
                "size": 10
              }
            }
          }
        }
      }
    }
  }
}'
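Read as a predicate over an event e with parent request r_e (writing T0 = 2014-06-13T10:00:00 and T1 = 2014-06-13T11:00:00), the `f1` filter above selects the events below. Because every event has exactly one parent, the parent conditions shared by the two `or` branches can be factored out. This only restates the query's logic; it says nothing about how Elasticsearch evaluates it internally.

```latex
\begin{aligned}
P(r) &\equiv r.\texttt{country} = \mathrm{US} \wedge r.\texttt{city} = \mathrm{NY} \wedge r.\texttt{code} = 12 \\
\mathrm{f1}(e) &\equiv \big[P(r_e) \wedge T_0 \le e.\texttt{event\_time} < T_1\big] \\
&\quad \vee \big[P(r_e) \wedge T_0 \le r_e.\texttt{request\_time} < T_1 \wedge e.\texttt{event\_time} < T_0\big] \\
&= P(r_e) \wedge \big[T_0 \le e.\texttt{event\_time} < T_1 \;\vee\; (T_0 \le r_e.\texttt{request\_time} < T_1 \wedge e.\texttt{event\_time} < T_0)\big]
\end{aligned}
```

The matching events are then bucketed per minute of `event_time` and, within each minute, by the top 10 values of `event`.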


On Friday, 13 June 2014 10:09:46 UTC+3, Thomas wrote:
>
> Hi,
>
> I'm facing a performance issue with some aggregations I perform, and I 
> need your help if possible:
>
> I have two document types, *request* and *event*. The request is the 
> parent of the event. Below is a (sample) mapping:
>
> "event" : {
>   "dynamic" : "strict",
>   "_parent" : {
>     "type" : "request"
>   },
>   "properties" : {
>     "event_time" : {
>       "format" : "dateOptionalTime",
>       "type" : "date"
>     },
>     "count" : {
>       "type" : "integer"
>     },
>     "event" : {
>       "index" : "not_analyzed",
>       "type" : "string"
>     }
>   }
> }
>
> "request" : {
>   "dynamic" : "strict",
>   "_id" : {
>     "path" : "uniqueId"
>   },
>   "properties" : {
>     "uniqueId" : {
>       "index" : "not_analyzed",
>       "type" : "string"
>     },
>     "user" : {
>       "index" : "not_analyzed",
>       "type" : "string"
>     },
>     "code" : {
>       "type" : "integer"
>     },
>     "country" : {
>       "index" : "not_analyzed",
>       "type" : "string"
>     },
>     "city" : {
>       "index" : "not_analyzed",
>       "type" : "string"
>     }
>     ....
>   }
> }
>
> My cluster is becoming really big (almost 2 TB of data with billions of 
> documents). I maintain one index per day and occasionally delete old 
> indices. Each daily index is about 20 GB. The Elasticsearch version I use 
> is 1.1.1.
>
> My problems start when I want to aggregate events using criteria that 
> apply to the parent request document, for example counting the events of 
> type *click* for country = US and code = 12. Initially I generated a script 
> filter (in Groovy) on the request document and added multiple aggregations 
> to one search request. This ended up being very slow, so I removed the 
> scripting logic and implemented the same logic in Java code instead.
>
> This seemed to solve the problem on my local machine, but when I went back 
> to the cluster nothing had changed: my app still performs really poorly. A 
> search with ~10 sub-aggregations takes more than 10 seconds.
>
> What seems strange is that the cluster looks fine in terms of load 
> average, CPU, etc.
>
> Any hints on where to look, so that I can identify the bottleneck and 
> solve this?
>
> *Ask for any additional information you need*; I didn't want to make 
> this post too long to read.
> Thank you
>
>
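A minimal sketch of the example use case from the quoted message (counting *click* events for country = US and code = 12), written in the same Elasticsearch 1.x filter syntax as the query above. It assumes the child's `event` field holds the event type (e.g. `click`); the helper name `click_count_body` is hypothetical and only prints the request body.

```shell
#!/bin/sh
# Hypothetical helper (not from the original post): prints an Elasticsearch 1.x
# search body that matches child "event" documents with event = "click" whose
# parent "request" has the given country and code.
click_count_body() {
  country="$1"
  code="$2"   # must be numeric: "code" is mapped as an integer
  cat <<EOF
{
  "query": {
    "filtered": {
      "filter": {
        "and": [
          { "term": { "event": "click" } },
          {
            "has_parent": {
              "type": "request",
              "filter": {
                "and": [
                  { "term": { "country": "$country" } },
                  { "term": { "code": $code } }
                ]
              }
            }
          }
        ]
      }
    }
  }
}
EOF
}
```

Usage, against the daily index from the query above: curl -XPOST "http://localhost:9200/logs-idx.20140613/event/_search?search_type=count" -d "$(click_count_body US 12)". With search_type=count no hits are returned, so the count is read from hits.total.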

-- 
You received this message because you are subscribed to the Google Groups 
"elasticsearch" group.
To view this discussion on the web visit 
https://groups.google.com/d/msgid/elasticsearch/a4cf00b0-9786-4327-80f9-34941eaf3ca8%40googlegroups.com.