Re: Huge response time for simple queries in an uber environment

[email protected] Tue, 28 Oct 2014 03:05:51 -0700

Did you monitor the memory usage?
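
One way to check this is to read the JVM heap figures from the node stats
API (GET _nodes/stats/jvm). The sketch below is only an illustration: it
assumes the parsed response exposes "heap_used_in_bytes" and
"heap_max_in_bytes" under nodes.<id>.jvm.mem, the helper name heap_usage
is made up, and the sample dictionary is hand-written rather than taken
from the cluster discussed here.

```python
def heap_usage(stats):
    """Return {node name: percent of JVM heap in use} from a parsed node-stats response."""
    usage = {}
    for node_id, node in stats.get("nodes", {}).items():
        mem = node["jvm"]["mem"]
        pct = 100.0 * mem["heap_used_in_bytes"] / mem["heap_max_in_bytes"]
        # Fall back to the node id when the response carries no node name.
        usage[node.get("name", node_id)] = round(pct, 1)
    return usage


# Hypothetical sample shaped like a node-stats response (18000m heap per node).
sample = {
    "nodes": {
        "id1": {"name": "data-1",
                "jvm": {"mem": {"heap_used_in_bytes": 14155776000,
                                "heap_max_in_bytes": 18874368000}}},
        "id2": {"name": "data-2",
                "jvm": {"mem": {"heap_used_in_bytes": 9437184000,
                                "heap_max_in_bytes": 18874368000}}},
    }
}

for name, pct in sorted(heap_usage(sample).items()):
    print(name, pct)
```

Nodes that stay close to the top of the heap under load are the ones
where the filter cache is most likely to hurt.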

If memory is running low, try to avoid caching (unless you always query for
"416" and "29"). Also check whether you can use "term" instead of
"query_string" and drop boolean clauses that hold only a single term, since
both can be simplified.


Also note that you only need one "constant_score" per query, because there is
no such thing as multiple scores in a single query.

A simplification would look like

GET index/_search
{
  "query": {
    "constant_score": {
      "filter": {
        "bool": {
          "must": [
            { "term": { "obj.objfield4": 416 } },
            { "term": { "obj.objfield5": 29 } }
          ]
        }
      }
    }
  }
}
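
Since the webapp builds many combinations of fields and date ranges, the
same shape can be generated from the inputs. This is a minimal sketch in
Python rather than the Java API used by the webapp: the helper name
build_filtered_query is made up, the field paths are taken from the quoted
query, and any range bounds are assumed to be in whatever date format the
mapping expects.

```python
import json


def build_filtered_query(terms, date_ranges=None):
    """Build a single constant_score query whose bool filter ANDs every clause."""
    # One term filter per field; sorted so the generated body is deterministic.
    must = [{"term": {field: value}} for field, value in sorted(terms.items())]
    # Optional inclusive range filters, given as {field: (start, end)}.
    for field, (start, end) in sorted((date_ranges or {}).items()):
        must.append({"range": {field: {"gte": start, "lte": end}}})
    return {"query": {"constant_score": {"filter": {"bool": {"must": must}}}}}


body = build_filtered_query({"obj.objfield4": 416, "obj.objfield5": 29})
print(json.dumps(body, indent=2))
```

However many clauses are passed in, the result has exactly one
"constant_score" wrapper and no "query_string".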

Jörg

On Tue, Oct 28, 2014 at 9:44 AM, Cosmin-Radu Vasii <
[email protected]> wrote:

> Hi,
>
> I have the following environment:
> 10 ES data nodes, each with 8 cores, 30 GB of RAM and a really good
> hard drive, -Xms18000m -Xmx18000m, default thread pools (in this case 24
> threads for search operations)
> 2 ES dedicated master nodes: 8 cores, 30 GB of RAM and a really good
> hard drive (the hard drive is not relevant for these nodes though),
> -Xms18000m -Xmx18000m, default thread pools (in this case 24 threads for
> search operations)
> 4 Tomcat 7 instances, with a webapp which has a node client which connects
> to the ES cluster for sending queries: 14 GB of RAM, 4 cores, 250 threads
> for Tomcat, -Xms7000m -Xmx7000m
> 1 HAProxy which acts as a balancer in front of the 4 Tomcat instances.
>
> I have indexed *~1 billion documents*, distributed in 10 shards and 0
> replicas, at an insane rate, from *70000 to 100000 docs/s*. I increased
> the *replicas to 2* afterwards (in 1h I had 1 replica added). The
> documents are quite small:
>
> {
>                "field1": "13446", //5 digits
>                "date1": "24/10/2013 03:22 AM", //date
>                "field2": "3502", //4 digits
>                "field3": "5310", //4 digits
>                "date2": "02/04/2012 01:21 AM", //date
>                "field4": "4f3dce61-1d6c-418f-877b-5419a043bd42", //UUID
>                "field5": "2890",//4 digits
>                "obj": {
>                   "objfield1": "761532940881576", //15 digits
>                   "objfield2": "231806579463504",//15 digits
>                   "objfield3": "879",//3 digits
>                   "objfield4": "416",//3 digits
>                   "objfield5": "14"//2 digits
>                }
> }
>
> In the mapping, all the fields are either dates (2 of them) or strings,
> even though the strings are numbers in real life.
>
> I ran queries using 800 different threads from 4 different JMeter
> machines, each machine with 200 threads (these machines are also really
> powerful).
>
> The queries built by the webapps using the Java API look like this (I use
> filters and try to take advantage of the cache). The queries are different
> combinations of at most 3 of the fields plus a range for the 2 dates.
>
> GET index/_search
> {
>   "query": {
>     "constant_score": {
>       "filter": {
>         "fquery": {
>           "query": {
>             "bool": {
>               "must": [
>                 {
>                   "constant_score": {
>                     "filter": {
>                       "fquery": {
>                         "query": {
>                           "query_string": {
>                             "query": "obj.objfield4:416"
>                           }
>                         },
>                         "_cache": true
>                       }
>                     }
>                   }
>                 },
>
>                 {
>                   "constant_score": {
>                     "filter": {
>                       "fquery": {
>                         "query": {
>                           "query_string": {
>                             "query": "obj.objfield5:29"
>                           }
>                         },
>                         "_cache": true
>                       }
>                     }
>                   }
>                 }
>               ]
>             }
>           },
>           "_cache": true
>         }
>       }
>     }
>   }
> }
>
> The results are outrageous, between *20 seconds and even 100 seconds*,
> and I have 30 shards evenly distributed between the nodes.
>
> What am I doing wrong here? I would expect results below 3 seconds.
>
> Should I have the fields as numbers and not as strings? Should I remove
> the query string and use a term there?
>
> --
> You received this message because you are subscribed to the Google Groups
> "elasticsearch" group.
> To unsubscribe from this group and stop receiving emails from it, send an
> email to [email protected].
> To view this discussion on the web visit
> https://groups.google.com/d/msgid/elasticsearch/7a333c7d-0f51-4943-89a5-6328ff0ba41f%40googlegroups.com
> <https://groups.google.com/d/msgid/elasticsearch/7a333c7d-0f51-4943-89a5-6328ff0ba41f%40googlegroups.com?utm_medium=email&utm_source=footer>
> .
> For more options, visit https://groups.google.com/d/optout.
>

