Re: Huge response time for simple queries in an uber environment

Cosmin-Radu Vasii Tue, 28 Oct 2014 03:20:07 -0700

Marvel shows that JVM heap usage is around 75% on all the data nodes
(about 12 GB) and that CPU usage is constantly between 90% and 95%. Are you
saying that, because my queries are so different, I am not really benefiting
from the cache: ES tries to cache the documents but evicts them very
quickly, so they end up in the cache for nothing?
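One way to picture the eviction behaviour asked about here: as far as I know, in ES 1.x the filter cache holds the results of cached filters (per-segment bitsets of matching document IDs), not the documents themselves, and it is bounded in size with least-recently-used style eviction. The Python sketch below is a toy model of such a cache, not ES code; the capacity and both query streams are made-up assumptions. It shows that when nearly every query uses a different filter value, entries are evicted before they are ever reused, whereas repeating the same filter hits the cache almost every time.

```python
from collections import OrderedDict
import random


class LRUCache:
    """Toy size-bounded cache with least-recently-used eviction (a model, not ES code)."""

    def __init__(self, capacity):
        self.capacity = capacity
        self.entries = OrderedDict()
        self.hits = 0
        self.misses = 0

    def lookup(self, key):
        """Record a lookup for `key`, caching it on a miss."""
        if key in self.entries:
            self.entries.move_to_end(key)  # hit: mark as most recently used
            self.hits += 1
            return
        self.misses += 1
        self.entries[key] = True  # stand-in for a computed filter result
        if len(self.entries) > self.capacity:
            self.entries.popitem(last=False)  # evict the least recently used entry

    def hit_rate(self):
        total = self.hits + self.misses
        return self.hits / total if total else 0.0


def simulate(keys, capacity):
    """Run a stream of filter keys through a cache of the given capacity."""
    cache = LRUCache(capacity)
    for key in keys:
        cache.lookup(key)
    return cache.hit_rate()


if __name__ == "__main__":
    rng = random.Random(42)  # fixed seed for a reproducible run
    capacity = 100  # hypothetical number of entries that fit in the cache
    # Hypothetical workload A: every query filters on the same value.
    same = [("obj.objfield4", "416")] * 10_000
    # Hypothetical workload B: every query filters on a random 15-digit objfield1 value.
    varied = [("obj.objfield1", str(rng.randrange(10**14, 10**15))) for _ in range(10_000)]
    print("same filter:   ", simulate(same, capacity))
    print("varied filters:", simulate(varied, capacity))
```

The first workload's hit rate is close to 1, the second's close to 0: cache work is done on every query but almost never pays off.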


On Tuesday, 28 October 2014, 12:04:59 UTC+2, Jörg Prante wrote:
>
> Did you monitor the memory usage?
>
> If memory is running low, try to avoid caching (unless you always query for
> "416" and "29"). Also check whether you can use "term" instead of
> "query_string", and avoid boolean clauses with a single term, since they
> can be simplified.
>
> Also note that you only need one "constant_score" per query, because there
> is no such thing as multiple scores in a single query.
>
> A simplified version would look like this:
>
> GET index/_search
> {
>   "query": {
>     "constant_score": {
>       "filter": {
>         "bool": {
>           "must": [
>             { "term": { "obj.objfield4": "416" } },
>             { "term": { "obj.objfield5": "29" } }
>           ]
>         }
>       }
>     }
>   }
> }
>
> Jörg
>
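Since the webapp generates queries for varying field combinations, the simplified shape suggested above can be produced programmatically. The Python sketch below only illustrates the JSON body (the thread uses the Java API, whose builder classes are not shown here); `build_query`, the date bounds, and the use of a `range` filter for the dates are assumptions. It emits a single `constant_score` wrapping one `bool` filter of plain `term` filters, using the full field paths (`obj.objfield4`) as in the original query, with no `query_string` and no per-clause `constant_score`.

```python
import json


def build_query(terms, date_ranges=None):
    """Build a constant_score query whose single bool filter ANDs term and range filters.

    terms: mapping of full field path -> exact value, e.g. {"obj.objfield4": "416"}
    date_ranges: optional mapping of date field -> (gte, lte) bounds (hypothetical)
    """
    must = [{"term": {field: value}} for field, value in terms.items()]
    for field, (gte, lte) in (date_ranges or {}).items():
        must.append({"range": {field: {"gte": gte, "lte": lte}}})
    return {
        "query": {
            "constant_score": {
                "filter": {"bool": {"must": must}}
            }
        }
    }


if __name__ == "__main__":
    body = build_query(
        {"obj.objfield4": "416", "obj.objfield5": "29"},
        # Hypothetical bounds, written in the same format as the sample documents.
        {"date1": ("01/01/2013 12:00 AM", "31/12/2013 11:59 PM")},
    )
    print(json.dumps(body, indent=2))
```

Keeping every condition inside one `bool` filter means there is exactly one scoring wrapper per request, which is the point made above about `constant_score`.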
> On Tue, Oct 28, 2014 at 9:44 AM, Cosmin-Radu Vasii <[email protected]>
> wrote:
>
>> Hi,
>>
>> I have the following environment:
>> 10 ES data nodes, each with 8 cores, 30 GB of RAM and a really good hard
>> drive, -Xms18000m -Xmx18000m, default thread pools (in this case 24
>> threads for search operations)
>> 2 dedicated ES master nodes: 8 cores, 30 GB of RAM and a really good hard
>> drive (the hard drive is not relevant for these nodes, though),
>> -Xms18000m -Xmx18000m, default thread pools (in this case 24 threads for
>> search operations)
>> 4 Tomcat 7 instances running a webapp with a node client that connects to
>> the ES cluster to send queries: 14 GB of RAM, 4 cores, 250 Tomcat
>> threads, -Xms7000m -Xmx7000m
>> 1 HAProxy instance acting as a load balancer in front of the 4 Tomcat
>> instances.
>>
>> I have indexed *~1 billion documents*, distributed across 10 shards with
>> 0 replicas, at an insane rate of *70,000 to 100,000 docs/s*. Afterwards I
>> increased the number of *replicas to 2* (after 1 hour, 1 replica had been
>> added). The documents are quite small:
>>
>> {
>>   "field1": "13446", // 5 digits
>>   "date1": "24/10/2013 03:22 AM", // date
>>   "field2": "3502", // 4 digits
>>   "field3": "5310", // 4 digits
>>   "date2": "02/04/2012 01:21 AM", // date
>>   "field4": "4f3dce61-1d6c-418f-877b-5419a043bd42", // UUID
>>   "field5": "2890", // 4 digits
>>   "obj": {
>>     "objfield1": "761532940881576", // 15 digits
>>     "objfield2": "231806579463504", // 15 digits
>>     "objfield3": "879", // 3 digits
>>     "objfield4": "416", // 3 digits
>>     "objfield5": "14" // 2 digits
>>   }
>> }
>>
>> In the mapping, the 2 date fields are dates and all the other fields are
>> strings, even though most of them are numbers in real life.
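For reference, a mapping matching this description might look like the following, written as a Python dict that could be serialized and sent as a mapping body. This is a reconstruction, not the actual mapping from the thread: the type name `doc` is hypothetical, and the Joda date format `dd/MM/yyyy hh:mm a` is an assumption inferred from the sample document (day first, since "24/10/2013" cannot be month first). `string` is the ES 1.x type name; later versions replaced it with `text`/`keyword`. The sketch also checks that the sample dates parse with the equivalent Python pattern `%d/%m/%Y %I:%M %p`.

```python
import json
from datetime import datetime

STRING_FIELDS = ["field1", "field2", "field3", "field4", "field5"]
OBJ_FIELDS = ["objfield1", "objfield2", "objfield3", "objfield4", "objfield5"]
DATE_FIELDS = ["date1", "date2"]

# Assumed Joda-style format for values such as "24/10/2013 03:22 AM".
JODA_DATE_FORMAT = "dd/MM/yyyy hh:mm a"
# The same pattern in Python's strptime syntax, used only to sanity-check the samples.
PY_DATE_FORMAT = "%d/%m/%Y %I:%M %p"


def build_mapping():
    """Mapping as described in the message: 2 dates, everything else a string."""
    props = {name: {"type": "string"} for name in STRING_FIELDS}
    props.update({name: {"type": "date", "format": JODA_DATE_FORMAT} for name in DATE_FIELDS})
    props["obj"] = {
        "type": "object",
        "properties": {name: {"type": "string"} for name in OBJ_FIELDS},
    }
    return {"doc": {"properties": props}}  # "doc" is a hypothetical type name


if __name__ == "__main__":
    for sample in ("24/10/2013 03:22 AM", "02/04/2012 01:21 AM"):
        print(datetime.strptime(sample, PY_DATE_FORMAT))
    print(json.dumps(build_mapping(), indent=2))
```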
>>
>> I ran the queries using 800 threads from 4 JMeter machines, 200 threads
>> per machine (these machines are also really powerful).
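A rough back-of-the-envelope view of this load, using only the numbers given in this message: with 10 primary shards and 2 replicas there are 30 shard copies (3 per data node), and each search has to query one copy of each of the 10 shards. The Python sketch assumes all 800 JMeter threads have a request in flight at once and that the search pool has 24 threads per data node, as stated above; it ignores queue limits, caching, and the Tomcat layer.

```python
DATA_NODES = 10
SEARCH_THREADS_PER_NODE = 24  # default search pool size quoted above (8 cores)
PRIMARY_SHARDS = 10
REPLICAS = 2
TOTAL_DOCS = 1_000_000_000  # "~1 billion", approximate
CONCURRENT_CLIENTS = 4 * 200  # 4 JMeter machines x 200 threads each


def capacity_summary():
    """Derive shard layout and fan-out figures from the cluster description."""
    shard_copies = PRIMARY_SHARDS * (1 + REPLICAS)
    cluster_search_threads = DATA_NODES * SEARCH_THREADS_PER_NODE
    # Each search fans out to one copy of each primary shard.
    shard_requests = CONCURRENT_CLIENTS * PRIMARY_SHARDS
    return {
        "docs_per_shard": TOTAL_DOCS // PRIMARY_SHARDS,
        "shard_copies": shard_copies,
        "copies_per_node": shard_copies // DATA_NODES,
        "cluster_search_threads": cluster_search_threads,
        "shard_requests_in_flight": shard_requests,
        "shard_requests_per_thread": shard_requests / cluster_search_threads,
    }


if __name__ == "__main__":
    for name, value in capacity_summary().items():
        print(f"{name}: {value}")
```

Under these assumptions about 8,000 shard-level requests compete for 240 search threads (roughly 33 per thread), which suggests a sizeable part of each request's latency is spent queued rather than executing.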
>>
>> The queries are built by the webapps using the Java API and look like
>> this (I use filters to try to take advantage of the cache). They are
>> different combinations of at most 3 of the fields, plus ranges on the 2
>> dates.
>>
>> GET index/_search
>> {
>>   "query": {
>>     "constant_score": {
>>       "filter": {
>>         "fquery": {
>>           "query": {
>>             "bool": {
>>               "must": [
>>                 {
>>                   "constant_score": {
>>                     "filter": {
>>                       "fquery": {
>>                         "query": {
>>                           "query_string": {
>>                             "query": "obj.objfield4:416"
>>                           }
>>                         },
>>                         "_cache": true
>>                       }
>>                     }
>>                   }
>>                 },
>>                 
>>                 {
>>                   "constant_score": {
>>                     "filter": {
>>                       "fquery": {
>>                         "query": {
>>                           "query_string": {
>>                             "query": "obj.objfield5:29"
>>                           }
>>                         },
>>                         "_cache": true
>>                       }
>>                     }
>>                   }
>>                 }
>>               ]
>>             }
>>           },
>>           "_cache": true
>>         }
>>       }
>>     }
>>   }
>> }
>>
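As a rough check of how much wrapping and caching this query does, the Python sketch below walks the query above (copied as a dict) and counts its wrappers: 3 `constant_score`, 3 `fquery` filters marked `_cache: true`, and 2 `query_string` clauses, all for two exact-value conditions. It also counts field combinations, assuming any of the 10 non-date fields of the sample document can be combined (the message does not say which fields are eligible): picking at most 3 of them gives 175 combinations, before multiplying by distinct values and date ranges. `count_key` and `field_combinations` are illustrative helpers.

```python
import math

# The query as posted above, copied as a Python dict.
POSTED_QUERY = {
    "query": {
        "constant_score": {
            "filter": {
                "fquery": {
                    "query": {
                        "bool": {
                            "must": [
                                {"constant_score": {"filter": {"fquery": {
                                    "query": {"query_string": {"query": "obj.objfield4:416"}},
                                    "_cache": True}}}},
                                {"constant_score": {"filter": {"fquery": {
                                    "query": {"query_string": {"query": "obj.objfield5:29"}},
                                    "_cache": True}}}},
                            ]
                        }
                    },
                    "_cache": True,
                }
            }
        }
    }
}


def count_key(node, key):
    """Count how many times `key` appears as a dict key anywhere in a nested JSON-like value."""
    if isinstance(node, dict):
        return sum((k == key) + count_key(v, key) for k, v in node.items())
    if isinstance(node, list):
        return sum(count_key(item, key) for item in node)
    return 0


def field_combinations(n_fields, max_fields):
    """Number of ways to pick between 1 and max_fields of n_fields."""
    return sum(math.comb(n_fields, k) for k in range(1, max_fields + 1))


if __name__ == "__main__":
    for key in ("constant_score", "fquery", "query_string", "_cache"):
        print(f"{key}: {count_key(POSTED_QUERY, key)}")
    # Assumption: all 10 non-date fields of the sample document may be combined.
    print("field combinations:", field_combinations(10, 3))
```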
>> The response times are outrageous, between *20 and 100 seconds*, and I
>> have 30 shards evenly distributed across the nodes.
>>
>> What am I doing wrong here? I would expect results in under 3
>> seconds.
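These latency figures also pin down the throughput the cluster was actually delivering. Assuming the 800 JMeter threads issue requests back to back (a closed loop with no think time, in steady state), Little's law, concurrency = throughput × latency, gives throughput = 800 / latency. The Python sketch below applies it to the observed 20 s and 100 s and to the 3 s target; the function name is illustrative.

```python
CONCURRENT_CLIENTS = 800  # 4 JMeter machines x 200 threads


def throughput(concurrency, latency_s):
    """Little's law, L = lambda * W, solved for lambda (requests per second)."""
    return concurrency / latency_s


if __name__ == "__main__":
    for latency in (20, 100):
        print(f"{latency:>3} s latency -> {throughput(CONCURRENT_CLIENTS, latency):.1f} req/s")
    print(f"  3 s target  -> {throughput(CONCURRENT_CLIENTS, 3):.1f} req/s required")
```

So the cluster was serving roughly 8 to 40 requests per second; reaching the 3-second target at the same concurrency would need about 267 requests per second, roughly 7 to 33 times the observed rate.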
>>
>> Should I map the fields as numbers instead of strings? Should I replace
>> the query_string with a term query?
>>
>> -- 
>> You received this message because you are subscribed to the Google Groups 
>> "elasticsearch" group.
>> To unsubscribe from this group and stop receiving emails from it, send an 
>> email to [email protected].
>> To view this discussion on the web visit 
>> https://groups.google.com/d/msgid/elasticsearch/7a333c7d-0f51-4943-89a5-6328ff0ba41f%40googlegroups.com.
>> For more options, visit https://groups.google.com/d/optout.
>>
>
>

-- 
You received this message because you are subscribed to the Google Groups 
"elasticsearch" group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to [email protected].
To view this discussion on the web visit 
https://groups.google.com/d/msgid/elasticsearch/eb369f30-5c18-4a2a-84f1-32faf004bd62%40googlegroups.com.
For more options, visit https://groups.google.com/d/optout.