Did you monitor the memory usage?
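
A quick way to check, as a rough sketch (this assumes the _cat API of
ES 1.x is available; the column names are assumed from that API):

GET _cat/nodes?v&h=name,heap.percent,ram.percent

For more detail per node, the JVM section of the node stats should show
heap usage and garbage collection activity:

GET _nodes/stats/jvm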

If memory is running low, try to avoid caching (unless you always query for
"416" and "29"). Also check whether you can use "term" instead of
"query_string", and drop boolean clauses that wrap a single term, since they
can be simplified.

Also note that you only need one "constant_score" per query, because there
is no such thing as multiple scores in a single query.

A simplified query would look like this:

GET index/_search
{
  "query": {
    "constant_score": {
      "filter": {
        "bool": {
          "must": [
            { "term": { "obj.objfield4": 416 } },
            { "term": { "obj.objfield5": 29 } }
          ]
        }
      }
    }
  }
}
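
If heap does turn out to be the bottleneck, caching can also be switched
off per filter. A sketch, assuming the "_cache" flag on the "term" filter
as in ES 1.x, and using the full field paths from the original query:

GET index/_search
{
  "query": {
    "constant_score": {
      "filter": {
        "bool": {
          "must": [
            { "term": { "obj.objfield4": 416, "_cache": false } },
            { "term": { "obj.objfield5": 29, "_cache": false } }
          ]
        }
      }
    }
  }
}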

Jörg

On Tue, Oct 28, 2014 at 9:44 AM, Cosmin-Radu Vasii <
[email protected]> wrote:

> Hi,
>
> I have the following environment:
> 10 ES data nodes, each with 8 cores, 30 GB of RAM and a really good
> hard drive, -Xms18000m -Xmx18000m, default thread pools (in this case 24
> threads for search operations)
> 2 dedicated ES master nodes: 8 cores, 30 GB of RAM and a really good
> hard drive (not relevant for these nodes, though), -Xms18000m
> -Xmx18000m, default thread pools (in this case 24 threads for search
> operations)
> 4 Tomcat 7 instances, with a webapp which has a node client which connects
> to the ES cluster for sending queries: 14 GB of RAM, 4 cores, 250 threads
> for Tomcat, -Xms7000m -Xmx7000m
> 1 HAProxy which acts as a load balancer in front of the 4 Tomcat instances.
>
> I have indexed *~1 billion documents*, distributed across 10 shards with
> 0 replicas, at an insane rate, from *70000 to 100000 docs/s*. I increased
> the *replicas to 2* afterwards (in 1h I had 1 replica added). The
> documents are quite small:
>
> {
>                "field1": "13446", //5 digits
>                "date1": "24/10/2013 03:22 AM", //date
>                "field2": "3502", //4 digits
>                "field3": "5310", //4 digits
>                "date2": "02/04/2012 01:21 AM", //date
>                "field4": "4f3dce61-1d6c-418f-877b-5419a043bd42", //UUID
>                "field5": "2890",//4 digits
>                "obj": {
>                   "objfield1": "761532940881576", //15 digits
>                   "objfield2": "231806579463504",//15 digits
>                   "objfield3": "879",//3 digits
>                   "objfield4": "416",//3 digits
>                   "objfield5": "14"//2 digits
>                }
> }
>
> In the mapping, 2 of the fields are dates and all the others are strings,
> even though they are numbers in real life.
>
> I ran queries using 800 different threads from 4 different JMeter
> machines, each machine with 200 threads (these machines are also really
> powerful).
>
> The queries built by the webapps using the Java API look like this (I use
> filters and try to take advantage of the cache). The queries are different
> combinations of at most 3 of the fields, plus ranges on the 2 dates.
>
> GET index/_search
> {
>   "query": {
>     "constant_score": {
>       "filter": {
>         "fquery": {
>           "query": {
>             "bool": {
>               "must": [
>                 {
>                   "constant_score": {
>                     "filter": {
>                       "fquery": {
>                         "query": {
>                           "query_string": {
>                             "query": "obj.objfield4:416"
>                           }
>                         },
>                         "_cache": true
>                       }
>                     }
>                   }
>                 },
>
>                 {
>                   "constant_score": {
>                     "filter": {
>                       "fquery": {
>                         "query": {
>                           "query_string": {
>                             "query": "obj.objfield5:29"
>                           }
>                         },
>                         "_cache": true
>                       }
>                     }
>                   }
>                 }
>               ]
>             }
>           },
>           "_cache": true
>         }
>       }
>     }
>   }
> }
>
> The results are outrageous, between *20 seconds and even 100 seconds*,
> and I have 30 shards evenly distributed between the nodes.
>
> What am I doing wrong here? I would expect results below 3 seconds.
>
> Should I have the fields as numbers and not as strings? Should I remove
> the query string and use a term there?
>
> --
> You received this message because you are subscribed to the Google Groups
> "elasticsearch" group.
> To unsubscribe from this group and stop receiving emails from it, send an
> email to [email protected].
> To view this discussion on the web visit
> https://groups.google.com/d/msgid/elasticsearch/7a333c7d-0f51-4943-89a5-6328ff0ba41f%40googlegroups.com
> <https://groups.google.com/d/msgid/elasticsearch/7a333c7d-0f51-4943-89a5-6328ff0ba41f%40googlegroups.com?utm_medium=email&utm_source=footer>
> .
> For more options, visit https://groups.google.com/d/optout.
>
