I would rather suggest you separate the node roles. You could divide them
into three roles:
1. Index node: for indexing data.
2. Load-balance node: for queries. You could maximize the load-balance node's memory
so that GC does not consume too much of it.
3. Data node: just for storing the data.
Then optimize the routing strategy for queries.
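As a rough sketch, that split could be expressed with the standard node.master / node.data
flags in elasticsearch.yml (ES 1.x; reading the "index" role as a dedicated master-eligible
node and "load balance" as a client node is my assumption):

# dedicated master ("index") nodes: manage cluster and index metadata, store no data
node.master: true
node.data: false

# load-balance (client) nodes: no master duty, no data; point queries at these
node.master: false
node.data: false

# data nodes: only store and serve the shards
node.master: false
node.data: true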

Each of those nodes has 96G of RAM; that is a very generous allocation.
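One related rule of thumb (an assumption about your setup, not taken from your config beyond
the 30G heap you mention): keep the ES heap at or below roughly 30-32G so compressed object
pointers stay enabled, and leave the rest of the 96G to the OS file cache, which mmapfs
depends on. With the ES 1.x startup scripts that is just:

export ES_HEAP_SIZE=30g   # heap <= ~30-32G keeps compressed oops; remaining RAM goes to the page cache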

On Thursday, July 31, 2014 at 10:51:17 PM UTC+8, Christopher Ambler wrote:
>
> Okay, let's attack this directly. We have a cluster of 6 machines (6 
> nodes). We have an index of just under 3.5 million documents. Each document 
> represents an Internet domain name. We are performing queries against this 
> index to see names that exist in our index. Most queries are coming back in 
> the sub-50ms range. But a bunch are taking 600ms to 900ms and, thus, 
> showing up in our slow query log. If they ALL were performing at this 
> speed, I wouldn't be nearly as confused, but it looks like only about 10% 
> to 20% of the queries are "slow." That's clearly too much.
>
> Head reports that this index looks like this:
>
> aftermarket-2014-07-31_02-38-19
> size: 424Mi (2.47Gi)
> docs: 3,428,471 (3,428,471)
>
> Here is the configuration for a typical node (they're all pretty-much the 
> same). We have 2 machines in a dev data center, 2 machines in a mesa data 
> center and 2 machines in a phx data center. Each of the two machines in a 
> data center has a "node.zone" tag set, and, as you can see, I have the 
> cluster routing awareness set to see "zone" as its marching orders. The 
> data pipes between the data centers are beefy, and while I acknowledge that 
> cross-DC isn't something that's generally smiled-upon, it appears to work 
> fine.
>
> Each machine has 96G of RAM. We start ES giving it 30G for the heap size. 
> File descriptors are set at 64,000. Note that I've selected the memory 
> mapped file system.
>
> #
> # Server-specific settings for cluster domainiq-es
> #
> cluster.name: domainiq-es
> node.name: "Mesa-03"
> node.zone: es-mesa-prod
> discovery.zen.ping.unicast.hosts: ["dev2.glbt1.gdg", "m1p1.mesa1.gdg", 
> "m1p4.mesa1.gdg", "p3p3.phx3.gdg", "p3p4.phx3.gdg"]
> #
> # The following configuration items should be the same for all ES servers
> #
> node.master: true
> node.data: true
> index.number_of_shards: 5
> index.number_of_replicas: 5
> index.store.type: mmapfs
> index.memory.index_buffer_size: 30%
> index.translog.flush_threshold_ops: 25000
> index.refresh_interval: 30s
> bootstrap.mlockall: true
> cluster.routing.allocation.awareness.attributes: zone
> gateway.recover_after_nodes: 4
> gateway.recover_after_time: 2m
> gateway.expected_nodes: 6
> discovery.zen.minimum_master_nodes: 3
> discovery.zen.ping.timeout: 10s
> discovery.zen.ping.retries: 3
> discovery.zen.ping.interval: 15s
> discovery.zen.ping.multicast.enabled: false
>
> And here is a typical slow query:
>
> [2014-07-31 07:35:31,530][WARN ][index.search.slowlog.query] [Mesa-03] 
> [aftermarket-2014-07-31_02-38-19][2] took[707.6ms], took_millis[707], 
> types[premium], stats[], search_type[QUERY_THEN_FETCH], total_shards[5], 
> source[], 
> extra_source[{"size":35,"query":{"query_string":{"query":"sld:petusies^20.0 
> OR tokens:(((pet^1.2 pets^1.0 *^1.0)AND(us^1.2 *^0.8)AND(ie^1.2 
> *^0.6)AND(s^1.2 *^0.4)) OR((pet^1.2 pets^1.0)AND(us^1.2)AND(ie^1.2))^3.0) 
> AND tld:(com^1.001 OR in^0.99 OR co.in^0.941174367459617 OR 
> net.in^0.8848832474555992 
> OR us^0.85 OR org.in^0.8397882862729736 OR gen.in^0.785829669672289 OR 
> firm.in^0.7414549824163524 OR ind.in^0.7 OR org^0.6) OR 
> _id:petusi.es^5.0-domaintype:partner","lowercase_expanded_terms":true,"analyze_wildcard":false}}}],
>  
>
>
> So note that I create 5 shards and 5 replicas, so that each node has all 5 
> shards at all times. I THOUGHT THIS MEANT BETTER PERFORMANCE. That is, I 
> thought having all 5 shards on every node meant that a query to a node 
> didn't have to ask another node for data. IS THIS NOT TRUE?
>
> Here's where it also gets interesting: I tried setting the number of 
> shards to 2 (with 5 replicas) and my slow queries went to almost 2 seconds 
> (2000ms). This is also terribly counter-intuitive! I thought fewer shards 
> meant less lookup time.
>
> Clearly, I want to optimize for read here. I don't care if indexing is 
> three times as slow, we need our queries to be sub-100ms.
>
> Any help is SERIOUSLY appreciated (and if you're in the Bay Area, I'm not 
> above bribes of beer :-))
>
