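I would test using multiple primary shards on a single machine. A search on a single shard is executed by one thread, which matches what you describe: one CPU at 100% while the others sit idle. With several primary shards on the same node, one query can be spread across several cores. Since your dataset seems to fit into RAM, this could help with the longer-latency queries.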
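To make the suggestion concrete, here is a minimal sketch of creating a test index with several primary shards on one node. The host `http://localhost:9200`, the index name `entries_4shards` and the shard count of 4 are assumptions for illustration, not values from your setup. Note that `number_of_shards` is fixed when an index is created, so you would need to create a new index and load the data into it again.

```python
import json
import urllib.request


def index_settings(primary_shards, replicas=0):
    """Build the settings body for an Elasticsearch create-index request."""
    return {
        "settings": {
            "index": {
                "number_of_shards": primary_shards,
                "number_of_replicas": replicas,
            }
        }
    }


def create_index_request(base_url, index_name, primary_shards):
    """Prepare (but do not send) a PUT /<index> request with the settings above."""
    body = json.dumps(index_settings(primary_shards)).encode("utf-8")
    return urllib.request.Request(
        url=f"{base_url}/{index_name}",
        data=body,
        method="PUT",
        headers={"Content-Type": "application/json"},
    )


if __name__ == "__main__":
    # Hypothetical host, index name and shard count; adjust to your environment.
    req = create_index_request("http://localhost:9200", "entries_4shards", 4)
    print(req.get_method(), req.full_url)
    print(req.data.decode("utf-8"))
    # To actually create the index, send the request:
    # urllib.request.urlopen(req)
```

A reasonable starting point is one primary shard per core you are willing to dedicate to a single query; then compare latencies for the same long queries against the current one-shard index.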

On Thursday, July 10, 2014 12:24:26 AM UTC-7, Fin Sekun wrote:
>
> Any hints?
>
> On Monday, July 7, 2014 3:51:19 PM UTC+2, Fin Sekun wrote:
>>
>> Hi,
>>
>> *SCENARIO*
>>
>> Our Elasticsearch database has ~2.5 million entries. Each entry has the
>> three analyzed fields "match", "sec_match" and "thi_match" (each containing
>> 3-20 words) that are used in this query:
>> https://gist.github.com/anonymous/a8d1142512e5625e4e91
>>
>> ES runs on two *types of servers*:
>> (1) Real servers (system has direct access to real CPUs, no
>> virtualization) of the newest generation - very performant!
>> (2) Cloud servers with virtualized CPUs - poor CPUs, but this is typical
>> for cloud services.
>>
>> See https://gist.github.com/anonymous/3098b142c2bab51feecc for (1) and
>> (2) CPU details.
>>
>> *ES settings:*
>> ES version 1.2.0 (jdk1.8.0_05)
>> ES_HEAP_SIZE = 512m (we also tested with 1024m with the same results)
>> vm.max_map_count = 262144
>> ulimit -n 64000
>> ulimit -l unlimited
>> index.number_of_shards: 1
>> index.number_of_replicas: 0
>> index.store.type: mmapfs
>> threadpool.search.type: fixed
>> threadpool.search.size: 75
>> threadpool.search.queue_size: 5000
>>
>> *Infrastructure*:
>> As you can see above, we don't use the cluster feature of ES (1 shard, 0
>> replicas). The reason is that our hosting infrastructure is based on
>> different providers.
>> Upside: We aren't dependent on a single hosting provider. Downside: Our
>> servers aren't in the same LAN.
>>
>> This means:
>> - We cannot use ES sharding, because synchronisation via WAN (internet)
>> doesn't seem to be a useful solution.
>> - So, every ES server has the complete dataset, and we configured only one
>> shard and no replicas for higher performance.
>> - We have a distribution process that updates the ES data on every host
>> frequently. This process is fine for us, because updates aren't very
>> frequent and perfect just-in-time ES synchronisation isn't necessary for
>> our business case.
>> - If a server goes down/crashes, the central load balancer removes it (the
>> resulting minimal packet loss is acceptable).
>>
>> *PROBLEM*
>>
>> For long query terms (6 and more keywords), we have very high CPU loads,
>> even on the high-performance server (1), and this leads to high response
>> times: 1-4 sec on server (1), 8-20 sec on server (2). The system parameters
>> while querying:
>> - Very high load (usually 100%) on the CPU running the search thread (the
>> other CPUs are idle in our test scenario)
>> - No I/O load (the hard disks are fine)
>> - No RAM bottlenecks
>>
>> So, we think the file caching is working fine, because we have no I/O
>> problems and the garbage collector seems to be happy (jstat shows very few
>> GCs). The CPU is the problem, and ES hot threads point to the Scorer module:
>> https://gist.github.com/anonymous/9cecfd512cb533114b7d
>>
>> *SUMMARY/ASSUMPTIONS*
>>
>> - Our database isn't very big and the query isn't very complex.
>> - ES is designed for huge amounts of data, but the key is
>> clustering/sharding: Data distribution to many servers means smaller
>> indices, and smaller indices lead to less CPU load and shorter response
>> times.
>> - So, our database isn't big, but it is too big for a single CPU, and this
>> means that especially low-performance (virtual) CPUs can only be used in
>> sharding environments.
>>
>> If we don't want to lose our provider independence, we have only the
>> following two options:
>>
>> 1) Simpler query (I think not possible in our case)
>> 2) Smaller database
>>
>> *QUESTIONS*
>>
>> Are our assumptions correct? Especially:
>>
>> - Is clustering/sharding (also small indices) the main key to
>> performance, that is, the only way to prevent overloaded (virtual) CPUs?
>> - Is it right that clustering is only useful/possible in LANs?
>> - Do you have any ES configuration or architecture hints regarding our
>> preference for using multiple hosting providers?
>>
>> Thank you.
>> Rgds
>> Fin
