Hey Jörg, thanks for the detailed reply.
We don't really run facets, and our field data cache size is very small. Increasing the node transport and ping timeouts is definitely something we'll consider. Reducing the number of shards per node is also worth considering, but I'm reluctant to add more nodes at the moment (we're already spending lots of cash). I think a deep dive into GC tuning is probably called for, and we've done some of that already. Java 8 with G1GC is an interesting suggestion too!

Thanks again,
Nic

On Monday, 17 February 2014 20:50:30 UTC, Jörg Prante wrote:
>
> Maybe it's the field cache that moves to old gen, when using facets.
>
> I am tackling this challenge by a combination of several strategies
>
> - tuning index.indices.fielddata.cache.size
>
> - working around the issue by increasing node transport and ping timeout
> from 5s to something high like 30s (so GCs are allowed to run 20s without
> node disconnects)
>
> - reducing number of shards per node (this just means to reduce the number
> of docs / index size / filter cache per node somehow), simplest method is
> adding nodes
>
> - using heap sizes as small as possible - in my use case 6G are sufficient
>
> - not sure if you want to go the path on the bleeding edge, but using Java
> 8 and G1GC with XX:MaxGCPauseMillis of ~100-1000ms helps me. CPU load is a
> bit higher with G1GC, but since I have 32 cores on a node, it does not
> matter that much.
>
> - otherwise, there are lots of CMS GC tuning options (needs deep GC
> analysis)
>
> Jörg
>
> On Mon, Feb 17, 2014 at 4:34 PM, Nic Long <[email protected]> wrote:
>
>> Hey all,
>>
>> we regularly (several times a week) get longish GCs (20s or more) due to
>> promotion failures.
>>
>> From what I understand this type of major GC is caused by fragmentation
>> of the heap.
>>
>> So I'm wondering:
>>
>> 1. What is all the stuff ES puts into the heap that ends up in the Old
>> Gen?
>> 2. Are there any recommended strategies for dealing with this specific
>> kind of problem?
>>
>> For example, would allowing more filter caching help or cause even more
>> problems? And so on.
>>
>> To give a little more info on our usage, we're read heavy, nearly
>> entirely filter operations. Our heap is at ~10g, nearly all of which is
>> used by the Old Gen (until a major GC runs).
>>
>> --
>> You received this message because you are subscribed to the Google Groups
>> "elasticsearch" group.
>> To unsubscribe from this group and stop receiving emails from it, send an
>> email to [email protected].
>> To view this discussion on the web visit
>> https://groups.google.com/d/msgid/elasticsearch/4b3ef926-94b7-4de0-b076-d5fdbc44021c%40googlegroups.com
>> .
>> For more options, visit https://groups.google.com/groups/opt_out.
>>
>

--
You received this message because you are subscribed to the Google Groups "elasticsearch" group.
To unsubscribe from this group and stop receiving emails from it, send an email to [email protected].
To view this discussion on the web visit https://groups.google.com/d/msgid/elasticsearch/d8803723-0cec-40ba-a095-4fe73f123e75%40googlegroups.com.
For more options, visit https://groups.google.com/groups/opt_out.
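
For anyone reading this thread later, the timeout workaround discussed above (raising the ping timeout from 5s to about 30s so that a 20s GC does not cause a node disconnect) comes down to simple arithmetic. Here is a rough sketch of it. It assumes a deliberately simplified model in which a node is dropped as soon as one stop-the-world pause lasts longer than the timeout; real fault detection also involves ping intervals and retries. The function names, the 1.5x safety margin and the 5s floor are made up for illustration and are not Elasticsearch settings.

```python
import math


def survives_pause(pause_s, timeout_s):
    """True if a stop-the-world pause fits inside the timeout window.

    Simplified model (an assumption): the node is dropped as soon as a
    single pause lasts at least as long as the timeout.
    """
    return pause_s < timeout_s


def suggested_timeout(observed_pauses_s, margin=1.5, floor_s=5.0):
    """Pick a timeout covering the worst observed pause plus some headroom.

    margin and floor_s are hypothetical defaults chosen for illustration.
    """
    if not observed_pauses_s:
        return floor_s
    worst = max(observed_pauses_s)
    return max(floor_s, math.ceil(worst * margin))


if __name__ == "__main__":
    pauses = [2.5, 20.0, 7.0]  # example pause lengths in seconds
    print(survives_pause(20.0, 5.0))   # a 20s pause against a 5s timeout
    print(survives_pause(20.0, 30.0))  # the same pause against a 30s timeout
    print(suggested_timeout(pauses))
```

Under this model a longer timeout only hides the pause from the rest of the cluster. Requests hitting that node still stall for the full length of the GC, so this buys stability rather than latency.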
