Can you post the IndexWriter infoStream output? I can see if anything stands out.
Maybe it was just that this node was doing big merges? I.e., if you waited long enough, the other shards would eventually do their big merges too?

Have you changed any default settings, or do you use custom routing, etc.? Is there any reason to think that the docs that land on this node are "different" in any way?

Mike McCandless

http://blog.mikemccandless.com

On Sun, Jul 6, 2014 at 6:48 PM, Kireet Reddy <[email protected]> wrote:

> From all the information I’ve collected, it seems to be the merging
> activity:
>
> 1. We capture the cluster stats into graphite and the current merges
>    stat seems to be about 10x higher on this node.
> 2. The actual node that the problem occurs on has happened on
>    different physical machines so a h/w issue seems unlikely. Once the problem
>    starts it doesn't seem to stop. We have blown away the indices in the past
>    and started indexing again after enabling more logging/stats.
> 3. I've stopped executing queries so the only thing that's happening
>    on the cluster is indexing.
> 4. Last night when the problem was ongoing, I disabled refresh
>    (index.refresh_interval = -1) around 2:10am. Within 1 hour, the load
>    returned to normal. The merge activity seemed to reduce, it seems like 2
>    very long running merges are executing but not much else.
> 5. I grepped an hour of logs of the 2 machines for "add merge=", it
>    was 540 on the high load node and 420 on a normal node. I pulled out the
>    size value from the log message and the merges seemed to be much smaller on
>    the high load node.
>
> I just created the indices a few days ago, so the shards of each index are
> balanced across the nodes. We have external metrics around document ingest
> rate and there was no spike during this time period.
>
> Thanks
> Kireet
>
> On Sunday, July 6, 2014 1:32:00 PM UTC-7, Michael McCandless wrote:
>
>> It's perfectly normal/healthy for many small merges below the floor size
>> to happen.
>>
>> I think you should first figure out why this node is different from the
>> others. Are you sure it's merging CPU cost that's different?
>>
>> Mike McCandless
>>
>> http://blog.mikemccandless.com
>>
>> On Sat, Jul 5, 2014 at 9:51 PM, Kireet Reddy <[email protected]> wrote:
>>
>>> We have a situation where one of the four nodes in our cluster seems
>>> to get caught up endlessly merging. However it seems to be high CPU
>>> activity and not I/O constrained. I have enabled the IndexWriter info
>>> stream logs, and oftentimes it seems to do merges of quite small segments
>>> (100KB) that are much below the floor size (2MB). I suspect this is due to
>>> frequent refreshes and/or using lots of threads concurrently to do
>>> indexing. Is this true?
>>>
>>> My supposition is that this is leading to the merge policy doing lots of
>>> merges of very small segments into another small segment which will again
>>> require a merge to even reach the floor size. My index has 64 segments and
>>> 25 are below the floor size. I am wondering if there should be an exception
>>> for the maxMergesAtOnce parameter for the first level so that many small
>>> segments could be merged at once in this case.
>>>
>>> I am considering changing the other parameters (wider tiers, lower floor
>>> size, more concurrent merges allowed) but these all seem to have side
>>> effects I may not necessarily want. Is there a good solution here?
>>>
>>> --
>>> You received this message because you are subscribed to the Google
>>> Groups "elasticsearch" group.
>>> To unsubscribe from this group and stop receiving emails from it, send
>>> an email to [email protected].
>>>
>>> To view this discussion on the web visit
>>> https://groups.google.com/d/msgid/elasticsearch/0a8db0dc-ae0b-49cb-b29d-e396510bf755%40googlegroups.com.
>>> For more options, visit https://groups.google.com/d/optout.
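
The supposition in the original message (small segments get merged into another segment that is still below the floor size and so must be merged again) can be checked with some rough arithmetic. The sketch below is only a simplified model, not how TieredMergePolicy actually selects merges: it assumes every merge combines the maximum allowed number of equal-sized segments and ignores merge scoring, deletes, and the way below-floor segments are rounded up to the floor size during selection. The function name and parameters are hypothetical; the 100KB and 2MB figures come from the thread, and the value 10 is TieredMergePolicy's default for merging at once, assumed to be unchanged here.

```python
def rounds_to_reach_floor(segment_kb: float, floor_kb: float, max_merge_at_once: int) -> int:
    """Count merge rounds until one merged segment reaches the floor size.

    Simplified model (an assumption, not Lucene's real selection logic):
    each round merges `max_merge_at_once` equal-sized segments into one.
    """
    if segment_kb <= 0 or max_merge_at_once < 2:
        raise ValueError("segment_kb must be > 0 and max_merge_at_once >= 2")
    rounds = 0
    size = segment_kb
    while size < floor_kb:
        size *= max_merge_at_once  # one merged segment out of N inputs
        rounds += 1
    return rounds


# Figures from the thread: ~100KB segments, 2MB floor.
# 10 segments per merge: 100KB -> 1000KB (still below floor) -> 10000KB.
print(rounds_to_reach_floor(100, 2 * 1024, 10))  # 2

# A wider first-level merge, as proposed in the original message.
print(rounds_to_reach_floor(100, 2 * 1024, 30))  # 1
```

Under these assumptions each document is rewritten twice before its segment reaches the floor size with 10-way merges, and once with 30-way merges, which is the effect the proposed first-level exception is aiming for.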
