I am trying to understand why I am seeing very small segment sizes during indexing. I am using Elasticsearch, and one node sees heavy merge activity. After enabling the info stream logs, it appears that this node is doing more, smaller merges than the other nodes. In the TMP (TieredMergePolicy) logs, I see a lot of merges of segments much smaller than the floor size, some only a few KB. After some research, it seems that Lucene writes segments per IndexWriter, so small segments could come about when there are many writers, each receiving little data. This could definitely happen in my setup, as some indices don't take many writes but the writer is flushed every 30 seconds to make those writes available for search.
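To make the reasoning above concrete, here is a rough back-of-the-envelope sketch of how I think the tiny segments arise. The per-index write rates and index names are made up for illustration (not measured), the 2 MB floor size is an assumption that should be checked against the actual merge policy settings, and treating on-disk segment size as equal to the bytes written is a simplification.

```python
def flushed_segment_bytes(write_rate_bytes_per_sec: float,
                          flush_interval_sec: float) -> float:
    """Approximate size of the segment produced by one flush:
    everything buffered since the previous flush."""
    return write_rate_bytes_per_sec * flush_interval_sec


# Assumed floor segment size; verify against the real TMP configuration.
FLOOR_SEGMENT_BYTES = 2 * 1024 * 1024

# Flush interval from my setup (seconds).
FLUSH_INTERVAL_SEC = 30

# Hypothetical per-index write rates in bytes/sec (illustrative only).
write_rates = {
    "busy-index": 500_000,
    "quiet-index": 200,
    "nearly-idle-index": 50,
}

for name, rate in write_rates.items():
    size = flushed_segment_bytes(rate, FLUSH_INTERVAL_SEC)
    status = "below floor" if size < FLOOR_SEGMENT_BYTES else "at or above floor"
    print(f"{name}: ~{size / 1024:.1f} KB per flush ({status})")
```

Under these assumptions, an index receiving only a few hundred bytes per second produces a segment of a few KB on every flush, which would match what I see in the TMP logs.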
The puzzling thing to me is that there seems to be some governor somewhere. The machine's CPU is at about 50% user and I/O usage is low; I would have expected resource utilization to be maxed out. If I can understand what is limiting things, perhaps I can raise that limit. Otherwise, I can't think of anything else to try except flushing less frequently.

I do see this message much more often (4x) on the problematic node:

DW: DocumentsWriter has queued dwpt; will hijack this thread to flush pending segment(s)

Is that something to worry about?