I am trying to understand why I am seeing very small segments during
indexing. I am using Elasticsearch, and one node shows heavy merge activity.
After enabling info stream logs, it appears that this node is doing more,
smaller merges than the other nodes. In the TMP (TieredMergePolicy) log
lines, I see many merges of segments much smaller than the floor size, some
only a few KB. After some research, it seems that Lucene writes segments per
IndexWriter, so small segments could arise when there are many writers that
each receive little data. This could definitely happen in my setup, as some
indices don’t take many writes but the writer is flushed every 30 seconds to
make those writes available for search.
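
To make that reasoning concrete, here is a rough back-of-the-envelope sketch
in Python. It is only an estimate under stated assumptions: the write rates
are made-up example values, the 2 MB floor is an assumed figure (the real one
depends on the merge policy settings), and it ignores compression and any
per-segment overhead. The function and constant names are hypothetical.

```python
# Back-of-the-envelope estimate, not a measurement. Every number here is an
# assumed example value, not something read from my cluster.

ASSUMED_FLOOR_BYTES = 2 * 1024 * 1024  # assumed merge floor size (2 MB)


def estimated_segment_bytes(write_rate_bytes_per_sec, flush_interval_sec):
    """Approximate size of the segment produced by one periodic flush,
    assuming all writes since the previous flush land in a single segment."""
    return write_rate_bytes_per_sec * flush_interval_sec


def segments_per_hour(flush_interval_sec):
    """Upper bound on new segments per writer per hour: one per flush,
    assuming at least one write arrives in every interval."""
    return 3600 // flush_interval_sec


if __name__ == "__main__":
    # Hypothetical per-index write rates in bytes per second.
    for rate in (100, 1_000, 10_000):
        size = estimated_segment_bytes(rate, 30)
        print(rate, size, size < ASSUMED_FLOOR_BYTES)
```

Under these assumptions, an index receiving 100 bytes per second produces
segments of roughly 3 KB at a 30 second flush interval, up to 120 of them per
hour per writer, which is in line with the sizes I see in the merge logs.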

The puzzling thing to me is that there seems to be some governor somewhere
that is throttling the work. The machine CPU is at about 50% user and I/O
usage is low. I would have expected CPU or I/O to be maxed out. If I can
understand what’s limiting things, perhaps I can raise that limit. Otherwise
I can’t think of anything else to try except flushing less frequently.

I do see this message much more often (4x) on the problematic node:

 DW: DocumentsWriter has queued dwpt; will hijack this thread to flush pending segment(s)

Is that something to worry about?