jpountz opened a new issue, #14959:
URL: https://github.com/apache/lucene/issues/14959

   By default, Lucene currently uses compound files for flushed segments and 
for merged segments that account for less than 10% of the total index size 
(measured either in number of docs or in bytes, depending on the merge policy).
   
   I am considering switching to a fixed threshold, e.g. using compound files 
for all segments below 64MB for byte-size-based merge policies 
(`TieredMergePolicy`, `LogByteSizeMergePolicy`) or 65,536 docs for doc-based 
merge policies (`LogDocMergePolicy`).
   
   I like it better for a few reasons:
    - Whether a segment is compound or not is more deterministic (and thus 
easier to reason about) as it doesn't depend on the total size of the index at 
the time of merging.
    - The current ratio doesn't work well in multi-tenant scenarios where you 
could still have plenty of small files overall due to many small indexes.
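
To make the determinism point concrete, here is a minimal sketch that compares the two rules on the same segment. It is not Lucene's actual code: the class `CompoundFileDecision`, its methods, and the example sizes are hypothetical, and the ratio rule is simplified to a single comparison. The 10% ratio and the 64MB threshold are the values mentioned above.

```java
// Simplified sketch of the two decision rules discussed in this issue.
// Hypothetical names; this is not how Lucene implements the check.
final class CompoundFileDecision {
    static final long MB = 1024L * 1024L;

    // Ratio-based rule: the segment is compound only if it is a small
    // enough fraction of the whole index at the time of merging.
    static boolean byRatio(long segmentBytes, long totalIndexBytes, double ratio) {
        return segmentBytes <= ratio * totalIndexBytes;
    }

    // Fixed-threshold rule: the segment is compound if it is below an
    // absolute size, regardless of how large the index is.
    static boolean byThreshold(long segmentBytes, long thresholdBytes) {
        return segmentBytes < thresholdBytes;
    }

    public static void main(String[] args) {
        long segment = 50 * MB;

        // Same 50MB segment, two different index sizes: the ratio rule
        // gives two different answers.
        System.out.println("ratio, 100MB index: " + byRatio(segment, 100 * MB, 0.1));
        System.out.println("ratio, 1GB index:   " + byRatio(segment, 1024 * MB, 0.1));

        // The fixed threshold gives one answer for this segment size.
        System.out.println("threshold 64MB:     " + byThreshold(segment, 64 * MB));
    }
}
```

In the small-index case the ratio rule leaves the 50MB segment non-compound, which is the multi-tenant situation described above: many small indexes, each keeping its small segments as separate files.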
   
   Ideally we would have a single switch on `IndexWriterConfig` instead of 
having flushes and merges independently decide whether a segment qualifies 
for being compound.
   
   I'm also wondering if we need to keep the current approach that is based on 
a ratio, or if only supporting a fixed threshold would be good enough.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org

