jpountz opened a new issue, #14959: URL: https://github.com/apache/lucene/issues/14959
By default, Lucene currently uses compound files for flushed segments, and for merged segments that account for less than 10% of the total index size (computed either as a number of docs or as a byte size, depending on the merge policy).

I am considering switching to a fixed threshold, e.g. using compound files for all segments below 64MB for byte-size-based merge policies (`TieredMergePolicy`, `LogByteSizeMergePolicy`) or below 65,536 docs for doc-based merge policies (`LogDocMergePolicy`). I like it better for a few reasons:

- Whether a segment is compound or not is more deterministic (and thus easier to reason about), as it doesn't depend on the total size of the index at the time of merging.
- The current ratio doesn't work well in multi-tenant scenarios, where you could still have plenty of small files overall due to many small indexes.

Ideally we would have a single switch on `IndexWriterConfig` instead of having flushes and merges independently decide whether a segment qualifies for being compound.

I'm also wondering whether we need to keep the current ratio-based approach, or whether supporting only a fixed threshold would be good enough.
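To make the difference between the two rules concrete, here is a minimal, self-contained sketch in plain Java. It is not Lucene's actual code: the class name, method names and constants are hypothetical, and the ratio check is simplified to the "less than 10% of the total index size" wording used above, for the byte-size case only.

```java
// Illustrative sketch only; not taken from Lucene. All names are hypothetical.
public final class CompoundFileDecision {

    // Assumption: the 10% ratio described above for merged segments.
    static final double RATIO = 0.10;

    // Assumption: the 64MB fixed threshold proposed above for byte-size-based policies.
    static final long FIXED_THRESHOLD_BYTES = 64L * 1024 * 1024;

    // Ratio-based rule: the answer depends on the total index size at merge time.
    static boolean useCompoundByRatio(long segmentBytes, long totalIndexBytes) {
        return segmentBytes < RATIO * totalIndexBytes;
    }

    // Fixed-threshold rule: the answer depends only on the segment itself.
    static boolean useCompoundByFixedThreshold(long segmentBytes) {
        return segmentBytes < FIXED_THRESHOLD_BYTES;
    }

    public static void main(String[] args) {
        final long mb = 1024L * 1024;
        long segment = 32 * mb;
        // The same 32MB segment, merged into a small index and into a large one.
        long[] totals = {100 * mb, 1024 * mb};
        for (long total : totals) {
            System.out.println("index=" + (total / mb) + "MB"
                + " ratio=" + useCompoundByRatio(segment, total)
                + " fixed=" + useCompoundByFixedThreshold(segment));
        }
    }
}
```

Under these assumptions, the ratio rule gives different answers for the same 32MB segment depending on whether the index is 100MB or 1GB, while the fixed rule gives the same answer in both cases. A 5MB segment in a 20MB index also fails the ratio check, which is the multi-tenant situation described above: many small indexes, each leaving small non-compound segments behind.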