Re: [PR] Switching to a fixed CFS threshold [lucene]

via GitHub Mon, 10 Nov 2025 08:54:01 -0800


stefanvodita commented on code in PR #15295:
URL: https://github.com/apache/lucene/pull/15295#discussion_r2511173640



##########
lucene/core/src/java/org/apache/lucene/index/LiveIndexWriterConfig.java:
##########
@@ -368,9 +368,7 @@ public InfoStream getInfoStream() {
    *
    * <p>Use <code>false</code> for batch indexing with very large ram buffer 
settings.
    *
-   * <p><b>Note: To control compound file usage during segment merges see 
{@link
-   * MergePolicy#setNoCFSRatio(double)} and {@link 
MergePolicy#setMaxCFSSegmentSizeMB(double)}. This
-   * setting only applies to newly created segments.</b>
+   * <p><b>Note: To control compound file usage during segment merges.</b>

Review Comment:
   Do we want to say more in this comment?



##########
lucene/core/src/test/org/apache/lucene/index/TestAddIndexes.java:
##########
@@ -1450,11 +1455,11 @@ public void testNonCFSLeftovers() throws Exception {
 
     MockDirectoryWrapper dir = new MockDirectoryWrapper(random(), new 
ByteBuffersDirectory());
     IndexWriterConfig conf =
-        new IndexWriterConfig(new 
MockAnalyzer(random())).setMergePolicy(newLogMergePolicy(true));
-    MergePolicy lmp = conf.getMergePolicy();
+        new IndexWriterConfig(new 
MockAnalyzer(random())).setMergePolicy(newLogMergePolicy());
     // Force creation of CFS:
-    lmp.setNoCFSRatio(1.0);
-    lmp.setMaxCFSSegmentSizeMB(Double.POSITIVE_INFINITY);
+    conf.getCodec().compoundFormat().setShouldUseCompoundFile(true);
+    
conf.getCodec().compoundFormat().setMaxCFSSegmentSizeMB(Double.POSITIVE_INFINITY);
+    conf.setUseCompoundFile(true);

Review Comment:
   Wouldn't we still benefit from the `true` default?



##########
lucene/test-framework/src/java/org/apache/lucene/tests/util/LuceneTestCase.java:
##########
@@ -3149,7 +3132,6 @@ protected static IndexWriterConfig 
ensureSaneIWCOnNightly(IndexWriterConfig conf
       // and might use many per-field codecs. turn on CFS for IW flushes
       // and ensure CFS ratio is reasonable to keep it contained.
       conf.setUseCompoundFile(true);
-      mp.setNoCFSRatio(Math.max(0.25d, mp.getNoCFSRatio()));

Review Comment:
   Do we have an equivalent setting?



##########
lucene/core/src/java/org/apache/lucene/index/IndexWriter.java:
##########
@@ -5336,7 +5342,13 @@ public int length() {
       // this segment:
       boolean useCompoundFile;
       synchronized (this) { // Guard segmentInfos
-        useCompoundFile = mergePolicy.useCompoundFile(segmentInfos, 
merge.info, this);
+        useCompoundFile =
+            merge
+                .getMergeInfo()
+                .info
+                .getCodec()
+                .compoundFormat()
+                .useCompoundFile(mergePolicy.size(merge.info, this), 
mergePolicy);

Review Comment:
   Nit: Sometimes we call `getMergeInfo`, sometimes we address `info` directly.



##########
lucene/core/src/test/org/apache/lucene/index/TestIndexWriterDelete.java:
##########
@@ -862,9 +862,6 @@ public void eval(MockDirectoryWrapper dir) throws 
IOException {
                 .setReaderPooling(false)
                 .setMergePolicy(newLogMergePolicy()));
 
-    MergePolicy lmp = modifier.getConfig().getMergePolicy();
-    lmp.setNoCFSRatio(1.0);

Review Comment:
   We don't need to replicate this?



##########
lucene/core/src/java/org/apache/lucene/codecs/CompoundFormat.java:
##########
@@ -34,6 +36,152 @@ protected CompoundFormat() {}
   // TODO: this is very minimal. If we need more methods,
   // we can add 'producer' classes.
 
+  /** Default document count threshold for using compound files with 
LogDocMergePolicy */
+  static final int DEFAULT_CFS_THRESHOLD_DOC_SIZE = 65536; // docs

Review Comment:
   Why do we believe these are good thresholds? Do we need a follow-up item 
about tweaking them?



##########
lucene/core/src/test/org/apache/lucene/document/TestFeatureField.java:
##########
@@ -62,9 +62,13 @@ public void testBasics() throws Exception {
     Directory dir = newDirectory();
     RandomIndexWriter writer =
         new RandomIndexWriter(
-            random(),
-            dir,
-            
newIndexWriterConfig().setMergePolicy(newLogMergePolicy(random().nextBoolean())));
+            random(), dir, 
newIndexWriterConfig().setMergePolicy(newLogMergePolicy()));
+    writer

Review Comment:
   A lot of times we chain these access methods to eventually set compound 
files. I don't think it's a blocker, but maybe in the future we might add some 
convenience methods. No strong opinion on it though, I might actually prefer 
the way it's done now.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]


---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]

Re: [PR] Switching to a fixed CFS threshold [lucene]

Reply via email to