stefanvodita commented on code in PR #15295:
URL: https://github.com/apache/lucene/pull/15295#discussion_r2511173640
##########
lucene/core/src/java/org/apache/lucene/index/LiveIndexWriterConfig.java:
##########
@@ -368,9 +368,7 @@ public InfoStream getInfoStream() {
*
* <p>Use <code>false</code> for batch indexing with very large ram buffer
settings.
*
- * <p><b>Note: To control compound file usage during segment merges see
{@link
- * MergePolicy#setNoCFSRatio(double)} and {@link
MergePolicy#setMaxCFSSegmentSizeMB(double)}. This
- * setting only applies to newly created segments.</b>
+ * <p><b>Note: To control compound file usage during segment merges.</b>
Review Comment:
Do we want to say more in this comment?
##########
lucene/core/src/test/org/apache/lucene/index/TestAddIndexes.java:
##########
@@ -1450,11 +1455,11 @@ public void testNonCFSLeftovers() throws Exception {
MockDirectoryWrapper dir = new MockDirectoryWrapper(random(), new
ByteBuffersDirectory());
IndexWriterConfig conf =
- new IndexWriterConfig(new
MockAnalyzer(random())).setMergePolicy(newLogMergePolicy(true));
- MergePolicy lmp = conf.getMergePolicy();
+ new IndexWriterConfig(new
MockAnalyzer(random())).setMergePolicy(newLogMergePolicy());
// Force creation of CFS:
- lmp.setNoCFSRatio(1.0);
- lmp.setMaxCFSSegmentSizeMB(Double.POSITIVE_INFINITY);
+ conf.getCodec().compoundFormat().setShouldUseCompoundFile(true);
+
conf.getCodec().compoundFormat().setMaxCFSSegmentSizeMB(Double.POSITIVE_INFINITY);
+ conf.setUseCompoundFile(true);
Review Comment:
Wouldn't we still benefit from the `true` default?
##########
lucene/test-framework/src/java/org/apache/lucene/tests/util/LuceneTestCase.java:
##########
@@ -3149,7 +3132,6 @@ protected static IndexWriterConfig
ensureSaneIWCOnNightly(IndexWriterConfig conf
// and might use many per-field codecs. turn on CFS for IW flushes
// and ensure CFS ratio is reasonable to keep it contained.
conf.setUseCompoundFile(true);
- mp.setNoCFSRatio(Math.max(0.25d, mp.getNoCFSRatio()));
Review Comment:
Do we have an equivalent setting?
##########
lucene/core/src/java/org/apache/lucene/index/IndexWriter.java:
##########
@@ -5336,7 +5342,13 @@ public int length() {
// this segment:
boolean useCompoundFile;
synchronized (this) { // Guard segmentInfos
- useCompoundFile = mergePolicy.useCompoundFile(segmentInfos,
merge.info, this);
+ useCompoundFile =
+ merge
+ .getMergeInfo()
+ .info
+ .getCodec()
+ .compoundFormat()
+ .useCompoundFile(mergePolicy.size(merge.info, this),
mergePolicy);
Review Comment:
Nit: Sometimes we call `getMergeInfo`, sometimes we address `info` directly.
##########
lucene/core/src/test/org/apache/lucene/index/TestIndexWriterDelete.java:
##########
@@ -862,9 +862,6 @@ public void eval(MockDirectoryWrapper dir) throws
IOException {
.setReaderPooling(false)
.setMergePolicy(newLogMergePolicy()));
- MergePolicy lmp = modifier.getConfig().getMergePolicy();
- lmp.setNoCFSRatio(1.0);
Review Comment:
We don't need to replicate this?
##########
lucene/core/src/java/org/apache/lucene/codecs/CompoundFormat.java:
##########
@@ -34,6 +36,152 @@ protected CompoundFormat() {}
// TODO: this is very minimal. If we need more methods,
// we can add 'producer' classes.
+ /** Default document count threshold for using compound files with
LogDocMergePolicy */
+ static final int DEFAULT_CFS_THRESHOLD_DOC_SIZE = 65536; // docs
Review Comment:
Why do we believe these are good thresholds? Do we need a follow-up item
about tweaking them?
##########
lucene/core/src/test/org/apache/lucene/document/TestFeatureField.java:
##########
@@ -62,9 +62,13 @@ public void testBasics() throws Exception {
Directory dir = newDirectory();
RandomIndexWriter writer =
new RandomIndexWriter(
- random(),
- dir,
-
newIndexWriterConfig().setMergePolicy(newLogMergePolicy(random().nextBoolean())));
+ random(), dir,
newIndexWriterConfig().setMergePolicy(newLogMergePolicy()));
+ writer
Review Comment:
A lot of times we chain these access methods to eventually set compound
files. I don't think it's a blocker, but maybe in the future we might add some
convenience methods. No strong opinion on it though, I might actually prefer
the way it's done now.
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]