Re: [PR] Allow merge integrity checks to be aborted sooner [lucene]

via GitHub Tue, 19 May 2026 10:18:23 -0700


mikemccand commented on code in PR #16086:
URL: https://github.com/apache/lucene/pull/16086#discussion_r3268156197



##########
lucene/core/src/java/org/apache/lucene/index/IndexWriterConfig.java:
##########
@@ -561,4 +561,21 @@ public IndexWriterConfig setParentField(String 
parentField) {
     this.parentField = parentField;
     return this;
   }
+
+  /**
+   * Expert: sets the interval in bytes between abort checks during merge 
integrity verification.
+   * During merges, codec readers verify file integrity by reading entire 
files to compute
+   * checksums. This setting controls how often the merge abort flag is 
checked during these reads.
+   * Smaller values allow merges to be aborted more promptly but add slight 
overhead.
+   *
+   * @param intervalBytes the interval in bytes, must be positive
+   */
+  public IndexWriterConfig setMergeAbortCheckIntervalBytes(int intervalBytes) {

Review Comment:
   Let's make this `long`?  In general if something is measuring bytes let's 
try to use `long` -- we do math on such things that may lead to overflow?



##########
lucene/core/src/java/org/apache/lucene/codecs/lucene90/compressing/FieldsIndex.java:
##########
@@ -38,6 +39,9 @@ final long getStartPointer(int docID) {
   /** Check the integrity of the index. */
   abstract void checkIntegrity() throws IOException;
 
+  /** Check the integrity of the index, with periodic merge abort checking. */
+  public abstract void checkIntegrity(MergePolicy.AbortChecker abortChecker) 
throws IOException;

Review Comment:
   Since we are only adding "check every N bytes", can we rename `AbortChecker` 
to something more specific?  Is this a public API, or will Lucene users only 
interact with the setter/getter on IWC (taking just int or long)?



##########
lucene/core/src/java/org/apache/lucene/codecs/CodecUtil.java:
##########
@@ -675,4 +698,38 @@ public static int readBEInt(DataInput in) throws 
IOException {
   public static long readBELong(DataInput in) throws IOException {
     return (((long) readBEInt(in)) << 32) | (readBEInt(in) & 0xFFFFFFFFL);
   }
+
+  /**
+   * A {@link BufferedChecksumIndexInput} that periodically checks if the 
current merge should be
+   * aborted.
+   *
+   * <p>The check is performed in {@link #readBytes} since that is what {@link
+   * ChecksumIndexInput#seek} calls in a loop to compute the checksum.
+   */
+  private static final class AbortableBufferedChecksumIndexInput

Review Comment:
   Hmm, you're adding another level of buffering / copying, when someone turns 
on this new setting?  I think?  Can we subclass `FilterIndexInput` instead, so 
it's simply/only the added accounting (tracking total bytes written) and not 
more buffering / copying?



##########
lucene/core/src/java/org/apache/lucene/index/LiveIndexWriterConfig.java:
##########
@@ -117,6 +117,9 @@ public class LiveIndexWriterConfig {
   /** The IndexWriter event listener to record key events * */
   protected IndexWriterEventListener eventListener;
 
+  /** Interval in bytes between abort checks during merge integrity 
verification. */
+  protected volatile int mergeAbortCheckIntervalBytes;

Review Comment:
   Is it only `checkIntegrity` that we are newly instrumenting here?



##########
lucene/core/src/java/org/apache/lucene/index/LiveIndexWriterConfig.java:
##########
@@ -461,6 +464,14 @@ public IndexWriterEventListener 
getIndexWriterEventListener() {
     return eventListener;
   }
 
+  /**
+   * Returns the interval in bytes between abort checks during merge integrity 
verification. See
+   * {@link IndexWriterConfig#setMergeAbortCheckIntervalBytes(int)}.
+   */
+  public int getMergeAbortCheckIntervalBytes() {

Review Comment:
   `long` instead?



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]


---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]

Re: [PR] Allow merge integrity checks to be aborted sooner [lucene]

Reply via email to