rmuir commented on code in PR #16281:
URL: https://github.com/apache/lucene/pull/16281#discussion_r3452233654


##########
lucene/core/src/java/org/apache/lucene/codecs/CodecUtil.java:
##########
@@ -597,26 +598,62 @@ private static void validateFooter(IndexInput in) throws IOException {
     }
   }
 
+  /**
+   * Number of bytes between consecutive merge abort checks during {@link
+   * #checksumEntireFile(IndexInput, MergePolicy.OneMerge)}. Files smaller than this are checksummed
+   * in one shot without checking for abort.
+   */
+  private static final long ABORT_CHECK_INTERVAL = 1024 * 1024;
+
   /**
    * Clones the provided input, reads all bytes from the file, and calls {@link #checkFooter}
    *
    * <p>Note that this method may be slow, as it must process the entire file. If you just need to
    * extract the checksum value, call {@link #retrieveChecksum}.
    */
   public static long checksumEntireFile(IndexInput input) throws IOException {
+    return checksumEntireFile(input, null, Long.MAX_VALUE);
+  }
+
+  /**
+   * Like {@link #checksumEntireFile(IndexInput)}, but periodically checks whether the provided
+   * merge has been aborted. This avoids spending a long time checksumming a large file when the
+   * merge has already been cancelled.
+   *
+   * @param input the index input to checksum
+   * @param merge the merge to check for abort, or {@code null} to behave like {@link
+   *     #checksumEntireFile(IndexInput)}
+   * @throws MergePolicy.MergeAbortedException if the merge is aborted during checksumming
+   */
+  public static long checksumEntireFile(IndexInput input, MergePolicy.OneMerge merge)
+      throws IOException {
+    return checksumEntireFile(input, merge, ABORT_CHECK_INTERVAL);
+  }
+
+  static long checksumEntireFile(
+      IndexInput input, MergePolicy.OneMerge merge, long abortCheckInterval) throws IOException {

Review Comment:
   Great. Maybe at some point this could be generalized to take a built-in functional interface/predicate instead? I see it as just "passing in an optional progress function".
   
   I feel like the API would be cleaner that way, but when I looked closer, it doesn't seem easy.
   
   For now, passing OneMerge directly, like you have it, is probably the reasonable choice. MergeAbortedException is not a "normal" exception and gets handled in a special way by IndexWriter.
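
To illustrate the "optional progress function" idea, here is a minimal standalone sketch. It is not Lucene code and not the PR's implementation: the class `ChecksumProgressSketch`, the `ProgressCheck` interface, and the `checksum` method are all hypothetical names, and it uses plain `java.io.InputStream` and `java.util.zip.CRC32` in place of `IndexInput` and the codec footer check. It also does not model how IndexWriter treats MergeAbortedException, which is the part that makes the real generalization hard.

```java
import java.io.IOException;
import java.io.InputStream;
import java.util.zip.CRC32;

/** Hypothetical sketch: chunked checksumming with an optional progress/abort callback. */
public final class ChecksumProgressSketch {

  /** Hypothetical callback (not a Lucene API). It may throw to stop the checksum early. */
  @FunctionalInterface
  public interface ProgressCheck {
    void check(long bytesProcessed) throws IOException;
  }

  private ChecksumProgressSketch() {}

  /**
   * Reads the whole stream and returns its CRC32 value.
   *
   * @param in the stream to checksum
   * @param progress optional callback, or null to never check
   * @param checkInterval minimum number of bytes between two callback invocations
   */
  public static long checksum(InputStream in, ProgressCheck progress, long checkInterval)
      throws IOException {
    if (checkInterval <= 0) {
      throw new IllegalArgumentException("checkInterval must be positive");
    }
    CRC32 crc = new CRC32();
    byte[] buffer = new byte[8192];
    long total = 0;
    long nextCheck = checkInterval;
    int n;
    while ((n = in.read(buffer)) != -1) {
      crc.update(buffer, 0, n);
      total += n;
      // Invoke the callback at most once per read, and only after at least
      // checkInterval bytes have been processed since the previous invocation.
      if (progress != null && total >= nextCheck) {
        progress.check(total);
        nextCheck = total + checkInterval;
      }
    }
    return crc.getValue();
  }
}
```

With this shape, a caller that does not care about cancellation passes `null`, and a merge-like caller passes a lambda that throws once its work has been cancelled. Inputs shorter than `checkInterval` are checksummed without the callback ever being invoked, which matches the behaviour described in the Javadoc for `ABORT_CHECK_INTERVAL`.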



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

