Re: [PR] Adaptive indexing mode for prefetching [lucene]

via GitHub Fri, 09 Jan 2026 09:15:24 -0800


epotyom commented on code in PR #15556:
URL: https://github.com/apache/lucene/pull/15556#discussion_r2676677289



##########
lucene/core/src/java/org/apache/lucene/search/Query.java:
##########
@@ -69,6 +69,21 @@ public Weight createWeight(IndexSearcher searcher, ScoreMode 
scoreMode, float bo
     throw new UnsupportedOperationException("Query " + this + " does not 
implement createWeight");
   }
 
+  /**
+   * Expert: Constructs an appropriate Weight implementation for this query 
with IndexMode.
+   *
+   * <p>Only implemented by primitive queries, which re-write to themselves.
+   *
+   * @param scoreMode How the produced scorers will be consumed.
+   * @param boost The boost that is propagated by the parent queries.
+   * @param indexingMode The index mode for prefetching optimization.
+   */
+  public Weight createWeight(

Review Comment:
   Can we use IndexSearcher's index mode getter in the Weights implementations 
that need it (I think it is TermWeight only right now) to avoid changing 
createWeight signature?



##########
lucene/core/src/java/org/apache/lucene/index/TermStates.java:
##########
@@ -193,7 +193,25 @@ public IOSupplier<TermState> get(LeafReaderContext ctx) 
throws IOException {
         this.states[ctx.ord] = EMPTY_TERMSTATE;
         return null;
       }
-      return () -> {
+      if (termExistsSupplier.doDefer()) {
+        return () -> {
+          if (this.states[ctx.ord] == null) {
+            TermState state = null;
+            if (termExistsSupplier.get()) {
+              state = termsEnum.termState();
+              this.states[ctx.ord] = state;
+            } else {
+              this.states[ctx.ord] = EMPTY_TERMSTATE;
+            }
+          }
+          TermState state = this.states[ctx.ord];
+          if (state == EMPTY_TERMSTATE) {
+            return null;
+          }
+          return state;
+        };
+      } else {
+        // TODO: dedup? also, do we need this second == null check for both 
defer/not defer?

Review Comment:
   +1



##########
lucene/core/src/java/org/apache/lucene/store/MemorySegmentIndexInput.java:
##########
@@ -321,22 +320,56 @@ public void prefetch(long offset, long length) throws 
IOException {
       // power of two. There is a good chance that a good chunk of this index 
input is cached in
       // physical memory. Let's skip the overhead of the madvise system call, 
we'll be trying again
       // on the next power of two of the counter.
-      return;
+      return false;
     }
 
     final NativeAccess nativeAccess = NATIVE_ACCESS.get();
-    advise(
+    return advise(
         offset,
         length,
         segment -> {
-          if (segment.isLoaded() == false) {
+          if (isProbablyLoaded(segment)) {
+            return false;
+          } else {
             // We have a cache miss on at least one page, let's reset the 
counter.
             consecutivePrefetchHitCount = 0;
             nativeAccess.madviseWillNeed(segment);
+            return true;
           }
         });
   }
 
+  public boolean isProbablyLoaded(MemorySegment segment) {

Review Comment:
   Does it have to be public?



##########
lucene/core/src/java/org/apache/lucene/codecs/lucene103/blocktree/FieldReader.java:
##########
@@ -21,6 +21,7 @@
 import org.apache.lucene.index.IndexOptions;
 import org.apache.lucene.index.Terms;
 import org.apache.lucene.index.TermsEnum;
+import org.apache.lucene.search.IndexingMode;

Review Comment:
   Let't rename to IndexMode?



##########
lucene/core/src/java/org/apache/lucene/codecs/lucene103/blocktree/SegmentTermsEnum.java:
##########
@@ -491,33 +479,75 @@ private IOBooleanSupplier prepareSeekExact(BytesRef 
target, boolean prefetch) th
       return null;
     }
 
-    if (prefetch) {
-      currentFrame.prefetchBlock();
-    }
+    return getIoBooleanSupplier(target, prefetch);
+  }
 
-    return () -> {
-      currentFrame.loadBlock();
+  private IOBooleanSupplier getIoBooleanSupplier(BytesRef target, boolean 
prefetch)
+      throws IOException {
+    boolean doDefer = maybePrefetch(prefetch);
 
-      final SeekStatus result = currentFrame.scanToTerm(target, true);
-      if (result == SeekStatus.FOUND) {
-        // if (DEBUG) {
-        //   System.out.println("  return FOUND term=" + term.utf8ToString() + 
" " + term);
-        // }
-        return true;
-      } else {
-        // if (DEBUG) {
-        //   System.out.println("  got result " + result + "; return NOT_FOUND 
term=" +
-        // term.utf8ToString());
-        // }
+    return new IOBooleanSupplier() {
+      @Override
+      public boolean get() throws IOException {
+        currentFrame.loadBlock();
 
-        return false;
+        final SeekStatus result = currentFrame.scanToTerm(target, true);
+        if (result == SeekStatus.FOUND) {
+          // if (DEBUG) {
+          //   System.out.println("  return FOUND term=" + term.utf8ToString() 
+ " " + term);
+          // }
+          return true;
+        } else {
+          // if (DEBUG) {
+          //   System.out.println("  got result " + result + "; return 
NOT_FOUND term=" +
+          // term.utf8ToString());
+          // }
+
+          return false;
+        }
+      }
+
+      @Override
+      public boolean doDefer() {
+        return doDefer;
       }
     };
   }
 
+  private boolean maybePrefetch(boolean prefetch) throws IOException {
+    if (indexingMode == IndexingMode.HOT) {
+      return prefetch;
+    }
+    if (indexingMode == IndexingMode.COLD || !prefetch) {
+      return false;
+    }
+
+    boolean doDefer = currentFrame.prefetchBlock();
+    if (doDefer) {
+      hotCounter = 0;
+    } else {
+      hotCounter++;
+    }
+    return doDefer;
+  }
+
   @Override
   public IOBooleanSupplier prepareSeekExact(BytesRef target) throws 
IOException {
-    return prepareSeekExact(target, true);
+    return prepareSeekExact(target, likelyCold());
+  }
+
+  private short hotCounter = 0;
+
+  private boolean likelyCold() {
+    switch (indexingMode) {
+      case COLD:
+        return true;
+      case HOT:
+        return false;
+      case ADAPTIVE:
+      default:
+        return hotCounter < 1000;

Review Comment:
   I like the exponential implementation of similar functionality in 
https://github.com/apache/lucene/blob/25e381bb7a1ab3a70c0965800b09b9b2e60675c0/lucene/core/src/java/org/apache/lucene/store/MemorySegmentIndexInput.java#L319
 , can we use it instead of hardcoded threshold?



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]


---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]

Re: [PR] Adaptive indexing mode for prefetching [lucene]

Reply via email to