epotyom commented on code in PR #15556:
URL: https://github.com/apache/lucene/pull/15556#discussion_r2676677289
##########
lucene/core/src/java/org/apache/lucene/search/Query.java:
##########
@@ -69,6 +69,21 @@ public Weight createWeight(IndexSearcher searcher, ScoreMode
scoreMode, float bo
throw new UnsupportedOperationException("Query " + this + " does not
implement createWeight");
}
+ /**
+ * Expert: Constructs an appropriate Weight implementation for this query
with IndexMode.
+ *
+ * <p>Only implemented by primitive queries, which re-write to themselves.
+ *
+ * @param scoreMode How the produced scorers will be consumed.
+ * @param boost The boost that is propagated by the parent queries.
+ * @param indexingMode The index mode for prefetching optimization.
+ */
+ public Weight createWeight(
Review Comment:
Can we use IndexSearcher's index mode getter in the Weights implementations
that need it (I think it is TermWeight only right now) to avoid changing
createWeight signature?
##########
lucene/core/src/java/org/apache/lucene/index/TermStates.java:
##########
@@ -193,7 +193,25 @@ public IOSupplier<TermState> get(LeafReaderContext ctx)
throws IOException {
this.states[ctx.ord] = EMPTY_TERMSTATE;
return null;
}
- return () -> {
+ if (termExistsSupplier.doDefer()) {
+ return () -> {
+ if (this.states[ctx.ord] == null) {
+ TermState state = null;
+ if (termExistsSupplier.get()) {
+ state = termsEnum.termState();
+ this.states[ctx.ord] = state;
+ } else {
+ this.states[ctx.ord] = EMPTY_TERMSTATE;
+ }
+ }
+ TermState state = this.states[ctx.ord];
+ if (state == EMPTY_TERMSTATE) {
+ return null;
+ }
+ return state;
+ };
+ } else {
+ // TODO: dedup? also, do we need this second == null check for both
defer/not defer?
Review Comment:
+1
##########
lucene/core/src/java/org/apache/lucene/store/MemorySegmentIndexInput.java:
##########
@@ -321,22 +320,56 @@ public void prefetch(long offset, long length) throws
IOException {
// power of two. There is a good chance that a good chunk of this index
input is cached in
// physical memory. Let's skip the overhead of the madvise system call,
we'll be trying again
// on the next power of two of the counter.
- return;
+ return false;
}
final NativeAccess nativeAccess = NATIVE_ACCESS.get();
- advise(
+ return advise(
offset,
length,
segment -> {
- if (segment.isLoaded() == false) {
+ if (isProbablyLoaded(segment)) {
+ return false;
+ } else {
// We have a cache miss on at least one page, let's reset the
counter.
consecutivePrefetchHitCount = 0;
nativeAccess.madviseWillNeed(segment);
+ return true;
}
});
}
+ public boolean isProbablyLoaded(MemorySegment segment) {
Review Comment:
Does it have to be public?
##########
lucene/core/src/java/org/apache/lucene/codecs/lucene103/blocktree/FieldReader.java:
##########
@@ -21,6 +21,7 @@
import org.apache.lucene.index.IndexOptions;
import org.apache.lucene.index.Terms;
import org.apache.lucene.index.TermsEnum;
+import org.apache.lucene.search.IndexingMode;
Review Comment:
Let't rename to IndexMode?
##########
lucene/core/src/java/org/apache/lucene/codecs/lucene103/blocktree/SegmentTermsEnum.java:
##########
@@ -491,33 +479,75 @@ private IOBooleanSupplier prepareSeekExact(BytesRef
target, boolean prefetch) th
return null;
}
- if (prefetch) {
- currentFrame.prefetchBlock();
- }
+ return getIoBooleanSupplier(target, prefetch);
+ }
- return () -> {
- currentFrame.loadBlock();
+ private IOBooleanSupplier getIoBooleanSupplier(BytesRef target, boolean
prefetch)
+ throws IOException {
+ boolean doDefer = maybePrefetch(prefetch);
- final SeekStatus result = currentFrame.scanToTerm(target, true);
- if (result == SeekStatus.FOUND) {
- // if (DEBUG) {
- // System.out.println(" return FOUND term=" + term.utf8ToString() +
" " + term);
- // }
- return true;
- } else {
- // if (DEBUG) {
- // System.out.println(" got result " + result + "; return NOT_FOUND
term=" +
- // term.utf8ToString());
- // }
+ return new IOBooleanSupplier() {
+ @Override
+ public boolean get() throws IOException {
+ currentFrame.loadBlock();
- return false;
+ final SeekStatus result = currentFrame.scanToTerm(target, true);
+ if (result == SeekStatus.FOUND) {
+ // if (DEBUG) {
+ // System.out.println(" return FOUND term=" + term.utf8ToString()
+ " " + term);
+ // }
+ return true;
+ } else {
+ // if (DEBUG) {
+ // System.out.println(" got result " + result + "; return
NOT_FOUND term=" +
+ // term.utf8ToString());
+ // }
+
+ return false;
+ }
+ }
+
+ @Override
+ public boolean doDefer() {
+ return doDefer;
}
};
}
+ private boolean maybePrefetch(boolean prefetch) throws IOException {
+ if (indexingMode == IndexingMode.HOT) {
+ return prefetch;
+ }
+ if (indexingMode == IndexingMode.COLD || !prefetch) {
+ return false;
+ }
+
+ boolean doDefer = currentFrame.prefetchBlock();
+ if (doDefer) {
+ hotCounter = 0;
+ } else {
+ hotCounter++;
+ }
+ return doDefer;
+ }
+
@Override
public IOBooleanSupplier prepareSeekExact(BytesRef target) throws
IOException {
- return prepareSeekExact(target, true);
+ return prepareSeekExact(target, likelyCold());
+ }
+
+ private short hotCounter = 0;
+
+ private boolean likelyCold() {
+ switch (indexingMode) {
+ case COLD:
+ return true;
+ case HOT:
+ return false;
+ case ADAPTIVE:
+ default:
+ return hotCounter < 1000;
Review Comment:
I like the exponential implementation of similar functionality in
https://github.com/apache/lucene/blob/25e381bb7a1ab3a70c0965800b09b9b2e60675c0/lucene/core/src/java/org/apache/lucene/store/MemorySegmentIndexInput.java#L319
, can we use it instead of hardcoded threshold?
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]