rmuir commented on code in PR #12055: URL: https://github.com/apache/lucene/pull/12055#discussion_r1059804843
##########
lucene/core/src/java/org/apache/lucene/search/MultiTermQueryConstantScoreWrapper.java:
##########
@@ -183,23 +182,31 @@ private WeightOrDocIdSet rewrite(LeafReaderContext context) throws IOException {
       }
       Query q = new ConstantScoreQuery(bq.build());
       final Weight weight = searcher.rewrite(q).createWeight(searcher, scoreMode, score());
-      return new WeightOrDocIdSet(weight);
+      return new WeightOrDocIdSetIterator(weight);
     }

     // Too many terms: go back to the terms we already collected and start building the bit set
-    DocIdSetBuilder builder = new DocIdSetBuilder(context.reader().maxDoc(), terms);
+    PriorityQueue<PostingsEnum> highFrequencyTerms =
+        new PriorityQueue<PostingsEnum>(collectedTerms.size()) {
+          @Override
+          protected boolean lessThan(PostingsEnum a, PostingsEnum b) {
+            return a.cost() < b.cost();

Review Comment:
   `pq.insertWithOverflow` uses `!lessThan()` in its code, so I'm worried about this PQ behaving stupidly on ties with the same `docFreq`. Is there a simple tiebreaker we can use (even a synthetic one such as an `int termId`) so that such ties don't enter the PQ?

   I'm mainly concerned about the "collect remaining terms" piece for cases where there are jazillions of terms. A tiebreaker should also allow the IO to be a bit more sequential in such cases, rather than constantly replacing the top of the PQ with more ties.

--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For queries about this service, please contact Infrastructure at: us...@infra.apache.org
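[Editorial note] The tiebreaker idea in the review comment can be sketched in isolation. The snippet below is a minimal illustration, not Lucene code: `CostedTerm` is a hypothetical stand-in for a `PostingsEnum` with a synthetic term id attached, and it uses `java.util.PriorityQueue` with a `Comparator` rather than Lucene's `lessThan()`-based `PriorityQueue`. The point is the same: ordering by cost alone leaves equal-cost entries unordered, while a secondary key (a synthetic `termId` assigned in collection order) gives a total order, so ties are resolved deterministically instead of churning the top of the queue.

```java
import java.util.Comparator;
import java.util.PriorityQueue;

// Hypothetical stand-in for a PostingsEnum plus a synthetic term id
// (assigned in the order terms were collected). Illustrative only.
record CostedTerm(long cost, int termId) {}

public class TiebreakerSketch {
    public static void main(String[] args) {
        // Primary key: cost (analogous to PostingsEnum.cost()/docFreq).
        // Secondary key: termId, so equal-cost entries still have a
        // deterministic total order and never "tie" inside the queue.
        Comparator<CostedTerm> byCostThenTermId =
            Comparator.comparingLong(CostedTerm::cost)
                      .thenComparingInt(CostedTerm::termId);

        PriorityQueue<CostedTerm> pq = new PriorityQueue<>(byCostThenTermId);
        pq.add(new CostedTerm(5, 2));
        pq.add(new CostedTerm(5, 0)); // same cost as above, lower termId
        pq.add(new CostedTerm(3, 1));

        // Lowest cost first; among equal costs, lower termId first.
        System.out.println(pq.poll().termId()); // 1 (cost 3)
        System.out.println(pq.poll().termId()); // 0 (cost 5, tie broken by termId)
        System.out.println(pq.poll().termId()); // 2 (cost 5)
    }
}
```

Using collection order as the synthetic id also matches the sequential-IO point in the comment: terms drained in tie-broken order tend to follow the order they were read from the terms dictionary.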