[GitHub] [lucene] jpountz commented on a change in pull request #736: LUCENE-10458: BoundedDocSetIdIterator may supply error count in Weigth#count(LeafReaderContext) when missingValue enables

GitBox Mon, 21 Mar 2022 03:28:05 -0700


jpountz commented on a change in pull request #736:
URL: https://github.com/apache/lucene/pull/736#discussion_r827810798




##########
File path: 
lucene/sandbox/src/java/org/apache/lucene/sandbox/search/IndexSortSortedNumericDocValuesRangeQuery.java
##########
@@ -198,16 +200,23 @@ public boolean isCacheable(LeafReaderContext ctx) {
 
       @Override
       public int count(LeafReaderContext context) throws IOException {
-        BoundedDocSetIdIterator disi = getDocIdSetIteratorOrNull(context);
-        if (disi != null) {
+
+        BoundedDocIdSetIterator disi = getDocIdSetIteratorOrNull(context);
+        if (disi != null && disi.delegate == null) {
           return disi.lastDoc - disi.firstDoc;
         }
-        return fallbackWeight.count(context);
+
+        PointValues pointValues = context.reader().getPointValues(field);
+        // if no point indexed, PointRangeQuery#count equals 0, that's 
sometimes not what we want.
+        if (pointValues != null) {
+          return fallbackWeight.count(context);
+        }

Review comment:
       I don't think we should do this, this query isn't making expectations 
about what `fallbackWeight` actually does. It's the responsibility of the 
caller to make sure to pass a fallback weight that makes sense. Let's not check 
points and just return `fallbackWeight.count(context)` as before?

##########
File path: 
lucene/sandbox/src/java/org/apache/lucene/sandbox/search/IndexSortSortedNumericDocValuesRangeQuery.java
##########
@@ -306,14 +329,14 @@ private static ValueComparator loadComparator(
    * A doc ID set iterator that wraps a delegate iterator and only returns doc 
IDs in the range
    * [firstDocInclusive, lastDoc).
    */
-  private static class BoundedDocSetIdIterator extends DocIdSetIterator {
-    private final int firstDoc;
-    private final int lastDoc;
+  private static class BoundedDocIdSetIterator extends DocIdSetIterator {
+    protected final int firstDoc;
+    protected final int lastDoc;
     private final DocIdSetIterator delegate;
 
-    private int docID = -1;
+    protected int docID = -1;

Review comment:
       and here too?

##########
File path: 
lucene/sandbox/src/java/org/apache/lucene/sandbox/search/IndexSortSortedNumericDocValuesRangeQuery.java
##########
@@ -278,7 +280,21 @@ private BoundedDocSetIdIterator getDocIdSetIterator(
     }
 
     int lastDocIdExclusive = high + 1;
-    return new BoundedDocSetIdIterator(firstDocIdInclusive, 
lastDocIdExclusive, delegate);
+    Object missingValue = sortField.getMissingValue();
+    BoundedDocIdSetIterator disi;
+    LeafReader reader = context.reader();
+    PointValues pointValues = reader.getPointValues(field);
+    final long missingLongValue = missingValue == null ? 0L : (long) 
missingValue;
+    // all documents have docValues or missing value falls outside the range
+    if ((reader.hasDeletions() == false

Review comment:
       Can you mave this `hasDeletions` check inside `count` instead as 
@jtibshirani suggested? It would still be ok to set the delegate to `null` for 
`scorer` if there are deletions since deletes are applied on top of the 
`Scorer` API.

##########
File path: 
lucene/sandbox/src/test/org/apache/lucene/sandbox/search/TestIndexSortSortedNumericDocValuesRangeQuery.java
##########
@@ -59,7 +61,14 @@ public void testSameHitsAsPointRangeQuery() throws 
IOException {
       IndexWriterConfig iwc = new IndexWriterConfig(new 
MockAnalyzer(random()));
       boolean reverse = random().nextBoolean();
       SortField sortField = new SortedNumericSortField("dv", 
SortField.Type.LONG, reverse);
-      sortField.setMissingValue(random().nextLong());
+      boolean enableMissingValue = random().nextBoolean();
+      if (enableMissingValue) {
+        long missingValue =
+            random().nextBoolean()
+                ? random().nextLong()

Review comment:
       The problem with a random long is that almost all the time it would fall 
outside of the query ranges that we test below, so we would have no coverage 
for the case when the missing value is set and falls within the query range.
   
   Can you use a random long between -100 and 10,000 like other values or 
something like that instead?

##########
File path: 
lucene/sandbox/src/java/org/apache/lucene/sandbox/search/IndexSortSortedNumericDocValuesRangeQuery.java
##########
@@ -306,14 +329,14 @@ private static ValueComparator loadComparator(
    * A doc ID set iterator that wraps a delegate iterator and only returns doc 
IDs in the range
    * [firstDocInclusive, lastDoc).
    */
-  private static class BoundedDocSetIdIterator extends DocIdSetIterator {
-    private final int firstDoc;
-    private final int lastDoc;
+  private static class BoundedDocIdSetIterator extends DocIdSetIterator {
+    protected final int firstDoc;
+    protected final int lastDoc;

Review comment:
       let's keep these private?

##########
File path: 
lucene/sandbox/src/java/org/apache/lucene/sandbox/search/IndexSortSortedNumericDocValuesRangeQuery.java
##########
@@ -278,7 +287,21 @@ private BoundedDocSetIdIterator getDocIdSetIterator(
     }
 
     int lastDocIdExclusive = high + 1;
-    return new BoundedDocSetIdIterator(firstDocIdInclusive, 
lastDocIdExclusive, delegate);
+    Object missingValue = sortField.getMissingValue();
+    BoundedDocIdSetIterator disi;
+    LeafReader reader = context.reader();
+    PointValues pointValues = reader.getPointValues(field);
+    final long missingLongValue = missingValue == null ? 0L : (long) 
missingValue;
+    if (reader.hasDeletions() == false

Review comment:
       We shouldn't check `hasDeletions` here, only in `Weight#count`.
   
   The reason is that the only thing we care about here is whether all 
documents have a value or not, whether these documents are deleted or not is 
only a concern for `Weight#count`.

##########
File path: 
lucene/sandbox/src/test/org/apache/lucene/sandbox/search/TestIndexSortSortedNumericDocValuesRangeQuery.java
##########
@@ -471,44 +480,134 @@ public void testCount() throws IOException {
     IndexReader reader = writer.getReader();
     IndexSearcher searcher = newSearcher(reader);
 
-    Query fallbackQuery = LongPoint.newRangeQuery("field", 1, 42);
-    Query query = new IndexSortSortedNumericDocValuesRangeQuery("field", 1, 
42, fallbackQuery);
+    // we use an unrealistic query that exposes its own Weight#count
+    Query fallbackQuery = new MatchNoDocsQuery();
+    // the index is not sorted on this field, the fallback query is used
+    Query query = new IndexSortSortedNumericDocValuesRangeQuery("another", 1, 
42, fallbackQuery);
     Weight weight = query.createWeight(searcher, ScoreMode.COMPLETE, 1.0f);
     for (LeafReaderContext context : searcher.getLeafContexts()) {
-      assertEquals(1, weight.count(context));
+      assertEquals(0, weight.count(context));
     }
 
     writer.close();
     reader.close();
     dir.close();
   }
 
-  public void testFallbackCount() throws IOException {
+  public void testCompareCount() throws IOException {
+    final int iters = atLeast(10);
+    for (int iter = 0; iter < iters; ++iter) {
+      Directory dir = newDirectory();
+      IndexWriterConfig iwc = new IndexWriterConfig(new 
MockAnalyzer(random()));
+      SortField sortField = new SortedNumericSortField("field", 
SortField.Type.LONG);
+      boolean enableMissingValue = random().nextBoolean();
+      if (enableMissingValue) {
+        long missingValue =
+            random().nextBoolean()
+                ? random().nextLong()
+                : (random().nextBoolean() ? Long.MIN_VALUE : Long.MAX_VALUE);
+        sortField.setMissingValue(missingValue);
+      }
+      RandomIndexWriter writer = new RandomIndexWriter(random(), dir, iwc);
+
+      final int numDocs = atLeast(100);
+      for (int i = 0; i < numDocs; ++i) {
+        Document doc = new Document();
+        final int numValues = TestUtil.nextInt(random(), 0, 1);
+        for (int j = 0; j < numValues; ++j) {
+          final long value = TestUtil.nextLong(random(), -100, 10000);
+          doc =
+              random().nextBoolean()
+                  ? createMissingValueDocument()

Review comment:
       Hmm, I don't understand this, are we not already creating a document 
that has a missing value when `numValues` is equal to 0?

##########
File path: 
lucene/sandbox/src/test/org/apache/lucene/sandbox/search/TestIndexSortSortedNumericDocValuesRangeQuery.java
##########
@@ -471,44 +480,134 @@ public void testCount() throws IOException {
     IndexReader reader = writer.getReader();
     IndexSearcher searcher = newSearcher(reader);
 
-    Query fallbackQuery = LongPoint.newRangeQuery("field", 1, 42);
-    Query query = new IndexSortSortedNumericDocValuesRangeQuery("field", 1, 
42, fallbackQuery);
+    // we use an unrealistic query that exposes its own Weight#count
+    Query fallbackQuery = new MatchNoDocsQuery();
+    // the index is not sorted on this field, the fallback query is used
+    Query query = new IndexSortSortedNumericDocValuesRangeQuery("another", 1, 
42, fallbackQuery);
     Weight weight = query.createWeight(searcher, ScoreMode.COMPLETE, 1.0f);
     for (LeafReaderContext context : searcher.getLeafContexts()) {
-      assertEquals(1, weight.count(context));
+      assertEquals(0, weight.count(context));
     }
 
     writer.close();
     reader.close();
     dir.close();
   }
 
-  public void testFallbackCount() throws IOException {
+  public void testCompareCount() throws IOException {
+    final int iters = atLeast(10);
+    for (int iter = 0; iter < iters; ++iter) {
+      Directory dir = newDirectory();
+      IndexWriterConfig iwc = new IndexWriterConfig(new 
MockAnalyzer(random()));
+      SortField sortField = new SortedNumericSortField("field", 
SortField.Type.LONG);
+      boolean enableMissingValue = random().nextBoolean();
+      if (enableMissingValue) {
+        long missingValue =
+            random().nextBoolean()
+                ? random().nextLong()

Review comment:
       and likewise here




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org

[GitHub] [lucene] jpountz commented on a change in pull request #736: LUCENE-10458: BoundedDocSetIdIterator may supply error count in Weigth#count(LeafReaderContext) when missingValue enables

Reply via email to