[GitHub] [lucene] gsmiller commented on a change in pull request #127: LUCENE-9946: Support multi-value fields in range facet counting

GitBox Thu, 06 May 2021 07:28:19 -0700


gsmiller commented on a change in pull request #127:
URL: https://github.com/apache/lucene/pull/127#discussion_r627471783




##########
File path: 
lucene/facet/src/java/org/apache/lucene/facet/range/DoubleRangeFacetCounts.java
##########
@@ -79,76 +94,150 @@ public DoubleRangeFacetCounts(
       DoubleRange... ranges)
       throws IOException {
     super(field, ranges, fastMatchQuery);
-    count(valueSource, hits.getMatchingDocs());
+    // use the provided valueSource if non-null, otherwise use the doc values 
associated with the
+    // field
+    if (valueSource != null) {
+      count(valueSource, hits.getMatchingDocs());
+    } else {
+      count(field, hits.getMatchingDocs());
+    }
   }
 
+  /** Counts from the provided valueSource. */
   private void count(DoubleValuesSource valueSource, List<MatchingDocs> 
matchingDocs)
       throws IOException {
 
-    DoubleRange[] ranges = (DoubleRange[]) this.ranges;
-
-    LongRange[] longRanges = new LongRange[ranges.length];
-    for (int i = 0; i < ranges.length; i++) {
-      DoubleRange range = ranges[i];
-      longRanges[i] =
-          new LongRange(
-              range.label,
-              NumericUtils.doubleToSortableLong(range.min),
-              true,
-              NumericUtils.doubleToSortableLong(range.max),
-              true);
-    }
+    LongRange[] longRanges = createLongRanges();
 
-    LongRangeCounter counter = new LongRangeCounter(longRanges);
+    LongRangeCounter counter = new LongRangeCounter(longRanges, counts, false);
 
     int missingCount = 0;
     for (MatchingDocs hits : matchingDocs) {
       DoubleValues fv = valueSource.getValues(hits.context, null);
-
       totCount += hits.totalHits;
-      final DocIdSetIterator fastMatchDocs;
+
+      final DocIdSetIterator it;
       if (fastMatchQuery != null) {
-        final IndexReaderContext topLevelContext = 
ReaderUtil.getTopLevelContext(hits.context);
-        final IndexSearcher searcher = new IndexSearcher(topLevelContext);
-        searcher.setQueryCache(null);
-        final Weight fastMatchWeight =
-            searcher.createWeight(
-                searcher.rewrite(fastMatchQuery), 
ScoreMode.COMPLETE_NO_SCORES, 1);
-        Scorer s = fastMatchWeight.scorer(hits.context);
-        if (s == null) {
+        DocIdSetIterator fastMatchDocs = createFastMatchDisi(hits.context);
+        if (fastMatchDocs == null) {
           continue;
+        } else {
+          it =
+              ConjunctionDISI.intersectIterators(
+                  Arrays.asList(hits.bits.iterator(), fastMatchDocs));
         }
-        fastMatchDocs = s.iterator();
       } else {
-        fastMatchDocs = null;
+        it = hits.bits.iterator();
       }
 
-      DocIdSetIterator docs = hits.bits.iterator();
-
-      for (int doc = docs.nextDoc(); doc != DocIdSetIterator.NO_MORE_DOCS; ) {
-        if (fastMatchDocs != null) {
-          int fastMatchDoc = fastMatchDocs.docID();
-          if (fastMatchDoc < doc) {
-            fastMatchDoc = fastMatchDocs.advance(doc);
-          }
-
-          if (doc != fastMatchDoc) {
-            doc = docs.advance(fastMatchDoc);
-            continue;
-          }
-        }
+      for (int doc = it.nextDoc(); doc != DocIdSetIterator.NO_MORE_DOCS; ) {
         // Skip missing docs:
         if (fv.advanceExact(doc)) {
           counter.add(NumericUtils.doubleToSortableLong(fv.doubleValue()));
         } else {
           missingCount++;
         }
 
-        doc = docs.nextDoc();
+        doc = it.nextDoc();
       }
     }
 
-    missingCount += counter.fillCounts(counts);
+    missingCount += counter.finish();
     totCount -= missingCount;
   }
+
+  /** Counts from the provided field. */
+  private void count(String field, List<MatchingDocs> matchingDocs) throws 
IOException {
+
+    LongRange[] longRanges = createLongRanges();
+
+    LongRangeCounter counter = null;
+
+    int missingCount = 0;
+
+    for (MatchingDocs hits : matchingDocs) {
+
+      final DocIdSetIterator it;
+      if (fastMatchQuery != null) {
+        DocIdSetIterator fastMatchDocs = createFastMatchDisi(hits.context);
+        if (fastMatchDocs == null) {
+          continue;
+        } else {
+          it =
+              ConjunctionDISI.intersectIterators(
+                  Arrays.asList(hits.bits.iterator(), fastMatchDocs));
+        }
+      } else {
+        it = hits.bits.iterator();
+      }
+
+      SortedNumericDocValues multiValues = 
DocValues.getSortedNumeric(hits.context.reader(), field);
+      NumericDocValues singleValues = DocValues.unwrapSingleton(multiValues);
+
+      if (singleValues != null) {
+
+        if (counter == null) {
+          counter = new LongRangeCounter(longRanges, counts, false);
+        }
+        assert counter.isMultiValued == false;

Review comment:
       @rmuir do you know if this is actually a valid assertion? Is it possible 
that different index segments could behave differently (e.g., some provide 
single-valued DVs while others provide multi-valued)? You mentioned that Lucene 
may optimize cases where the user specifies multi-valued but only ever provides 
single values. Could that come into play here (e.g., one index shard optimizes 
to single-valued while others are multi-valued)?
   
   If this is something I need to worry about, I'll just load the DVs up-front 
and use a multi-valued counter if any of them are multi-valued.




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org

[GitHub] [lucene] gsmiller commented on a change in pull request #127: LUCENE-9946: Support multi-value fields in range facet counting

Reply via email to