[GitHub] [lucene] jtibshirani commented on a change in pull request #656: LUCENE-10382: Support filtering in KnnVectorQuery

GitBox Thu, 10 Feb 2022 09:54:35 -0800


jtibshirani commented on a change in pull request #656:
URL: https://github.com/apache/lucene/pull/656#discussion_r803950304




##########
File path: lucene/core/src/java/org/apache/lucene/search/KnnVectorQuery.java
##########
@@ -96,43 +107,98 @@ public Query rewrite(IndexReader reader) throws IOException {
     return createRewrittenQuery(reader, topK);
   }
 
-  private TopDocs searchLeaf(LeafReaderContext ctx, int kPerLeaf, Bits bitsFilter)
+  private TopDocs searchLeaf(LeafReaderContext ctx, int kPerLeaf, BitSetCollector filterCollector)
       throws IOException {
-    // If the filter is non-null, then it already handles live docs
-    if (bitsFilter == null) {
-      bitsFilter = ctx.reader().getLiveDocs();
+
+    if (filterCollector == null) {
+      Bits acceptDocs = ctx.reader().getLiveDocs();
+      return ctx.reader()
+          .searchNearestVectors(field, target, kPerLeaf, acceptDocs, Integer.MAX_VALUE);
+    } else {
+      BitSetIterator filterIterator = filterCollector.getIterator(ctx.ord);
+      if (filterIterator == null || filterIterator.cost() == 0) {
+        return NO_RESULTS;
+      }
+
+      if (filterIterator.cost() <= k) {
+        // If there are <= k possible matches, short-circuit and perform exact search, since HNSW must
+        // always visit at least k documents
+        return exactSearch(ctx, target, k, filterIterator);
+      }
+
+      try {
+        // The filter iterator already incorporates live docs
+        Bits acceptDocs = filterIterator.getBitSet();
+        int visitedLimit = (int) filterIterator.cost();
+        return ctx.reader().searchNearestVectors(field, target, kPerLeaf, acceptDocs, visitedLimit);
+      } catch (
+          @SuppressWarnings("unused")
+          CollectionTerminatedException e) {
+        // We stopped the kNN search because it visited too many nodes, so fall back to exact search
+        return exactSearch(ctx, target, k, filterIterator);
+      }
     }
+  }
 
-    TopDocs results = ctx.reader().searchNearestVectors(field, target, kPerLeaf, bitsFilter);
-    if (results == null) {
+  private TopDocs exactSearch(
+      LeafReaderContext context, float[] target, int k, DocIdSetIterator acceptIterator)
+      throws IOException {
+    FieldInfo fi = context.reader().getFieldInfos().fieldInfo(field);
+    if (fi == null || fi.getVectorDimension() == 0) {
+      // The field does not exist or does not index vectors
       return NO_RESULTS;
     }
-    if (ctx.docBase > 0) {
-      for (ScoreDoc scoreDoc : results.scoreDocs) {
-        scoreDoc.doc += ctx.docBase;
-      }
+
+    VectorSimilarityFunction similarityFunction = fi.getVectorSimilarityFunction();
+    VectorValues vectorValues = context.reader().getVectorValues(field);
+
+    HitQueue queue = new HitQueue(k, false);

Review comment:
       Oh this is good to know about, I'll try to switch over.
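The filtered branch of `searchLeaf` in the diff has four outcomes: no results when nothing passes the filter, exact search when at most `k` docs pass, an HNSW search whose visit limit is the filter's cost, and a fallback to exact search if that graph search throws `CollectionTerminatedException`. Below is a minimal sketch of that control flow in plain Java, not Lucene code. It assumes hypothetical stand-ins throughout: `FilteredKnnSketch`, `ApproximateSearch` (for the graph search), `VisitLimitExceeded` (for `CollectionTerminatedException`), `float[]` rows (for `VectorValues`), `java.util.BitSet` cardinality (for the iterator's cost), and dot-product scoring (for the field's `VectorSimilarityFunction`). The bounded min-heap in `exactSearch` fills the role that `HitQueue` plays in the diff.

```java
import java.util.Arrays;
import java.util.BitSet;
import java.util.PriorityQueue;

// Hypothetical sketch, not Lucene code: float[] rows stand in for VectorValues,
// java.util.BitSet stands in for the filter bit set, and dot product stands in
// for the field's VectorSimilarityFunction.
public class FilteredKnnSketch {

  /** Hypothetical stand-in for CollectionTerminatedException: the visit limit was hit. */
  public static class VisitLimitExceeded extends RuntimeException {}

  /** Hypothetical stand-in for the HNSW graph search; may throw VisitLimitExceeded. */
  public interface ApproximateSearch {
    int[] search(float[] target, int k, BitSet acceptDocs, int visitLimit);
  }

  private static final class Hit {
    final int doc;
    final float score;

    Hit(int doc, float score) {
      this.doc = doc;
      this.score = score;
    }
  }

  static float dot(float[] a, float[] b) {
    float sum = 0f;
    for (int i = 0; i < a.length; i++) {
      sum += a[i] * b[i];
    }
    return sum;
  }

  /** Brute-force top-k over the docs set in the filter, best hit first. */
  public static int[] exactSearch(float[][] vectors, float[] target, int k, BitSet filter) {
    if (k <= 0) {
      return new int[0];
    }
    // Min-heap bounded at k entries: the head is the weakest hit kept so far.
    PriorityQueue<Hit> heap = new PriorityQueue<Hit>(k, (a, b) -> Float.compare(a.score, b.score));
    for (int doc = filter.nextSetBit(0); doc >= 0; doc = filter.nextSetBit(doc + 1)) {
      float score = dot(vectors[doc], target);
      if (heap.size() < k) {
        heap.add(new Hit(doc, score));
      } else if (score > heap.peek().score) {
        heap.poll(); // evict the weakest hit to make room
        heap.add(new Hit(doc, score));
      }
    }
    // Drain weakest-first, filling the array from the back so it ends up best-first.
    int[] result = new int[heap.size()];
    for (int i = result.length - 1; i >= 0; i--) {
      result[i] = heap.poll().doc;
    }
    return result;
  }

  /** Control flow mirroring the filtered branch of searchLeaf in the diff. */
  public static int[] searchLeaf(
      float[][] vectors, float[] target, int k, BitSet filter, ApproximateSearch approximate) {
    int cost = filter.cardinality();
    if (cost == 0) {
      return new int[0]; // nothing passes the filter
    }
    if (cost <= k) {
      // At most k candidates: score them all instead of walking the graph.
      return exactSearch(vectors, target, k, filter);
    }
    try {
      // Cap the graph search at the number of docs that pass the filter.
      return approximate.search(target, k, filter, cost);
    } catch (VisitLimitExceeded e) {
      // The graph search gave up, so fall back to brute force over the filter.
      return exactSearch(vectors, target, k, filter);
    }
  }

  public static void main(String[] args) {
    float[][] vectors = {{1f, 0f}, {0.9f, 0.1f}, {0f, 1f}, {0.5f, 0.5f}, {0.2f, 0.8f}};
    float[] target = {1f, 0f};
    BitSet filter = new BitSet();
    filter.set(1, 5); // docs 1..4 pass; doc 0 is filtered out despite scoring best
    // A graph search that always hits its limit forces the exact fallback.
    ApproximateSearch givesUp = (t, topK, accept, limit) -> {
      throw new VisitLimitExceeded();
    };
    System.out.println(Arrays.toString(searchLeaf(vectors, target, 2, filter, givesUp)));
    // prints [1, 3]
  }
}
```

Capping the visit count at the number of filter matches means the graph search gives up once it has done roughly as much work as scoring every matching doc directly, at which point exact search over the filter is the cheaper choice.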




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


