[GitHub] [lucene] mayya-sharipova commented on a change in pull request #656: LUCENE-10382: Support filtering in KnnVectorQuery

GitBox Wed, 16 Feb 2022 01:43:38 -0800


mayya-sharipova commented on a change in pull request #656:
URL: https://github.com/apache/lucene/pull/656#discussion_r807653039




##########
File path: lucene/core/src/java/org/apache/lucene/search/KnnVectorQuery.java
##########
@@ -24,19 +24,36 @@
 import java.util.Objects;
 import org.apache.lucene.codecs.KnnVectorsReader;
 import org.apache.lucene.document.KnnVectorField;
+import org.apache.lucene.index.FieldInfo;
 import org.apache.lucene.index.IndexReader;
 import org.apache.lucene.index.LeafReaderContext;
+import org.apache.lucene.index.VectorSimilarityFunction;
+import org.apache.lucene.index.VectorValues;
+import org.apache.lucene.util.BitSet;
+import org.apache.lucene.util.BitSetIterator;
 import org.apache.lucene.util.Bits;
+import org.apache.lucene.util.FixedBitSet;
 
-/** Uses {@link KnnVectorsReader#search} to perform nearest neighbour search. */
+/**
+ * Uses {@link KnnVectorsReader#search} to perform nearest neighbour search.
+ *
+ * <p>This query also allows for performing a kNN search subject to a filter. In this case, it first
+ * executes the filter for each leaf, then chooses a strategy dynamically:
+ *
+ * <ul>
+ *   <li>If the filter cost is less than k, just execute an exact search
+ *   <li>Otherwise run a kNN search subject to the filter
+ *   <li>the kNN search visits too many vectors without completing, stop and run an exact search

Review comment:
       **if** the KNN search ?
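
For readers following along, the three-step strategy described in the Javadoc above can be sketched without any Lucene dependency. This is only an illustration under stated assumptions: `FilteredKnnSketch`, `Result`, and every method below are hypothetical names, and the "approximate" search is a stand-in that simply gives up when the number of vectors exceeds `visitedLimit`, not the graph search the PR actually uses.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.Optional;

public class FilteredKnnSketch {

  /** Outcome of a search: which strategy produced the hits, and the hit doc ids. */
  public static final class Result {
    public final String strategy;
    public final List<Integer> docs;

    Result(String strategy, List<Integer> docs) {
      this.strategy = strategy;
      this.docs = docs;
    }
  }

  /** Squared Euclidean distance; smaller means closer. */
  static double squaredDistance(float[] a, float[] b) {
    double sum = 0;
    for (int i = 0; i < a.length; i++) {
      double diff = a[i] - b[i];
      sum += diff * diff;
    }
    return sum;
  }

  /** Exact search: rank every doc accepted by the filter and keep the k closest. */
  public static List<Integer> exactSearch(
      float[][] vectors, boolean[] accepted, float[] target, int k) {
    List<Integer> candidates = new ArrayList<>();
    for (int doc = 0; doc < vectors.length; doc++) {
      if (accepted[doc]) {
        candidates.add(doc);
      }
    }
    candidates.sort(
        Comparator.comparingDouble((Integer doc) -> squaredDistance(vectors[doc], target)));
    return new ArrayList<>(candidates.subList(0, Math.min(k, candidates.size())));
  }

  /**
   * Hypothetical stand-in for the approximate search: it "visits" every vector and
   * returns empty when that would exceed visitedLimit, i.e. the search did not complete.
   */
  static Optional<List<Integer>> approximateSearch(
      float[][] vectors, boolean[] accepted, float[] target, int k, int visitedLimit) {
    if (vectors.length > visitedLimit) {
      return Optional.empty();
    }
    return Optional.of(exactSearch(vectors, accepted, target, k));
  }

  /** Chooses a strategy following the three bullets in the Javadoc. */
  public static Result search(
      float[][] vectors, boolean[] accepted, float[] target, int k, int visitedLimit) {
    int filterCost = 0;
    for (boolean a : accepted) {
      if (a) {
        filterCost++;
      }
    }
    // 1. Filter cost below k: an exact search over the few matches is cheap.
    if (filterCost < k) {
      return new Result("exact", exactSearch(vectors, accepted, target, k));
    }
    // 2. Otherwise try the approximate search subject to the filter.
    Optional<List<Integer>> approx =
        approximateSearch(vectors, accepted, target, k, visitedLimit);
    if (approx.isPresent()) {
      return new Result("approximate", approx.get());
    }
    // 3. The approximate search hit the visit limit: fall back to exact search.
    return new Result("exact-fallback", exactSearch(vectors, accepted, target, k));
  }
}
```

Recording the chosen strategy in the result is a choice made only for this sketch; it makes the decision observable, which is the same concern the test discussion below raises.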

##########
File path: lucene/core/src/test/org/apache/lucene/search/TestKnnVectorQuery.java
##########
@@ -455,6 +484,61 @@ public void testRandom() throws IOException {
     }
   }
 
+  /** Tests with random vectors and a random filter. Uses RandomIndexWriter. */
+  public void testRandomWithFilter() throws IOException {
+    int numDocs = 200;
+    int dimension = atLeast(5);
+    int numIters = atLeast(10);
+    try (Directory d = newDirectory()) {
+      RandomIndexWriter w = new RandomIndexWriter(random(), d);
+      for (int i = 0; i < numDocs; i++) {
+        Document doc = new Document();
+        doc.add(new KnnVectorField("field", randomVector(dimension)));
+        doc.add(new NumericDocValuesField("tag", i));
+        doc.add(new IntPoint("tag", i));
+        w.addDocument(doc);
+      }
+      w.close();
+
+      try (IndexReader reader = DirectoryReader.open(d)) {
+        IndexSearcher searcher = newSearcher(reader);
+        for (int i = 0; i < numIters; i++) {
+          int lower = random().nextInt(50);
+
+          // Check that when filter is restrictive, we use exact search
+          Query filter = IntPoint.newRangeQuery("tag", lower, lower + 6);
+          KnnVectorQuery query = new KnnVectorQuery("field", randomVector(dimension), 5, filter);
+          TopDocs results = searcher.search(query, numDocs);
+          assertEquals(TotalHits.Relation.EQUAL_TO, results.totalHits.relation);
+          assertEquals(results.totalHits.value, 5);

Review comment:
       How do we know that the exact search was used? Are we judging by the equality of `results.totalHits.value` and `results.scoreDocs.length`? I guess this holds in most cases.

   Another idea is to always use `TotalHits.Relation.GREATER_THAN_OR_EQUAL_TO` for the approximate search results, as returned in `KnnVectorQuery.searchLeaf`:
   ```java
   TopDocs results = approximateSearch(ctx, acceptDocs, visitedLimit);
   if (results.totalHits.relation == TotalHits.Relation.EQUAL_TO) {
     return <results with Relation.GREATER_THAN_OR_EQUAL_TO>;
   } else {
   ```
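
   To make the suggestion concrete, here is a minimal self-contained sketch of the relabelling step. It uses stand-in types rather than Lucene's `TopDocs` and `TotalHits`: `RelationSketch`, `Hits`, and `markApproximate` are hypothetical names, and only the two relation constants mirror the ones quoted above.

```java
public class RelationSketch {

  /** Stand-in for the two relation constants discussed in the comment. */
  public enum Relation {
    EQUAL_TO,
    GREATER_THAN_OR_EQUAL_TO
  }

  /** Stand-in for a result set: a hit count, its relation, and the hit doc ids. */
  public static final class Hits {
    public final long value;
    public final Relation relation;
    public final int[] docs;

    public Hits(long value, Relation relation, int[] docs) {
      this.value = value;
      this.relation = relation;
      this.docs = docs;
    }
  }

  /**
   * Relabels a completed approximate result as GREATER_THAN_OR_EQUAL_TO so that a
   * caller can tell it apart from an exact result, which keeps EQUAL_TO.
   */
  public static Hits markApproximate(Hits results) {
    if (results.relation == Relation.EQUAL_TO) {
      return new Hits(results.value, Relation.GREATER_THAN_OR_EQUAL_TO, results.docs);
    }
    // Already marked as a lower bound: return unchanged.
    return results;
  }
}
```

   With such a relabelling in place, a test could assert `EQUAL_TO` to confirm that the exact search path produced the hits, instead of inferring it from the hit count.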




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


