mayya-sharipova commented on a change in pull request #656:
URL: https://github.com/apache/lucene/pull/656#discussion_r807653039
##########
File path: lucene/core/src/java/org/apache/lucene/search/KnnVectorQuery.java
##########
@@ -24,19 +24,36 @@ import java.util.Objects;
 import org.apache.lucene.codecs.KnnVectorsReader;
 import org.apache.lucene.document.KnnVectorField;
+import org.apache.lucene.index.FieldInfo;
 import org.apache.lucene.index.IndexReader;
 import org.apache.lucene.index.LeafReaderContext;
+import org.apache.lucene.index.VectorSimilarityFunction;
+import org.apache.lucene.index.VectorValues;
+import org.apache.lucene.util.BitSet;
+import org.apache.lucene.util.BitSetIterator;
 import org.apache.lucene.util.Bits;
+import org.apache.lucene.util.FixedBitSet;
 
-/** Uses {@link KnnVectorsReader#search} to perform nearest neighbour search. */
+/**
+ * Uses {@link KnnVectorsReader#search} to perform nearest neighbour search.
+ *
+ * <p>This query also allows for performing a kNN search subject to a filter. In this case, it first
+ * executes the filter for each leaf, then chooses a strategy dynamically:
+ *
+ * <ul>
+ *   <li>If the filter cost is less than k, just execute an exact search
+ *   <li>Otherwise run a kNN search subject to the filter
+ *   <li>the kNN search visits too many vectors without completing, stop and run an exact search

Review comment:
Should this be "**If** the kNN search ..."?

##########
File path: lucene/core/src/test/org/apache/lucene/search/TestKnnVectorQuery.java
##########
@@ -455,6 +484,61 @@ public void testRandom() throws IOException {
     }
   }
 
+  /** Tests with random vectors and a random filter. Uses RandomIndexWriter. */
+  public void testRandomWithFilter() throws IOException {
+    int numDocs = 200;
+    int dimension = atLeast(5);
+    int numIters = atLeast(10);
+    try (Directory d = newDirectory()) {
+      RandomIndexWriter w = new RandomIndexWriter(random(), d);
+      for (int i = 0; i < numDocs; i++) {
+        Document doc = new Document();
+        doc.add(new KnnVectorField("field", randomVector(dimension)));
+        doc.add(new NumericDocValuesField("tag", i));
+        doc.add(new IntPoint("tag", i));
+        w.addDocument(doc);
+      }
+      w.close();
+
+      try (IndexReader reader = DirectoryReader.open(d)) {
+        IndexSearcher searcher = newSearcher(reader);
+        for (int i = 0; i < numIters; i++) {
+          int lower = random().nextInt(50);
+
+          // Check that when filter is restrictive, we use exact search
+          Query filter = IntPoint.newRangeQuery("tag", lower, lower + 6);
+          KnnVectorQuery query = new KnnVectorQuery("field", randomVector(dimension), 5, filter);
+          TopDocs results = searcher.search(query, numDocs);
+          assertEquals(TotalHits.Relation.EQUAL_TO, results.totalHits.relation);
+          assertEquals(results.totalHits.value, 5);

Review comment:
How do we know that we used the exact search? Are we judging by the equality of `results.totalHits.value` and `results.scoreDocs.length`? I guess in most cases this is true.

Another idea is to always use `TotalHits.Relation.GREATER_THAN_OR_EQUAL_TO` for the approximate search results, as returned in `KnnVectorQuery.searchLeaf`:

```java
TopDocs results = approximateSearch(ctx, acceptDocs, visitedLimit);
if (results.totalHits.relation == TotalHits.Relation.EQUAL_TO) {
  return <results with Relation.GREATER_THAN_OR_EQUAL_TO>;
} else {
```

-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the URL above to go to the specific comment.
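A minimal, standalone sketch of the relabeling idea from the second review comment, assuming nothing about the actual Lucene implementation. The class `RelationSketch`, its nested `Relation` and `Hits` types, and the methods `markApproximate` and `usedExactSearch` are all hypothetical stand-ins invented for illustration; they are not Lucene APIs. The `else` branch that the comment leaves open is not reconstructed here.

```java
// Hypothetical illustration only: none of these types are Lucene classes.
public class RelationSketch {

  /** Stand-in for a hit-count relation with the two values named in the review comment. */
  public enum Relation {
    EQUAL_TO,
    GREATER_THAN_OR_EQUAL_TO
  }

  /** Stand-in for a hit count paired with its relation. */
  public static final class Hits {
    public final long value;
    public final Relation relation;

    public Hits(long value, Relation relation) {
      this.value = value;
      this.relation = relation;
    }
  }

  /**
   * Relabels results that came from an approximate search so that they never
   * report EQUAL_TO. The hit count itself is left unchanged.
   */
  public static Hits markApproximate(Hits approximate) {
    return new Hits(approximate.value, Relation.GREATER_THAN_OR_EQUAL_TO);
  }

  /** Under this convention, only exact-search results carry EQUAL_TO. */
  public static boolean usedExactSearch(Hits hits) {
    return hits.relation == Relation.EQUAL_TO;
  }

  public static void main(String[] args) {
    Hits exact = new Hits(5, Relation.EQUAL_TO);
    Hits approximate = markApproximate(new Hits(5, Relation.EQUAL_TO));
    System.out.println("exact search detected: " + usedExactSearch(exact));
    System.out.println("approximate search detected as exact: " + usedExactSearch(approximate));
  }
}
```

With such a convention, an assertion on the relation alone would be enough for a test to tell the two code paths apart, instead of inferring it from the hit count.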
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For queries about this service, please contact Infrastructure at: us...@infra.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org