[ https://issues.apache.org/jira/browse/LUCENE-10559?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17531972#comment-17531972 ]
Julie Tibshirani commented on LUCENE-10559:
-------------------------------------------

Big +1 for this addition to KnnGraphTester. I modified KnnGraphTester on a branch to incorporate a random filter when I ran the experiments here: [https://github.com/apache/lucene/pull/656#issuecomment-1032109021]. It's important for everyone to be able to reproduce those experiments, and it'd be good to add kNN with filtering to luceneutil as well.

As an aside, I can imagine scenarios where filters are correlated or anti-correlated with proximity to the query vector. For example, maybe you're looking for a similar-looking product (vector proximity), but in a certain price range (filter). Or you're looking for similar news headlines (vector proximity), but within a certain time range (filter).

> Add preFilter/postFilter options to KnnGraphTester
> --------------------------------------------------
>
>                 Key: LUCENE-10559
>                 URL: https://issues.apache.org/jira/browse/LUCENE-10559
>             Project: Lucene - Core
>          Issue Type: Improvement
>            Reporter: Michael Sokolov
>            Priority: Major
>
> We want to be able to test the efficacy of pre-filtering in KnnVectorQuery:
> if you (say) want the top K nearest neighbors subject to a constraint Q, are
> you better off over-selecting (say 2K) top hits and *then* filtering
> (post-filtering), or incorporating the filtering into the query
> (pre-filtering). How does it depend on the selectivity of the filter?
> I think we can get a reasonable testbed by generating a uniform random filter
> with some selectivity (that is consistent and repeatable). Possibly we'd also
> want to try filters that are correlated with index order, but it seems they'd
> be unlikely to be correlated with vector values in a way that the graph
> structure would notice, so random is a pretty good starting point for this.
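The "uniform random filter with some selectivity (that is consistent and repeatable)" from the issue description could be sketched roughly as follows. This is only an illustration using the JDK: the class RandomFilterSketch, its methods, and the seed parameter are hypothetical names, not part of KnnGraphTester or any Lucene API, and the post-filtering step assumes the over-selected hits arrive as an array of doc IDs ordered by proximity.

```java
import java.util.Arrays;
import java.util.BitSet;
import java.util.Random;

// Hypothetical helper for illustration; not taken from KnnGraphTester or Lucene.
final class RandomFilterSketch {

  /**
   * Accepts each of numDocs documents independently with probability
   * `selectivity`. The same seed always produces the same filter.
   */
  static BitSet randomFilter(int numDocs, double selectivity, long seed) {
    if (selectivity < 0.0 || selectivity > 1.0) {
      throw new IllegalArgumentException("selectivity must be in [0, 1]");
    }
    Random random = new Random(seed); // fixed seed keeps runs repeatable
    BitSet accepted = new BitSet(numDocs);
    for (int doc = 0; doc < numDocs; doc++) {
      if (random.nextDouble() < selectivity) {
        accepted.set(doc);
      }
    }
    return accepted;
  }

  /**
   * Post-filtering: given over-selected hits (doc IDs, nearest first),
   * keep those the filter accepts, up to k of them.
   */
  static int[] postFilter(int[] topHits, BitSet accepted, int k) {
    return Arrays.stream(topHits).filter(accepted::get).limit(k).toArray();
  }

  public static void main(String[] args) {
    int numDocs = 100_000;
    BitSet filter = randomFilter(numDocs, 0.1, 42L);
    System.out.printf("accepted %d of %d docs%n", filter.cardinality(), numDocs);
  }
}
```

With a filter like this, post-filtering can return fewer than k hits when the filter is selective, since an over-selection of 2K only yields about 2K times the selectivity in accepted documents on average. That shortfall is one of the effects the proposed options would let people measure.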