[
https://issues.apache.org/jira/browse/SOLR-16651?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Christine Poerschke updated SOLR-16651:
---------------------------------------
Security: (was: Public)
> Optimize execution of KNN sub-query to apply it only on documents remaining
> after the main query
> ------------------------------------------------------------------------------------------------
>
> Key: SOLR-16651
> URL: https://issues.apache.org/jira/browse/SOLR-16651
> Project: Solr
> Issue Type: Improvement
> Components: query
> Affects Versions: 9.1.1
> Reporter: Gabriel Magno
> Priority: Major
> Labels: knn, optimization, query, vector
>
> Solr 9.1 introduced pre-filtering for KNN queries, which is great and is
> working fine when the KNN is the main query.
> I was wondering whether it would be possible to do something similar, but
> for the case of KNN being a sub-query instead of the main query (q). Let me
> show an example use case with the films example.
> I want to query for films with “the” in the name, keep only films with
> genre “Drama”, and then calculate the similarity of these films' vectors to
> my target vector. The idea is to make a simple lexical query and use the
> KNN sub-query to calculate similarities (not necessarily to sort by
> similarity). Here is an example query:
> * URL:
> [http://localhost:8983/solr/#/films/query?q=name:the&fq=genre:Drama&my_similarity=%7B!knn%20f%3Dfilm_vector%20topK%3D10000%7D%5B0.1,0.2,0.3,0.4,0.5,0.6,0.7,0.8,0.9,1.0%5D&fl=*,$my_similarity]
> * Params:
> ** {*}q{*}=name:the
> ** {*}fq{*}=genre:Drama
> ** {*}my_similarity{*}=\{!knn f=film_vector topK=10000}[0.1,0.2,0.3,0.4,0.5,0.6,0.7,0.8,0.9,1.0]
> ** {*}fl{*}=*,$my_similarity
> This query works fine. The problem is that the `my_similarity` sub-query runs
> for all of the 1,100 film documents, instead of running only for the 51 that
> are relevant to the query. For a small collection like this it does not make
> a difference, but I have a collection with 12 million documents where
> queries like this run very slowly, even though the result set is
> small.
> I tried using the cache and cost parameters to "force" the KNN sub-query
> to run after the main query (`\{!knn cache=false cost=101 f=film_vector
> topK=10000}[0.1,0.2,0.3,0.4,0.5,0.6,0.7,0.8,0.9,1.0]`), but it does not work
> (I guess the PostFilter is not implemented for KNN).
> This issue might be related to the fix for the StackOverflow bug in frange
> with KNN (https://issues.apache.org/jira/browse/SOLR-16567).
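
For readers who want to reproduce the request outside the admin UI, here is a minimal sketch that assembles the same parameters (q, fq, my_similarity, fl) into a URL. It assumes a local Solr with the films example collection and the default /select request handler; the function name and the post_filter_hint flag are hypothetical and only illustrate the two variants described in the report. It only builds the URL and does not contact Solr.

```python
from urllib.parse import urlencode, parse_qs, urlsplit

# Assumption: local Solr, "films" collection, default /select handler.
SOLR_SELECT_URL = "http://localhost:8983/solr/films/select"


def build_knn_subquery_url(target_vector, top_k=10000, post_filter_hint=False):
    """Build the request URL for a lexical query with a KNN sub-query.

    post_filter_hint (hypothetical flag) adds cache=false cost=101, the
    variant the reporter tried; per the report it does not change when
    the KNN sub-query runs.
    """
    # Solr expects the vector as a bracketed, comma-separated literal.
    vector_literal = "[" + ",".join(str(x) for x in target_vector) + "]"

    local_params = ["f=film_vector", f"topK={top_k}"]
    if post_filter_hint:
        local_params = ["cache=false", "cost=101"] + local_params

    knn_subquery = "{!knn " + " ".join(local_params) + "}" + vector_literal

    params = {
        "q": "name:the",              # main lexical query
        "fq": "genre:Drama",          # filter query
        "my_similarity": knn_subquery,
        "fl": "*,$my_similarity",     # return all fields plus the similarity
    }
    # urlencode percent-encodes braces, brackets and spaces for us.
    return SOLR_SELECT_URL + "?" + urlencode(params)


if __name__ == "__main__":
    vector = [0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8, 0.9, 1.0]
    url = build_knn_subquery_url(vector)
    print(url)
    # Decode again to check that the parameters survive the round trip.
    print(parse_qs(urlsplit(url).query)["my_similarity"][0])
```

Passing post_filter_hint=True produces the cache=false cost=101 variant quoted above, which makes it easy to compare timings of the two requests on a larger collection.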
--
This message was sent by Atlassian Jira
(v8.20.10#820010)