[jira] [Updated] (SOLR-16651) Optimize execution of KNN sub-query to apply it only on documents remaining after the main query

Christine Poerschke (Jira) Thu, 23 Feb 2023 10:31:57 -0800


     [ 
https://issues.apache.org/jira/browse/SOLR-16651?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]


Christine Poerschke updated SOLR-16651:
---------------------------------------
    Security:     (was: Public)

> Optimize execution of KNN sub-query to apply it only on documents remaining 
> after the main query
> ------------------------------------------------------------------------------------------------
>
>                 Key: SOLR-16651
>                 URL: https://issues.apache.org/jira/browse/SOLR-16651
>             Project: Solr
>          Issue Type: Improvement
>          Components: query
>    Affects Versions: 9.1.1
>            Reporter: Gabriel Magno
>            Priority: Major
>              Labels: knn, optimization, query, vector
>
> Solr 9.1 introduced pre-filtering for KNN queries, which is great and is 
> working fine when the KNN is the main query.
> I was wondering whether it would be possible to do something similar, but 
> for the case of KNN being a sub-query instead of the main query (q). Let me 
> show an example use case with the films example.
> I want to query for films with “the” in the name, keep only films with 
> genre “Drama”, then calculate the similarity of these films' vectors to 
> my target vector. The idea is to make a simple lexical query and use the 
> KNN sub-query to calculate similarities (not necessarily to sort by 
> similarity). Here is an example query:
>  * URL: 
> [http://localhost:8983/solr/#/films/query?q=name:the&fq=genre:Drama&my_similarity=%7B!knn%20f%3Dfilm_vector%20topK%3D10000%7D%5B0.1,0.2,0.3,0.4,0.5,0.6,0.7,0.8,0.9,1.0%5D&fl=*,$my_similarity]
>  * Params:
>  ** {*}q{*}=name:the
>  ** {*}fq{*}=genre:Drama
>  ** {*}my_similarity{*}=\{!knn f=film_vector 
> topK=10000}[0.1,0.2,0.3,0.4,0.5,0.6,0.7,0.8,0.9,1.0]
>  ** {*}fl{*}=*,$my_similarity
> This query works fine. The problem is that the `my_similarity` sub-query runs 
> over all 1,100 film documents, instead of only the 51 that match the 
> query. For a small collection like this it makes no difference, but I have 
> a collection with 12 million documents where queries like this run very 
> slowly, even though the result set is small.
> I tried using the cache and cost parameters to "force" the KNN sub-query 
> to run after the main query (`\{!knn cache=false cost=101 f=film_vector 
> topK=10000}[0.1,0.2,0.3,0.4,0.5,0.6,0.7,0.8,0.9,1.0]`), but it does not work 
> (I guess PostFilter is not implemented for KNN).
> This issue might be related to the fix for the StackOverflowError bug of 
> frange with KNN (https://issues.apache.org/jira/browse/SOLR-16567).
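
The behaviour requested above can be illustrated outside Solr with a minimal sketch: apply the lexical query and filter first, then compute vector similarity only for the documents that remain. This is not Solr code and not a proposed implementation. The helper names (`cosine`, `score_matching_only`), the toy documents, and the use of raw cosine similarity are all assumptions made for illustration; Solr's own scoring for the configured similarity function may differ.

```python
import math


def cosine(a, b):
    """Raw cosine similarity between two equal-length vectors (assumed metric)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def score_matching_only(docs, matches, target, field="film_vector"):
    """Score only the documents accepted by `matches` (hypothetical helper).

    Returns the (id, similarity) pairs and the number of similarity
    computations performed, to make the saved work visible.
    """
    scored = []
    computations = 0
    for doc in docs:
        if not matches(doc):
            continue  # skipped documents never reach the vector comparison
        computations += 1
        scored.append((doc["id"], cosine(doc[field], target)))
    return scored, computations


# Toy stand-in for the films collection (invented data, two dimensions only).
docs = [
    {"id": "a", "name": "The Drama One", "genre": "Drama", "film_vector": [1.0, 0.0]},
    {"id": "b", "name": "Other", "genre": "Drama", "film_vector": [0.0, 1.0]},
    {"id": "c", "name": "The Comedy", "genre": "Comedy", "film_vector": [1.0, 1.0]},
]


def matches(doc):
    # Stand-in for q=name:the combined with fq=genre:Drama.
    return "the" in doc["name"].lower().split() and doc["genre"] == "Drama"


scored, computations = score_matching_only(docs, matches, [1.0, 0.0])
```

Here only one of the three toy documents reaches the similarity step, which mirrors the 51-of-1,100 case described in the issue.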



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

