Gabriel Magno created SOLR-16651:
------------------------------------
Summary: Optimize execution of KNN sub-query to apply it only on
documents remaining after the main query
Key: SOLR-16651
URL: https://issues.apache.org/jira/browse/SOLR-16651
Project: Solr
Issue Type: Improvement
Security Level: Public (Default Security Level. Issues are Public)
Components: query
Affects Versions: 9.1.1
Reporter: Gabriel Magno
Solr 9.1 introduced pre-filtering for KNN queries, which is great and is
working fine when the KNN is the main query.
I was wondering whether it would be possible to do something similar for
the case where KNN is a sub-query instead of the main query (q). Let me show an
example use case with the films example.
I want to query for films with “the” in the name, keep only films with
genre “Drama”, and then calculate the similarity of these films' vectors
to my target vector. The idea is to run a simple lexical query and use the
KNN sub-query to calculate similarities (not necessarily to sort by
similarity). Here is an example query:
* URL:
[http://localhost:8983/solr/#/films/query?q=name:the&fq=genre:Drama&my_similarity=%7B!knn%20f%3Dfilm_vector%20topK%3D10000%7D%5B0.1,0.2,0.3,0.4,0.5,0.6,0.7,0.8,0.9,1.0%5D&fl=*,$my_similarity]
* Params:
** {*}q{*}=name:the
** {*}fq{*}=genre:Drama
** {*}my_similarity{*}=\{!knn f=film_vector topK=10000}[0.1,0.2,0.3,0.4,0.5,0.6,0.7,0.8,0.9,1.0]
** {*}fl{*}=*,$my_similarity
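For anyone who wants to reproduce this outside the Admin UI, here is a minimal Python sketch that assembles the same four parameters into a request URL. It assumes the collection is reachable through the standard `/solr/films/select` handler (the URL above points at the Admin UI query page instead), and `build_knn_subquery_params` is a hypothetical helper name, not part of Solr.

```python
from urllib.parse import urlencode


def build_knn_subquery_params(vector, field="film_vector", top_k=10000):
    """Build the request parameters from the example above.

    Hypothetical helper for illustration only; the parameter names and
    values are copied from the example query in this issue.
    """
    # Solr expects the vector as a bracketed, comma-separated list.
    vec = "[" + ",".join(str(v) for v in vector) + "]"
    return {
        "q": "name:the",                  # main lexical query
        "fq": "genre:Drama",              # filter query
        # KNN sub-query, referenced from fl as $my_similarity
        "my_similarity": f"{{!knn f={field} topK={top_k}}}{vec}",
        "fl": "*,$my_similarity",
    }


target_vector = [0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8, 0.9, 1.0]
params = build_knn_subquery_params(target_vector)

# Assumption: the /select handler on a local Solr with the films example.
url = "http://localhost:8983/solr/films/select?" + urlencode(params)
print(url)
```

Letting `urlencode` handle the percent-encoding of the braces and brackets avoids having to hand-escape the local-params syntax.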
This query works fine. The problem is that the `my_similarity` subquery runs
over all 1,100 film documents instead of only the 51 that match the query.
For a small collection like this it makes no difference, but I have a
collection with 12 million documents, where queries like this one run very
slowly even though the result set is small.
I tried using the cache and cost parameters to "force" the KNN sub-query
to run after the main query (`\{!knn cache=false cost=101 f=film_vector
topK=10000}[0.1,0.2,0.3,0.4,0.5,0.6,0.7,0.8,0.9,1.0]`), but it does not work (I
guess PostFilter is not implemented for KNN).
This issue might be related to the fix for the StackOverflow bug with frange
and KNN (https://issues.apache.org/jira/browse/SOLR-16567).