[ 
https://issues.apache.org/jira/browse/SOLR-17928?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=18034227#comment-18034227
 ] 

Chris M. Hostetter commented on SOLR-17928:
-------------------------------------------

{quote}Previously, improving accuracy required increasing topK (which returns 
more results), but efSearch enables exploring more candidates while still 
receiving exactly topK results. And default efSearch is 2*topK.
{quote}
I understand how it works – my concern is that adding this new option with a 
default of "topK * 2" means that for any _existing_ query (that isn't modified 
in advance to specify an {{efSearch}} param) upgrading after this feature is 
added is going to do twice as much "work" – effectively doubling the graph walk 
time of the query. (correct?)

Based on my interaction with heavy KNN users, that is likely to 
surprise/confuse/frustrate a lot of people – because they have already tuned 
their knn queries to use a carefully chosen topK value (that they pass to Solr) 
based on how much it improves the relevancy of the "top N < K" results they 
_actually_ look at in the response.

Example: they only care about the top N=100 results, but they pass topK=500 to 
Solr because:
 * when they tried topK=100 the results were faster, but not accurate enough to 
be useful
 * when they tried topK=1000 the results were even more accurate, but too slow 
to be worth the added improvements

 

An {{efSearch}} default of "topK * 2" would likely improve the relevancy of 
existing queries, but it would effectively be a "performance backcompat break" 
for existing users who upgrade.

*That's what I'm hung up on: whether that is a good idea?*

 
----
 

In the latest patch, the user gets an error if {{efSearch < topK}}; which 
makes sense at a low level – but seems like it might be error prone if/when 
users are tuning their query params – especially combined with a default 
{{efSearch}} that is _relative_ to the {{topK}} value:

* If I _only_ set the {{topK=K}} param, and I increase the value of "K", then I 
not only increase the number of results matched, but also the amount of graph 
walking done.
* If I set _both_ {{topK=K efSearch=X}}, and I increase the value of "K", then 
I _only_ increase the number of results matched; the amount of graph walking 
_stays the same_ (and my overall relevancy effectively decreases)
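
The two cases above can be sketched with made-up numbers. {{walk_ratio}} is a hypothetical helper, the default of "topK * 2" is the one under discussion, and the {{efSearch < topK}} check only mirrors the error described for the latest patch; none of this is the patch's actual code:

```python
from typing import Optional


def walk_ratio(top_k: int, ef_search: Optional[int] = None) -> float:
    """Return efSearch / topK: graph walking done relative to results returned.

    Hypothetical model: the default efSearch is top_k * 2, and an explicit
    efSearch below top_k is rejected.
    """
    if ef_search is None:
        ef_search = top_k * 2
    if ef_search < top_k:
        raise ValueError(f"efSearch ({ef_search}) must be >= topK ({top_k})")
    return ef_search / top_k


# Only topK set: the ratio is constant, so the walk grows along with K.
print([walk_ratio(k) for k in (100, 200, 400)])

# topK and a fixed efSearch=400 both set: the ratio shrinks as K grows ...
print([walk_ratio(k, ef_search=400) for k in (100, 200, 400)])

# ... until K passes efSearch and the query is rejected outright.
try:
    walk_ratio(500, ef_search=400)
except ValueError as err:
    print(err)
```

So the same edit (raising "K") has a different effect on graph walking depending on whether {{efSearch}} happens to be set, which is the error-prone part.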

I'm wondering if (instead of an "integer" {{efSearch >= topK}} param) it would 
make more sense to have a "float" {{efSearchFactor >= 1.0}} param that would be 
multiplied by the {{topK}} param to determine the effective {{efSearch}} value 
used internally in the code.
 
That way:
* Changing (only) {{topK}} impacts the number of docs matched by the query, 
keeping the _relative_ amount of graph walking the same (regardless of whether 
you have an explicit {{efSearchFactor}} param or use the default)
* If you do have an {{efSearchFactor}} param set, changing it can tune the 
amount of graph walking done, _relative_ to the {{topK}} value you use, in a 
way that will scale if/when you change {{topK}}
 
?
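
A minimal sketch of the {{efSearchFactor}} idea, to show the intended scaling. The name {{effective_ef_search}}, the default factor of 2.0 (chosen here only to match the "topK * 2" default under discussion), and rounding up to a whole number of candidates are all assumptions for illustration, not something decided in this issue:

```python
import math


def effective_ef_search(top_k: int, ef_search_factor: float = 2.0) -> int:
    """Derive the internal efSearch from topK and a relative factor.

    Hypothetical: the factor must be >= 1.0, so the derived efSearch can
    never fall below top_k and no separate "efSearch < topK" error is needed.
    """
    if top_k < 1:
        raise ValueError("topK must be >= 1")
    if ef_search_factor < 1.0:
        raise ValueError("efSearchFactor must be >= 1.0")
    # Round up so a fractional product still yields a whole candidate count.
    return math.ceil(top_k * ef_search_factor)


# Changing only topK keeps the relative amount of graph walking the same.
for k in (100, 500, 1000):
    print(k, effective_ef_search(k, 1.5))

# A factor of 1.0 would mean no extra candidates beyond topK.
print(effective_ef_search(500, 1.0))
```

With this shape, whether a factor of 1.0 or something larger should be the default is exactly the backcompat question raised above; the sketch does not answer it.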

----

{quote}ElasticSearch also has a similar parameter called num_candidates which 
achieves something similar, and they default to 1.5*topK
{quote}
I really don't care what ES does or what their defaults are – I care what makes 
the most sense for (existing & new) Solr users

 

> Add efSearch parameter to knn query
> -----------------------------------
>
>                 Key: SOLR-17928
>                 URL: https://issues.apache.org/jira/browse/SOLR-17928
>             Project: Solr
>          Issue Type: Improvement
>          Components: vector-search
>            Reporter: Ishan Chattopadhyaya
>            Priority: Major
>              Labels: pull-request-available
>          Time Spent: 10m
>  Remaining Estimate: 0h
>
> Right now, only topK can be requested. efSearch is a standard overfetch 
> parameter.
> Proposing that we add it for better recall accuracy.
> (FYI, Elasticsearch calls it num_candidates. Commonly referred to as 
> efSearch, similar to efConstruction that we call beamWidth)



