[ 
https://issues.apache.org/jira/browse/LUCENE-9614?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17396283#comment-17396283
 ] 

Michael Sokolov commented on LUCENE-9614:
-----------------------------------------

Thinking about how to make the scores commensurate across different indexes 
for the same query ... in the case of dot product there's no issue: since we 
assume all vectors are unit length (otherwise the dot-product similarity makes 
no sense), scores are always between 0 and 1 and there is no need for inversion 
or normalization. For Euclidean distance, we negate the distances so that 
results sort descending, which makes the scores negative, so we need some way 
to normalize them to be non-negative.

And -- it's not really clear at all how to control the range of scores from 
this query given the typical use case of a boolean query disjunctively 
combining "semantic" matches from HNSW with "keyword" matches from term 
queries. Ideally we'd return scores in a fixed range (0 - 1) and let the query 
writer control the balance between keyword and semantic queries with the boost.

Possibly for these Euclidean (L2) distance queries, we can use something like 
{{score(q, d) = 1 - |q - d| / (|q| + |d|)}}. Then as {{|d| -> 0}} or 
{{|d| -> ∞}}, the score approaches 0, and score = 1 when q = d.
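
A minimal Java sketch of that proposed formula, to make the limits concrete. 
This is only an illustration, not Lucene code: the class and method names 
({{NormalizedL2Score}}, {{norm}}, {{score}}) are hypothetical, and returning 1 
when both vectors are zero is an assumption, since the formula is undefined 
there. By the triangle inequality, |q - d| <= |q| + |d|, so the result stays 
within 0 - 1.

```java
// Hypothetical helper illustrating score(q, d) = 1 - |q - d| / (|q| + |d|).
// Not part of Lucene; names and the zero-vector handling are assumptions.
final class NormalizedL2Score {
    private NormalizedL2Score() {}

    /** Euclidean (L2) norm of v, accumulated in double precision. */
    static double norm(float[] v) {
        double sum = 0;
        for (float x : v) {
            sum += (double) x * x;
        }
        return Math.sqrt(sum);
    }

    /** Score in [0, 1]: 1 when q equals d, near 0 when |d| is tiny or huge relative to |q|. */
    static double score(float[] q, float[] d) {
        if (q.length != d.length) {
            throw new IllegalArgumentException("vector dimensions differ");
        }
        double diffSq = 0;
        for (int i = 0; i < q.length; i++) {
            double diff = (double) q[i] - d[i];
            diffSq += diff * diff;
        }
        double denom = norm(q) + norm(d);
        if (denom == 0) {
            // Assumption: both vectors are zero, so q == d; treat as a perfect match.
            return 1.0;
        }
        return 1.0 - Math.sqrt(diffSq) / denom;
    }
}
```

Note that, under this formula, two vectors pointing in exactly opposite 
directions also score 0, whatever their lengths.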

> Implement KNN Query
> -------------------
>
>                 Key: LUCENE-9614
>                 URL: https://issues.apache.org/jira/browse/LUCENE-9614
>             Project: Lucene - Core
>          Issue Type: New Feature
>            Reporter: Michael Sokolov
>            Priority: Major
>          Time Spent: 10m
>  Remaining Estimate: 0h
>
> Now we have a vector index format, and one vector indexing/KNN search 
> implementation, but the interface is low-level: you can search across a 
> single segment only. We would like to expose a Query implementation. 
> Initially, we want to support a usage where the KnnVectorQuery selects the 
> k-nearest neighbors without regard to any other constraints, and these can 
> then be filtered as part of an enclosing Boolean or other query.
> Later we will want to explore some kind of filtering *while* performing 
> vector search, or a re-entrant search process that can yield further results. 
> Because of the nature of knn search (all documents having any vector value 
> match), it is more like a ranking than a filtering operation, and it doesn't 
> really make sense to provide an iterator interface that can be merged in the 
> usual way, in docid order, skipping ahead. It's not yet clear how to satisfy 
> a query that is "k nearest neighbors satisfying some arbitrary Query", at 
> least not without realizing a complete bitset for the Query. But this is for 
> a later issue; *this* issue is just about performing the knn search in 
> isolation, computing a set of (some given) K nearest neighbors, and providing 
> an iterator over those.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

