[ 
https://issues.apache.org/jira/browse/LUCENE-10593?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17542677#comment-17542677
 ] 

Alessandro Benedetti commented on LUCENE-10593:
-----------------------------------------------

https://github.com/apache/lucene/pull/926 has been opened. [~sokolov], 
[~mayya], [~julietibs], [~jpountz], feel free to review.

> VectorSimilarityFunction reverse removal
> ----------------------------------------
>
>                 Key: LUCENE-10593
>                 URL: https://issues.apache.org/jira/browse/LUCENE-10593
>             Project: Lucene - Core
>          Issue Type: Improvement
>            Reporter: Alessandro Benedetti
>            Priority: Major
>              Labels: vector-based-search
>
> org.apache.lucene.index.VectorSimilarityFunction#EUCLIDEAN behaves in the 
> opposite way to the other similarities: a higher similarity value means a 
> greater distance. For this reason it has been marked as "reversed", and a 
> function is present to map from the similarity to a score (where higher 
> means closer, as in all other similarities).
> I could find no apparent explanation for this counterintuitive behavior 
> (please correct me if I am wrong), and it has a lot of nasty side effects on 
> code readability, especially when combined with NeighborQueue, which has a 
> "reversed" of its own.
> It also complicates the use of the following pattern in HNSW searchers:
> Result Queue -> MIN HEAP
> Candidate Queue -> MAX HEAP
> The proposal in my Pull Request aims to:
> 1) make the Euclidean similarity return the score directly, in line with the 
> other similarities, using the formula currently used to move from distance 
> to score
> 2) simplify the code by removing the bound checker, which is no longer 
> necessary
> 3) refactor here and there to be in line with the simplification
> 4) refactor NeighborQueue to state clearly whether it is a MIN_HEAP or a 
> MAX_HEAP, which makes debugging much easier and the HNSW code much more 
> intuitive to understand
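
The idea described above can be illustrated with a small, self-contained Java sketch. It is not the code from the pull request. The class and method names (EuclideanScoreSketch, euclideanScore, topK, Neighbor) are hypothetical, and the mapping score = 1 / (1 + squaredDistance) is an assumption about the distance-to-score formula the description refers to. The scan is a plain brute-force loop, not HNSW; it only shows why a min-heap suits the result queue (the worst kept hit is on top and cheap to evict) and a max-heap suits the candidate queue (the best candidate is on top and explored next) once every similarity follows "higher means closer".

```java
import java.util.Comparator;
import java.util.PriorityQueue;

public class EuclideanScoreSketch {

  /** Hypothetical (node id, score) pair; higher score means closer. */
  public static final class Neighbor {
    public final int node;
    public final float score;

    public Neighbor(int node, float score) {
      this.node = node;
      this.score = score;
    }
  }

  /** Squared Euclidean distance between two vectors of equal length. */
  public static float squareDistance(float[] a, float[] b) {
    if (a.length != b.length) {
      throw new IllegalArgumentException("dimension mismatch");
    }
    float sum = 0f;
    for (int i = 0; i < a.length; i++) {
      float d = a[i] - b[i];
      sum += d * d;
    }
    return sum;
  }

  /**
   * ASSUMPTION: maps distance to score as 1 / (1 + d^2), so identical vectors
   * score 1 and the score falls towards 0 as the distance grows.
   */
  public static float euclideanScore(float[] a, float[] b) {
    return 1f / (1f + squareDistance(a, b));
  }

  /** Min-heap on score: the lowest-scoring (worst) neighbor is on top. */
  public static PriorityQueue<Neighbor> minHeap() {
    return new PriorityQueue<>(Comparator.comparingDouble((Neighbor n) -> n.score));
  }

  /** Max-heap on score: the highest-scoring (best) neighbor is on top. */
  public static PriorityQueue<Neighbor> maxHeap() {
    return new PriorityQueue<>(
        Comparator.comparingDouble((Neighbor n) -> n.score).reversed());
  }

  /** Brute-force top-k scan (not HNSW) that keeps results in a min-heap. */
  public static PriorityQueue<Neighbor> topK(float[] query, float[][] vectors, int k) {
    PriorityQueue<Neighbor> results = minHeap();
    for (int i = 0; i < vectors.length; i++) {
      float score = euclideanScore(query, vectors[i]);
      if (results.size() < k) {
        results.add(new Neighbor(i, score));
      } else if (score > results.peek().score) {
        results.poll(); // evict the current worst result
        results.add(new Neighbor(i, score));
      }
    }
    return results;
  }

  public static void main(String[] args) {
    float[] query = {0f, 0f};
    float[][] vectors = {{3f, 4f}, {1f, 0f}, {0f, 0f}};

    // Result queue: the top of the heap is the worst of the kept hits.
    PriorityQueue<Neighbor> results = topK(query, vectors, 2);
    System.out.println("worst kept result: node " + results.peek().node);

    // Candidate queue: the top of the heap is the best candidate to visit next.
    PriorityQueue<Neighbor> candidates = maxHeap();
    for (int i = 0; i < vectors.length; i++) {
      candidates.add(new Neighbor(i, euclideanScore(query, vectors[i])));
    }
    System.out.println("best candidate: node " + candidates.peek().node);
  }
}
```

Because the score already follows the same direction as the other similarities, neither queue needs a "reversed" switch in this sketch: the heap type alone says which end is on top.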




