[ https://issues.apache.org/jira/browse/LUCENE-10593?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17542677#comment-17542677 ]
Alessandro Benedetti commented on LUCENE-10593:
-----------------------------------------------

https://github.com/apache/lucene/pull/926 has been opened. [~sokolov], [~mayya], [~julietibs], [~jpountz], feel free to review.

> VectorSimilarityFunction reverse removal
> ----------------------------------------
>
>                 Key: LUCENE-10593
>                 URL: https://issues.apache.org/jira/browse/LUCENE-10593
>             Project: Lucene - Core
>          Issue Type: Improvement
>            Reporter: Alessandro Benedetti
>            Priority: Major
>              Labels: vector-based-search
>
> org.apache.lucene.index.VectorSimilarityFunction#EUCLIDEAN similarity behaves
> in the opposite way to the other similarities:
> a higher similarity value means a greater distance. For this reason it has been
> marked as "reversed", and a function is present to map from the similarity
> to a score (where higher means closer, as in all other similarities).
> This counterintuitive behavior, for which I could find no apparent explanation
> (please correct me if I am wrong), brings a lot of nasty side effects for
> code readability, especially when combined with the NeighborQueue, which
> has a "reversed" of its own.
> In addition, it complicates the usage of the pattern:
> Result Queue -> MIN HEAP
> Candidate Queue -> MAX HEAP
> in HNSW searchers.
> The proposal in my Pull Request aims to:
> 1) make the Euclidean similarity return the score directly, in line with the
> other similarities, using the formula currently used to move from distance
> to score
> 2) simplify the code, removing the bound checker that is no longer necessary
> 3) refactor here and there to be in line with the simplification
> 4) refactor NeighborQueue to clearly state when it's a MIN_HEAP or
> MAX_HEAP; with this, debugging is much easier and understanding the HNSW
> code is much more intuitive
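
For readers following the proposal, here is a minimal, self-contained Java sketch of the idea. It is not the code from the pull request. The class EuclideanScoreSketch and its methods are hypothetical names, and the distance-to-score mapping 1 / (1 + squared distance) is an assumption, since the issue text does not spell the formula out. The sketch shows a Euclidean similarity that returns a score where higher means closer, so that the result queue can be a plain min-heap and the candidate queue a plain max-heap without any "reversed" handling.

```java
import java.util.Comparator;
import java.util.PriorityQueue;

// Hypothetical illustration class; not part of Lucene.
final class EuclideanScoreSketch {

  // Squared Euclidean distance: a larger value means the vectors are farther apart.
  static float squareDistance(float[] a, float[] b) {
    if (a.length != b.length) {
      throw new IllegalArgumentException("dimension mismatch");
    }
    float sum = 0f;
    for (int i = 0; i < a.length; i++) {
      float d = a[i] - b[i];
      sum += d * d;
    }
    return sum;
  }

  // Assumed mapping from distance to score: larger now means closer,
  // like the other similarity functions.
  static float score(float[] a, float[] b) {
    return 1f / (1f + squareDistance(a, b));
  }

  // Result queue as a MIN HEAP: the head is the worst retained score,
  // so it is the one evicted when a better result arrives.
  static float[] topK(float[] query, float[][] vectors, int k) {
    PriorityQueue<Float> results = new PriorityQueue<>();
    for (float[] v : vectors) {
      results.offer(score(query, v));
      if (results.size() > k) {
        results.poll();
      }
    }
    float[] out = new float[results.size()];
    for (int i = out.length - 1; i >= 0; i--) {
      out[i] = results.poll(); // ascending polls, filled back to front
    }
    return out;
  }

  // Candidate queue as a MAX HEAP: the head is the most promising candidate.
  static float bestCandidate(float[] query, float[][] vectors) {
    PriorityQueue<Float> candidates = new PriorityQueue<>(Comparator.reverseOrder());
    for (float[] v : vectors) {
      candidates.offer(score(query, v));
    }
    return candidates.peek();
  }

  public static void main(String[] args) {
    float[] query = {0f, 0f};
    float[][] vectors = {{3f, 4f}, {1f, 0f}, {0f, 0f}};
    float[] best = topK(query, vectors, 2);
    System.out.println(best[0] + " " + best[1]); // prints "1.0 0.5"
    System.out.println(bestCandidate(query, vectors)); // prints "1.0"
  }
}
```

Because every similarity then follows the same "higher is closer" convention, both heaps can compare scores directly, which is the readability gain the issue describes.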