[ 
https://issues.apache.org/jira/browse/LUCENE-10147?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17424724#comment-17424724
 ] 

Julie Tibshirani commented on LUCENE-10147:
-------------------------------------------

Maybe something non-elegant but clear like {{convertToScore}}? And ooof, I am 
pretty bad at reading javadocs. But maybe a user would miss this too. I agree 
adding a note on {{VectorSimilarityFunction}} itself would be good, and we 
could include more context on how it's meant to be used (as a fast way to 
perform cosine similarity, not for general "maximum inner product search"!)

> KnnVectorQuery can produce negative scores
> ------------------------------------------
>
>                 Key: LUCENE-10147
>                 URL: https://issues.apache.org/jira/browse/LUCENE-10147
>             Project: Lucene - Core
>          Issue Type: Bug
>            Reporter: Julie Tibshirani
>            Priority: Blocker
>
> The cosine similarity of two vectors falls in the range [-1, 1]. So currently 
> with cosine similarity, {{KnnVectorQuery}} can produce negative scores. Maybe 
> we should just adjust the scores in this case by adding 1, shifting them to 
> the range [0, 2].
>
> As a side note, this made me notice that 
> {{VectorSimilarityFunction.DOT_PRODUCT}} is really quite "expert"! Users need 
> to know to normalize all document and query vectors to unit length when using 
> this similarity. Otherwise the output is unbounded and difficult to handle in 
> scoring. Also dot product is not a true metric: for example, it doesn't obey 
> the triangle inequality. So many ANN algorithms have trouble supporting it. 
> As part of this issue, we could improve the documentation on 
> {{VectorSimilarityFunction.DOT_PRODUCT}} to clarify that normalization is 
> required.
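
To make the two points in the description concrete, here is a minimal sketch in plain Java. It does not use Lucene's API: the class and method names ({{VectorScoreSketch}}, {{shiftedCosineScore}}, {{normalize}}) are hypothetical and chosen only for illustration. It shows the proposed "add 1" shift that maps cosine similarity from [-1, 1] to [0, 2], and that the dot product matches cosine similarity only once both vectors are normalized to unit length.

```java
// Illustrative only: hypothetical helper names, not Lucene's actual implementation.
final class VectorScoreSketch {

    // Plain dot product; unbounded unless both inputs have unit length.
    static float dot(float[] a, float[] b) {
        if (a.length != b.length) {
            throw new IllegalArgumentException("vector dimensions differ");
        }
        float sum = 0f;
        for (int i = 0; i < a.length; i++) {
            sum += a[i] * b[i];
        }
        return sum;
    }

    // Returns a unit-length copy of v.
    static float[] normalize(float[] v) {
        float norm = (float) Math.sqrt(dot(v, v));
        if (norm == 0f) {
            throw new IllegalArgumentException("cannot normalize a zero vector");
        }
        float[] out = new float[v.length];
        for (int i = 0; i < v.length; i++) {
            out[i] = v[i] / norm;
        }
        return out;
    }

    // Cosine similarity, in the range [-1, 1].
    static float cosine(float[] a, float[] b) {
        return dot(a, b) / (float) Math.sqrt(dot(a, a) * dot(b, b));
    }

    // The shift suggested in the issue: add 1 so the score lands in [0, 2].
    static float shiftedCosineScore(float[] a, float[] b) {
        return 1f + cosine(a, b);
    }

    public static void main(String[] args) {
        float[] a = {3f, 4f};
        float[] b = {4f, 3f};
        System.out.println("raw dot product:        " + dot(a, b));
        System.out.println("cosine:                 " + cosine(a, b));
        System.out.println("dot of unit vectors:    " + dot(normalize(a), normalize(b)));
        System.out.println("shifted score:          " + shiftedCosineScore(a, b));
    }
}
```

With unnormalized inputs the raw dot product depends on vector length, which is why the description calls {{DOT_PRODUCT}} "expert": the caller is responsible for normalizing both document and query vectors before indexing and searching.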



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
