[ 
https://issues.apache.org/jira/browse/LUCENE-10147?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17431170#comment-17431170
 ] 

Michael Sokolov commented on LUCENE-10147:
------------------------------------------

[~julietibs] I don't have a link to the thread, but IIRC we decided it would be 
too costly, since we would have to compute the length of every indexed vector. 
Indeed, if we did that, we could simply impose the normalization. It could be an 
attractive option for some, but we wanted to keep the number of configuration 
knobs small. Also, I expect most systems that infer such vectors could handle 
this more efficiently in an upstream system?
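
For illustration, upstream normalization could look roughly like the sketch 
below. It is plain Java and does not use any Lucene API; the class and method 
names (UnitNormalizer, normalize, dot) are hypothetical. It scales a vector to 
unit L2 length before indexing, so that the dot product of two such vectors 
equals their cosine similarity.

```java
/** Hypothetical helper, not part of Lucene: unit-normalizes vectors before indexing. */
public final class UnitNormalizer {
    private UnitNormalizer() {}

    /** Returns a copy of v scaled to unit L2 length; rejects the zero vector. */
    public static float[] normalize(float[] v) {
        double sumSquares = 0.0;
        for (float x : v) {
            sumSquares += (double) x * x;
        }
        if (sumSquares == 0.0) {
            throw new IllegalArgumentException("cannot normalize a zero vector");
        }
        double inverseLength = 1.0 / Math.sqrt(sumSquares);
        float[] out = new float[v.length];
        for (int i = 0; i < v.length; i++) {
            out[i] = (float) (v[i] * inverseLength);
        }
        return out;
    }

    /** Plain dot product; for unit-length inputs this equals the cosine similarity. */
    public static float dot(float[] a, float[] b) {
        if (a.length != b.length) {
            throw new IllegalArgumentException("dimension mismatch");
        }
        float sum = 0f;
        for (int i = 0; i < a.length; i++) {
            sum += a[i] * b[i];
        }
        return sum;
    }
}
```

This is the per-vector length computation mentioned above: one pass over each 
vector, paid once by whoever produces the vectors rather than at index time.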

> KnnVectorQuery can produce negative scores
> ------------------------------------------
>
>                 Key: LUCENE-10147
>                 URL: https://issues.apache.org/jira/browse/LUCENE-10147
>             Project: Lucene - Core
>          Issue Type: Bug
>            Reporter: Julie Tibshirani
>            Priority: Blocker
>          Time Spent: 1h 50m
>  Remaining Estimate: 0h
>
> The cosine similarity of two vectors falls in the range [-1, 1]. So currently 
> with cosine similarity, {{KnnVectorQuery}} can produce negative scores. Maybe 
> we should just adjust the scores in this case by adding 1, shifting them to 
> the range [0, 2].
> As a side note, this made me notice that 
> {{VectorSimilarityFunction.DOT_PRODUCT}} is really quite "expert"! Users need 
> to know to normalize all document and query vectors to unit length when using 
> this similarity. Otherwise the output is unbounded and difficult to handle in 
> scoring. Also dot product is not a true metric: for example, it doesn't obey 
> the triangle inequality. So many ANN algorithms have trouble supporting it. 
> As part of this issue, we could improve the documentation on 
> {{VectorSimilarityFunction.DOT_PRODUCT}} to clarify that normalization is 
> required.
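
A minimal sketch of the score adjustment proposed in the description, assuming 
the shift is simply "add 1" as stated there. This is plain Java with 
hypothetical names (ShiftedCosine, cosine, shiftedScore), not the code that 
KnnVectorQuery actually uses.

```java
/** Hypothetical illustration of shifting cosine similarity from [-1, 1] to [0, 2]. */
public final class ShiftedCosine {
    private ShiftedCosine() {}

    /** Cosine similarity of two non-zero vectors of equal dimension. */
    public static double cosine(float[] a, float[] b) {
        if (a.length != b.length) {
            throw new IllegalArgumentException("dimension mismatch");
        }
        double dot = 0.0;
        double normA = 0.0;
        double normB = 0.0;
        for (int i = 0; i < a.length; i++) {
            dot += (double) a[i] * b[i];
            normA += (double) a[i] * a[i];
            normB += (double) b[i] * b[i];
        }
        if (normA == 0.0 || normB == 0.0) {
            throw new IllegalArgumentException("cosine is undefined for a zero vector");
        }
        return dot / (Math.sqrt(normA) * Math.sqrt(normB));
    }

    /** Adds 1 to the cosine so the resulting score is never negative. */
    public static double shiftedScore(float[] a, float[] b) {
        return 1.0 + cosine(a, b);
    }
}
```

With this shift, opposite vectors score 0, orthogonal vectors score 1, and 
identical directions score 2, so the ranking order is unchanged.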



