[ https://issues.apache.org/jira/browse/LUCENE-10191?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17431160#comment-17431160 ]
Robert Muir commented on LUCENE-10191:
--------------------------------------
Do we really need these slower functions? IMO the dot product is already slow
enough in Java!
Being a lower-level library, and having to support backwards compatibility for
a long time, I'd like us to consider keeping this stuff to a minimum.
Precomputing stuff to support these functions seems like the wrong direction to
me. I think they should be removed, and users should just use the dot product.
> Optimize vector functions by precomputing magnitudes
> ----------------------------------------------------
>
> Key: LUCENE-10191
> URL: https://issues.apache.org/jira/browse/LUCENE-10191
> Project: Lucene - Core
> Issue Type: Improvement
> Reporter: Julie Tibshirani
> Priority: Minor
>
> Both Euclidean distance (L2 norm) and cosine similarity can be expressed in
> terms of dot product and vector magnitudes:
> * l2_norm(a, b) = ||a - b|| = sqrt(||a||^2 - 2(a . b) + ||b||^2)
> * cosine(a, b) = (a . b) / (||a|| ||b||)
> We could compute and store each vector's magnitude upfront while indexing,
> and compute the query vector's magnitude once per query. Then we'd calculate
> the distance using our (very optimized) dot product method, plus the
> precomputed values.
> This is an exploratory issue: I haven't tested this out yet, so I'm not sure
> how much it would help. I would at least expect it to help with cosine
> similarity – several months ago we tried out similar ideas in Elasticsearch
> and were able to get a nice boost in cosine performance.
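
For readers following along, the two identities in the issue description can be
sketched roughly as below. This is only an illustration of the idea, not Lucene
code: the class PrecomputedVectorMath and all of its method names are
hypothetical, and the plain scalar loop stands in for whatever optimized dot
product the library actually uses. The squared magnitudes are assumed to be
computed once (at index time for documents, once per query for the query
vector) and passed in.

```java
// Illustrative sketch only: hypothetical names, not part of Lucene's API.
final class PrecomputedVectorMath {
    private PrecomputedVectorMath() {}

    // Plain scalar dot product; stands in for an optimized implementation.
    static float dotProduct(float[] a, float[] b) {
        if (a.length != b.length) {
            throw new IllegalArgumentException("dimension mismatch");
        }
        float sum = 0f;
        for (int i = 0; i < a.length; i++) {
            sum += a[i] * b[i];
        }
        return sum;
    }

    // ||v||^2, the value that would be computed once and stored.
    static float squaredMagnitude(float[] v) {
        return dotProduct(v, v);
    }

    // ||a - b|| = sqrt(||a||^2 - 2(a . b) + ||b||^2)
    static float l2Distance(float[] a, float sqMagA, float[] b, float sqMagB) {
        float squared = sqMagA - 2f * dotProduct(a, b) + sqMagB;
        // Rounding can push the expanded form slightly below zero
        // for nearly identical vectors, so clamp before the sqrt.
        return (float) Math.sqrt(Math.max(squared, 0f));
    }

    // cosine(a, b) = (a . b) / (||a|| ||b||); undefined for zero vectors.
    static float cosine(float[] a, float sqMagA, float[] b, float sqMagB) {
        return (float) (dotProduct(a, b) / Math.sqrt((double) sqMagA * sqMagB));
    }
}
```

With this shape, each comparison costs one dot product plus a few scalar
operations. One caveat of the expanded L2 form is that it subtracts quantities
of similar size when the vectors are close, so it can be less accurate than
summing squared differences directly; whether that matters in practice would
need testing.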