[
https://issues.apache.org/jira/browse/LUCENE-10191?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17431166#comment-17431166
]
Robert Muir commented on LUCENE-10191:
--------------------------------------
Also, talking about storing stuff differently makes it obvious that these
slower functions should go.
Instead, slower functions needing different representation should really be
different codecs. We can still reuse code, but it allows us to e.g. support
different functions without signing up for backwards compatibility. Otherwise,
I'm personally gonna feel the need to push back every single time on all these
functions, because I think we've already attempted to sign up for too much. And
trying to support these functions the way it happens now is wrong to do and
will lead to hairballs.
{{VectorSimilarityFunction}} must be removed, and support for this stuff placed
in lucene/codecs
> Optimize vector functions by precomputing magnitudes
> ----------------------------------------------------
>
> Key: LUCENE-10191
> URL: https://issues.apache.org/jira/browse/LUCENE-10191
> Project: Lucene - Core
> Issue Type: Improvement
> Reporter: Julie Tibshirani
> Priority: Minor
>
> Both euclidean distance (L2 norm) and cosine similarity can be expressed in
> terms of dot product and vector magnitudes:
> * l2_norm(a, b) = ||a - b|| = sqrt(||a||^2 - 2(a . b) + ||b||^2)
> * cosine(a, b) = a . b / ||a|| ||b||
> We could compute and store each vector's magnitude upfront while indexing,
> and compute the query vector's magnitude once per query. Then we'd calculate
> the distance using our (very optimized) dot product method, plus the
> precomputed values.
> This is an exploratory issue: I haven't tested this out yet, so I'm not sure
> how much it would help. I would at least expect it to help with cosine
> similarity – several months ago we tried out similar ideas in Elasticsearch
> and were able to get a nice boost in cosine performance.
--
This message was sent by Atlassian Jira
(v8.3.4#803005)