[ https://issues.apache.org/jira/browse/LUCENE-10015?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17385408#comment-17385408 ]

Julie Tibshirani commented on LUCENE-10015:
-------------------------------------------
I think it makes sense to keep the ability to configure the similarity function
at the field level. I don't see it as a very 'expert' option -- based on what
the vectors represent and how they've been processed, it's necessary to use the
right similarity function to obtain good results. Also, unlike 'maxConn' and
'beamWidth' (which were specific to our HNSW implementation), it's a concept
that makes sense across NN algorithms generally. To the best of my knowledge,
many NN algorithms can handle the full set of common similarity functions
(Euclidean, dot product, cosine).

In case it's helpful context: currently we only support Euclidean distance and
cosine similarity, which is technically redundant. For cosine similarity, users
could normalize the vectors to unit length and use Euclidean distance, since
that produces the same ranking. But I'm assuming we'll add support for inner
product too, which seems very popular and cannot be expressed in terms of
Euclidean distance in the same way. The FAISS library currently supports only
Euclidean distance and inner product.
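
To make the redundancy point concrete, here is a minimal sketch in plain Python
(not Lucene code; the helper names and the sample vectors are made up for
illustration). It relies on the identity ||a - b||^2 = 2 - 2*cos(a, b) for
unit-length vectors, so ranking by Euclidean distance on normalized vectors
should match ranking by cosine similarity, while ranking by raw inner product
can differ because it also depends on vector magnitudes.

```python
import math

def normalize(v):
    # Scale a vector to unit length.
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v]

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def cosine(a, b):
    return dot(a, b) / (math.sqrt(dot(a, a)) * math.sqrt(dot(b, b)))

def squared_euclidean(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))

# Hypothetical sample data, chosen only for illustration.
query = [1.0, 2.0, 3.0]
docs = [
    [2.0, 4.0, 6.5],
    [-1.0, 0.5, 2.0],
    [3.0, 0.0, 0.1],
    [0.2, 0.4, 0.5],
]
ids = range(len(docs))

# Rank by cosine similarity on the raw vectors (highest first).
by_cosine = sorted(ids, key=lambda i: -cosine(query, docs[i]))

# Rank by Euclidean distance on unit-normalized vectors (lowest first).
q_unit = normalize(query)
docs_unit = [normalize(d) for d in docs]
by_euclidean = sorted(ids, key=lambda i: squared_euclidean(q_unit, docs_unit[i]))

# Rank by raw inner product (highest first); magnitudes matter here.
by_inner_product = sorted(ids, key=lambda i: -dot(query, docs[i]))

print("cosine:       ", by_cosine)
print("euclidean:    ", by_euclidean)
print("inner product:", by_inner_product)
```

On this sample the first two rankings agree and the inner product ranking does
not, which is the reason a separate inner product option would not be
redundant in the way cosine is.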
> Remove VectorValues.SimilarityFunction, remove NONE
> ---------------------------------------------------
>
> Key: LUCENE-10015
> URL: https://issues.apache.org/jira/browse/LUCENE-10015
> Project: Lucene - Core
> Issue Type: Task
> Reporter: Robert Muir
> Priority: Blocker
> Fix For: 9.0
>
> Time Spent: 40m
> Remaining Estimate: 0h
>
> This stuff is HNSW-implementation specific. It can be moved to a codec
> parameter.
> The NONE option should be removed: it just makes the codec more complex.