[
https://issues.apache.org/jira/browse/SOLR-12890?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16948964#comment-16948964
]
Trey Grainger commented on SOLR-12890:
--------------------------------------
[~softwaredoug] - Yeah, agreed that we should ultimately support multiple
approaches. The question in my mind is what to bite off first. Since this is an
umbrella issue, would be good to come up with a bigger-picture vision here and
then break down into subtasks for the critical pieces.
I'm currently of the opinion that tackling the "vector scoring" piece first,
like Elastic did, makes the most sense. The approaches to quantizing vectors
into terms, like you've done in Hangry ([https://github.com/o19s]), through LSH
like [~moshebla] did, or through some other technique, can then be implemented
as follow-on optimizations. If we add support for re-ranking by vector cosine
first, then at least we have a "slow" but highly accurate way to do this, and
can add optimizations from there.
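To make the "slow but accurate" option concrete, here is a rough Python sketch
of re-ranking an already-retrieved candidate set by cosine similarity. It is
only an illustration of the idea, not a proposed Solr API: the function names,
the (doc_id, vector) candidate format, and the top_k parameter are all
assumptions made up for this example.

```python
import math
from typing import List, Sequence, Tuple


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine of the angle between two equal-length dense vectors."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same dimension")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        # Assumption: treat a zero vector as having no similarity to anything.
        return 0.0
    return dot / (norm_a * norm_b)


def rerank_by_cosine(
    query_vector: Sequence[float],
    candidates: Sequence[Tuple[str, Sequence[float]]],
    top_k: int,
) -> List[Tuple[str, float]]:
    """Score every candidate against the query and keep the best top_k.

    `candidates` is a hypothetical list of (doc_id, vector) pairs, e.g. the
    top N hits of a cheaper first-pass query.
    """
    scored = [
        (doc_id, cosine_similarity(query_vector, vector))
        for doc_id, vector in candidates
    ]
    # Highest cosine first; every candidate is scored, hence "slow".
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:top_k]
```

Because every candidate is scored exactly, the cost grows linearly with the
number of candidates, which is why quantization or LSH would come in later as
a way to shrink the candidate set rather than as a replacement for the scoring
itself.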
It does seem like it would be nice to integrate a nearest-neighbor-type
filter that is easy to use as part of the field type / codec / qparser,
but my current bias is toward knocking out basic vector scoring first and
iterating to that as a next step. Most folks I've talked with are still going
to want/need full vector similarity scoring for reranking regardless of any
nearest neighbor optimizations.
Agree?
> Vector Search in Solr (Umbrella Issue)
> --------------------------------------
>
> Key: SOLR-12890
> URL: https://issues.apache.org/jira/browse/SOLR-12890
> Project: Solr
> Issue Type: New Feature
> Reporter: mosh
> Priority: Major
>
> We have recently come across a need to index documents containing vectors
> using Solr, and have even worked on a small POC. We used a URP to calculate
> the LSH (we chose to use the superbit algorithm, but the code is designed in a
> way that the algorithm picked can be easily changed), and stored the vector in
> either sparse or dense forms, in a binary field.
> Perhaps an addition of an LSH URP in conjunction with a query parser that
> uses the same properties to calculate LSH (or maybe ktree, or some other
> algorithm altogether) should be considered as a Solr feature?