[ https://issues.apache.org/jira/browse/SOLR-12890?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16948964#comment-16948964 ]

Trey Grainger commented on SOLR-12890:
--------------------------------------

[~softwaredoug] - Yeah, agreed that we should ultimately support multiple 
approaches. The question in my mind is what to bite off first. Since this is an 
umbrella issue, it would be good to come up with a bigger-picture vision here 
and then break it down into subtasks for the critical pieces.

I'm currently of the opinion that tackling the "vector scoring" piece first, 
like Elastic did, makes the most sense. Approaches to quantizing vectors into 
terms, such as what you've done in Hangry ([https://github.com/o19s]), LSH as 
[~moshebla] did, or some other technique, can be implemented as follow-on 
optimizations. It seems like if we add support for re-ranking by vector cosine 
similarity first, then we at least have a "slow" but highly accurate way to do 
this, and we can add optimizations from there.
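To make the "slow but accurate" first step concrete, here is a minimal, library-free sketch of re-ranking a first-pass result window by cosine similarity. It assumes each candidate arrives as a `(doc_id, vector)` pair with a dense vector of the query's dimension. The names `cosine`, `rerank_by_cosine`, and `top_n` are hypothetical and are not Solr APIs; `top_n` plays roughly the role of the `reRankDocs` window in Solr's `{!rerank}` query parser. Python is used only for illustration.

```python
import math


def cosine(a, b):
    """Cosine similarity between two equal-length dense vectors."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same dimension")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        # Treat a zero vector as having no similarity to anything.
        return 0.0
    return dot / (norm_a * norm_b)


def rerank_by_cosine(query_vec, candidates, top_n):
    """Re-rank the first top_n (doc_id, vector) candidates by cosine to query_vec.

    Candidates beyond the window keep their first-pass order and follow the
    re-ranked window, similar to how a re-rank window is usually applied.
    """
    window = candidates[:top_n]
    rest = candidates[top_n:]
    # sorted() is stable, so ties keep their first-pass order.
    rescored = sorted(window, key=lambda c: cosine(query_vec, c[1]), reverse=True)
    return [doc_id for doc_id, _ in rescored] + [doc_id for doc_id, _ in rest]
```

For example, with the query `[0.0, 1.0]`, candidates `a=[1,0]`, `b=[0,1]`, `c=[1,1]`, `d=[1,0]` and `top_n=3`, the result is `["b", "c", "a", "d"]`: only the first three are rescored, and `d` stays after the window. Because this scores every vector in the window exactly, any later LSH or quantization step would only need to shrink the candidate set feeding it.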

It does seem like integrating an easy-to-use nearest-neighbor filter as part of 
the field type / codec / qparser would be ideal, but my current bias is toward 
knocking out basic vector scoring first and iterating toward that as a next 
step. Most folks I've talked with are still going to want/need full vector 
similarity scoring for reranking regardless of any nearest-neighbor 
optimizations.

Agree?

> Vector Search in Solr (Umbrella Issue)
> --------------------------------------
>
>                 Key: SOLR-12890
>                 URL: https://issues.apache.org/jira/browse/SOLR-12890
>             Project: Solr
>          Issue Type: New Feature
>            Reporter: mosh
>            Priority: Major
>
> We have recently come across a need to index documents containing vectors 
> using Solr, and have even worked on a small POC. We used a URP to calculate 
> the LSH (we chose the superbit algorithm, but the code is designed so that 
> the algorithm can easily be changed) and stored the vector in either sparse 
> or dense form in a binary field.
> Perhaps an LSH URP, in conjunction with a query parser that uses the same 
> properties to calculate the LSH (or perhaps ktree, or some other algorithm 
> altogether), should be considered as a Solr feature?



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
