[jira] [Commented] (SOLR-12890) Vector Search in Solr (Umbrella Issue)

Trey Grainger (Jira) Wed, 08 Apr 2020 16:49:21 -0700


    [ 
https://issues.apache.org/jira/browse/SOLR-12890?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17078818#comment-17078818
 ]


Trey Grainger commented on SOLR-12890:
--------------------------------------

After reviewing and testing the code in the patch generously contributed on 
this issue (thank you [~moshebla]!) and subsequently thinking through the 
design at length, I believe the approach in the current code has several 
limitations. Specifically, encoding each vector dimension as a term with an 
attached payload is inefficient and won't work well at scale. In addition, a 
query parser is less flexible and reusable than a function query/value source 
approach would be, since the latter combines more easily with other functions 
and can be used in sorting, returned fields, etc. Finally, I think an optimal 
design would allow for multi-valued vectors (multiple vectors in a field) in 
order to support things like word embeddings, sentence embeddings, paragraph 
embeddings, etc., rather than only one vector per field in each document. 
That is challenging to implement with the current approach.
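
To illustrate the "function" style of scoring over multi-valued vectors, here 
is a minimal sketch in plain Python. It is not Solr code and not the design 
proposed in SOLR-14397; the names cosine and max_cosine, the sample data, and 
the choice of "best-matching vector wins" as the aggregation are all 
assumptions made for illustration.

```python
import math

def cosine(a, b):
    """Cosine similarity of two equal-length dense vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)

def max_cosine(query, doc_vectors):
    """Score a multi-valued vector field by its best-matching vector (assumed aggregation)."""
    return max((cosine(query, v) for v in doc_vectors), default=0.0)

# Hypothetical documents: each holds several vectors in one field
# (e.g. one per sentence).
docs = {
    "doc1": [[1.0, 0.0], [0.0, 1.0]],
    "doc2": [[-1.0, 0.0]],
}
query = [1.0, 0.0]

# Because the similarity is an ordinary function, the same value can be used
# as a sort key, returned alongside the document, or combined with other scores.
ranked = sorted(docs, key=lambda d: max_cosine(query, docs[d]), reverse=True)
```

Because the score is just a value computed per document, it composes with 
other functions in a way that a dedicated query parser does not.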

Instead of hijacking this Jira and replacing the previous work and design, I've 
created a new Jira (SOLR-14397) and submitted a new proposed design there, 
which I plan to work on as the next iteration of this Vector Search in Solr 
initiative.

If you're following along with this effort, I'd encourage you to check out 
SOLR-14397 and provide any feedback on the updated design proposed there. 
Thanks!

> Vector Search in Solr (Umbrella Issue)
> --------------------------------------
>
>                 Key: SOLR-12890
>                 URL: https://issues.apache.org/jira/browse/SOLR-12890
>             Project: Solr
>          Issue Type: New Feature
>            Reporter: mosh
>            Priority: Major
>
> We have recently come across a need to index documents containing vectors 
> using Solr, and have even worked on a small POC. We used a URP to calculate 
> the LSH (we chose to use the superbit algorithm, but the code is designed so 
> that the algorithm can be easily changed), and stored the vector in 
> either sparse or dense forms, in a binary field.
> Perhaps an addition of an LSH URP in conjunction with a query parser that 
> uses the same properties to calculate LSH (or maybe ktree, or some other 
> algorithm altogether) should be considered as a Solr feature?
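
For readers unfamiliar with the LSH idea mentioned in the issue description, 
the following is a rough sketch of plain sign-random-projection hashing in 
Python. It is not the code from the patch and not the SuperBit variant (which, 
as I understand it, additionally orthogonalizes the random projection 
vectors); the function names and parameters are hypothetical.

```python
import random

def make_hyperplanes(dim, bits, seed=0):
    """Draw `bits` random Gaussian hyperplane normals of dimension `dim`."""
    rng = random.Random(seed)
    return [[rng.gauss(0.0, 1.0) for _ in range(dim)] for _ in range(bits)]

def signature(vec, planes):
    """One bit per hyperplane: 1 if the vector lies on its non-negative side."""
    return tuple(
        1 if sum(p * x for p, x in zip(plane, vec)) >= 0 else 0
        for plane in planes
    )

def hamming(a, b):
    """Number of differing bits; a smaller distance suggests a smaller angle."""
    return sum(x != y for x, y in zip(a, b))

planes = make_hyperplanes(dim=4, bits=16, seed=42)
v = [0.5, -1.0, 2.0, 0.25]

sig_v = signature(v, planes)
sig_scaled = signature([2 * x for x in v], planes)  # same direction as v
sig_neg = signature([-x for x in v], planes)        # opposite direction
```

The signature depends only on a vector's direction, so the indexing side and 
the query side must use the same hyperplanes (here, the same seed) for the 
hashes to be comparable.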



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org
