[ https://issues.apache.org/jira/browse/SOLR-12890?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17078818#comment-17078818 ]
Trey Grainger commented on SOLR-12890:
--------------------------------------

After reviewing and testing the code in the patch generously contributed on this issue (thank you [~moshebla]!) and subsequently thinking through the design at length, I believe there are several limitations to the approach in the current code. Specifically, the use of terms as dimensions in the vector with attached payloads is pretty inefficient and won't work well at scale, and the use of a query parser is less flexible and reusable than a function query/value source approach would be (in terms of more flexible combination with other functions and use in sorting, returned fields, etc.).

Additionally, I think an optimal design would allow for multi-valued vectors (multiple vectors in a field) in order to support things like word embeddings, sentence embeddings, paragraph embeddings, etc., as opposed to only one vector per field in each document. That is challenging to implement with the current approach.

Instead of hijacking this Jira and replacing the previous work and design, I've created a new Jira (SOLR-14397) and submitted a new proposed design there, which I plan to work on as the next iteration of this Vector Search in Solr initiative.

If you're following along with this effort, I'd encourage you to check out SOLR-14397 and provide any feedback on the updated design proposed there. Thanks!

> Vector Search in Solr (Umbrella Issue)
> --------------------------------------
>
>                 Key: SOLR-12890
>                 URL: https://issues.apache.org/jira/browse/SOLR-12890
>             Project: Solr
>          Issue Type: New Feature
>            Reporter: mosh
>            Priority: Major
>
> We have recently come across a need to index documents containing vectors
> using Solr, and have even worked on a small POC. We used a URP to calculate
> the LSH (we chose to use the superbit algorithm, but the code is designed so
> that the algorithm can be easily changed), and stored the vector in
> either sparse or dense form, in a binary field.
> Perhaps an addition of an LSH URP in conjunction with a query parser that
> uses the same properties to calculate LSH (or maybe ktree, or some other
> algorithm altogether) should be considered as a Solr feature?

--
This message was sent by Atlassian Jira
(v8.3.4#803005)