[
https://issues.apache.org/jira/browse/SOLR-12890?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16658512#comment-16658512
]
mosh commented on SOLR-12890:
-----------------------------
{quote}Is this different from just committed SOLR-12879?{quote}
The main difference is that MinHash can only be calculated for strings, while
this use case calls for other kinds of hashes.
This POC is for indexing vectors, while SOLR-12879 is for comparing strings by
analysing their vector values.
The URP in this POC takes a vector string (either dense or sparse), e.g.
0.11,0.22,0.5,0.72,4.66 ...
and calculates its LSH hash at index time (using superbit for now, but this can
be extended in the future).
Perhaps the query parsers could be joined or have some kind of factory check
the field type to pick the right query,
but I do not think the URP can be replaced at this time.
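
To make the index-time step above concrete, here is a minimal sketch in Python of hashing a dense vector string into an LSH bit signature. It is not the POC's code: it uses plain random-hyperplane (sign random projection) hashing rather than superbit, which additionally orthogonalizes the projection vectors, and it only handles the dense comma-separated form. All function names, the seed, and the signature length are assumptions chosen for illustration.

```python
import random


def parse_dense_vector(text):
    """Parse a comma-separated dense vector string, e.g. '0.11,0.22,0.5'."""
    return [float(part) for part in text.split(",") if part.strip()]


def make_hyperplanes(dimensions, num_bits, seed=42):
    """Draw num_bits random Gaussian hyperplanes (hypothetical helper).

    The same seed must be used at index time and at query time so that
    both sides hash against identical hyperplanes.
    """
    rng = random.Random(seed)
    return [[rng.gauss(0.0, 1.0) for _ in range(dimensions)]
            for _ in range(num_bits)]


def lsh_signature(vector, hyperplanes):
    """Return one bit per hyperplane: '1' if the vector lies on its
    non-negative side, '0' otherwise."""
    bits = []
    for plane in hyperplanes:
        dot = sum(v * p for v, p in zip(vector, plane))
        bits.append("1" if dot >= 0 else "0")
    return "".join(bits)


def hamming_distance(sig_a, sig_b):
    """Count differing bits; a smaller distance suggests a smaller angle."""
    return sum(a != b for a, b in zip(sig_a, sig_b))


if __name__ == "__main__":
    vector = parse_dense_vector("0.11,0.22,0.5,0.72,4.66")
    planes = make_hyperplanes(dimensions=len(vector), num_bits=16)
    print(lsh_signature(vector, planes))
```

In a setup like the one described, the signature would be computed once per document at index time and stored alongside it, and the query side would hash the query vector with the same hyperplanes and compare signatures, for instance by Hamming distance.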
> Vector Search in Solr (Umbrella Issue)
> --------------------------------------
>
> Key: SOLR-12890
> URL: https://issues.apache.org/jira/browse/SOLR-12890
> Project: Solr
> Issue Type: New Feature
> Security Level: Public(Default Security Level. Issues are Public)
> Reporter: mosh
> Priority: Major
>
> We have recently come across a need to index documents containing vectors
> using Solr, and have even worked on a small POC. We used a URP to calculate
> the LSH (we chose to use the superbit algorithm, but the code is designed in a
> way that the algorithm picked can be easily changed), and stored the vector in
> either sparse or dense form, in a binary field.
> Perhaps an addition of an LSH URP in conjunction with a query parser that
> uses the same properties to calculate LSH (or maybe ktree, or some other
> algorithm altogether) should be considered as a Solr feature?