[
https://issues.apache.org/jira/browse/SOLR-12890?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16948960#comment-16948960
]
Michael Sokolov commented on SOLR-12890:
----------------------------------------
Thank you for opening this issue! I have a POC I've been working on for
approximate KNN search based on Hierarchical Navigable Small-world graphs.
There are a couple of papers with Malkov as lead author that I'm following. The
solution I have is at an early stage. In rough outline: two new DocValues
types, one for vectors (a wrapper around BinaryDocValues that records a fixed
dimension as a FieldInfo attribute), and another for the graph, based on
SortedNumericDocValues but with special merge support. I build a separate graph
for each segment, and then build a new graph when merging. I haven't made
any Solr-level integration - just the field changes and implementing the KNN
search algorithm as a standalone class for testing. Still needs a Lucene Query
implementation, but that should be relatively straightforward on top of this
low-level support.
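
To make the outline above concrete, here is a minimal sketch in Python (not the POC code, and not Lucene API). It assumes a single-layer simplification of the hierarchical graph, squared Euclidean distance, and little-endian float32 packing for the fixed-dimension binary value. The names encode_vector, decode_vector, search_layer and build_graph are hypothetical, and neighbor lists are not pruned, which a real implementation would need to do.

```python
import heapq
import struct


def encode_vector(vec, dim):
    """Pack a vector into a fixed-width binary value (4 bytes per float32)."""
    if len(vec) != dim:
        raise ValueError(f"expected {dim} dimensions, got {len(vec)}")
    return struct.pack(f"<{dim}f", *vec)


def decode_vector(blob, dim):
    """Inverse of encode_vector; values come back with float32 precision."""
    return list(struct.unpack(f"<{dim}f", blob))


def sq_dist(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))


def search_layer(vectors, neighbors, query, entry, ef):
    """Best-first search over one graph layer.

    Returns up to ef (distance, node) pairs, nearest first.
    """
    d0 = sq_dist(query, vectors[entry])
    visited = {entry}
    candidates = [(d0, entry)]  # min-heap of nodes still to expand
    best = [(-d0, entry)]       # max-heap (negated distance) of results so far
    while candidates:
        d, node = heapq.heappop(candidates)
        # Stop once the closest unexpanded node is worse than the worst result.
        if len(best) >= ef and d > -best[0][0]:
            break
        for nb in neighbors[node]:
            if nb in visited:
                continue
            visited.add(nb)
            dn = sq_dist(query, vectors[nb])
            if len(best) < ef or dn < -best[0][0]:
                heapq.heappush(candidates, (dn, nb))
                heapq.heappush(best, (-dn, nb))
                if len(best) > ef:
                    heapq.heappop(best)
    return sorted((-neg_d, n) for neg_d, n in best)


def build_graph(vectors, max_conn=8, ef=32):
    """Insert nodes one at a time, linking each to its nearest inserted nodes."""
    neighbors = {0: []}
    for node in range(1, len(vectors)):
        found = search_layer(vectors, neighbors, vectors[node], 0, ef)
        links = [n for _, n in found[:max_conn]]
        neighbors[node] = list(links)
        for n in links:
            neighbors[n].append(node)  # keep links bidirectional (no pruning)
    return neighbors
```

Searching with a small ef trades recall for speed. Setting ef to the number of vectors makes the search exhaustive over this connected graph, which is handy for checking the code against a brute-force scan.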
I was hesitating to open an issue since I'm about to go away for a few weeks,
but I can open a branch and share what I have. It has many nocommits, some
fundamental design issues still to be resolved (I had to hack in changes to
DefaultIndexingChain and would rather do something more principled), but it
does work and is not terribly slow (a few hundred QPS on my laptop at recall
around 95%) for uniform random vectors. I haven't yet been able to test on real
data from embeddings.
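
For reference, a recall figure like the one above is usually the overlap between the approximate top-k and the exact top-k from a brute-force scan. The Python sketch below shows that measurement under my own assumptions (squared Euclidean distance, uniform vectors in the unit cube). The function names are hypothetical and this is not the harness used for the numbers quoted here.

```python
import random


def uniform_random_vectors(count, dim, seed=0):
    """Generate reproducible vectors with coordinates uniform in [0, 1)."""
    rng = random.Random(seed)
    return [[rng.random() for _ in range(dim)] for _ in range(count)]


def brute_force_knn(vectors, query, k):
    """Exact k nearest neighbors (as indexes) by scanning every vector."""
    def sq_dist(v):
        return sum((x - y) ** 2 for x, y in zip(v, query))
    return sorted(range(len(vectors)), key=lambda i: sq_dist(vectors[i]))[:k]


def recall_at_k(approx_ids, exact_ids):
    """Fraction of the exact top-k ids that the approximate search returned."""
    if not exact_ids:
        raise ValueError("exact_ids must not be empty")
    return len(set(approx_ids) & set(exact_ids)) / len(exact_ids)
```

Averaging recall_at_k over a batch of random queries, with the approximate ids coming from whatever KNN search is under test, gives a single recall number to report alongside QPS.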
> Vector Search in Solr (Umbrella Issue)
> --------------------------------------
>
> Key: SOLR-12890
> URL: https://issues.apache.org/jira/browse/SOLR-12890
> Project: Solr
> Issue Type: New Feature
> Reporter: mosh
> Priority: Major
>
> We have recently come across a need to index documents containing vectors
> using Solr, and have even worked on a small POC. We used a URP to calculate
> the LSH (we chose the superbit algorithm, but the code is designed so that
> the algorithm can easily be changed), and stored the vector in
> either sparse or dense form, in a binary field.
> Perhaps an addition of an LSH URP in conjunction with a query parser that
> uses the same properties to calculate LSH (or maybe ktree, or some other
> algorithm altogether) should be considered as a Solr feature?
--
This message was sent by Atlassian Jira
(v8.3.4#803005)