[ 
https://issues.apache.org/jira/browse/SOLR-12890?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16948960#comment-16948960
 ] 

Michael Sokolov commented on SOLR-12890:
----------------------------------------

Thank you for opening this issue! I have a POC I've been working on for 
approximate KNN search based on Hierarchical Navigable Small World (HNSW) 
graphs. There are a couple of papers with Malkov as lead author that I'm 
following. The solution I have is at an early stage. In rough outline, it adds 
two new DocValues types: one for vectors (a wrapper around BinaryDocValues 
that models the fixed dimension as a FieldInfo attribute), and another for the 
graph, based on SortedNumericDocValues but with special merge support. I build 
a separate graph for each segment, and then create a new graph when merging. I 
haven't done any Solr-level integration, just the field changes and an 
implementation of the KNN search algorithm as a standalone class for testing. 
It still needs a Lucene Query implementation, but that should be relatively 
straightforward on top of this low-level support.
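
To make the search idea concrete, here is a minimal Python sketch of best-first 
search over a single-layer neighbor graph. It is not the POC code: it leaves out 
the layer hierarchy and the neighbor-selection heuristics from the HNSW papers, 
and the names (random_vectors, build_graph, search, max_conn, ef) are 
hypothetical, chosen only for illustration.

```python
import heapq
import math
import random


def random_vectors(n, dim, seed=0):
    """Uniform random test vectors in [0, 1)^dim."""
    rng = random.Random(seed)
    return [[rng.random() for _ in range(dim)] for _ in range(n)]


def search(graph, vectors, query, entry, ef):
    """Best-first graph search; returns up to ef (distance, node) pairs, closest first."""
    d0 = math.dist(query, vectors[entry])
    visited = {entry}
    candidates = [(d0, entry)]   # min-heap: closest unexplored node on top
    results = [(-d0, entry)]     # max-heap via negation: worst kept result on top
    while candidates:
        dist, node = heapq.heappop(candidates)
        if len(results) >= ef and dist > -results[0][0]:
            break  # the closest candidate cannot improve the kept results
        for nbr in graph[node]:
            if nbr in visited:
                continue
            visited.add(nbr)
            d = math.dist(query, vectors[nbr])
            if len(results) < ef or d < -results[0][0]:
                heapq.heappush(candidates, (d, nbr))
                heapq.heappush(results, (-d, nbr))
                if len(results) > ef:
                    heapq.heappop(results)  # drop the current worst result
    return sorted((-d, n) for d, n in results)


def build_graph(vectors, max_conn, ef):
    """Insert nodes one at a time, linking each to the nearest nodes found so far."""
    graph = {0: []}
    for i in range(1, len(vectors)):
        nearest = search(graph, vectors, vectors[i], entry=0, ef=ef)[:max_conn]
        graph[i] = [n for _, n in nearest]
        for _, n in nearest:
            graph[n].append(i)  # bidirectional link; no degree pruning in this sketch
    return graph
```

A larger ef explores more of the graph, trading speed for recall. With ef at 
least the number of nodes, the search visits every node reachable from the 
entry point and so matches brute force on a connected graph.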

I was hesitating to open an issue since I'm about to go away for a few weeks, 
but I can open a branch and share what I have. It has many nocommits and some 
fundamental design issues still to be resolved (I had to hack in changes to 
DefaultIndexingChain and would rather do something more principled), but it 
does work and is not terribly slow for uniform random vectors (a few hundred 
QPS on my laptop at recall around 95%). I haven't yet been able to test on 
real data from embeddings.
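
For anyone who wants to reproduce that kind of recall number, a rough sketch of 
the measurement follows: compare the ids returned by the approximate search 
against a brute-force top-k and average over queries. The function names are 
hypothetical, and sampled_search is only a stand-in for a real approximate 
index (it searches a subset exactly), not anything from the POC.

```python
import math
import random


def exact_knn(vectors, query, k):
    """Brute-force ground truth: ids of the k vectors closest to query."""
    order = sorted(range(len(vectors)), key=lambda i: math.dist(query, vectors[i]))
    return order[:k]


def recall_at_k(approx_ids, exact_ids):
    """Fraction of the true top-k that the approximate search returned."""
    return len(set(approx_ids) & set(exact_ids)) / len(exact_ids)


def mean_recall(vectors, queries, k, approx_search):
    """Average recall@k of approx_search(query, k) over all queries."""
    total = 0.0
    for q in queries:
        total += recall_at_k(approx_search(q, k), exact_knn(vectors, q, k))
    return total / len(queries)


def sampled_search(vectors, sample_ids):
    """Hypothetical stand-in for an approximate index: exact search over a subset."""
    def search(query, k):
        order = sorted(sample_ids, key=lambda i: math.dist(query, vectors[i]))
        return order[:k]
    return search


if __name__ == "__main__":
    rng = random.Random(42)
    vectors = [[rng.random() for _ in range(16)] for _ in range(1000)]
    queries = [[rng.random() for _ in range(16)] for _ in range(20)]
    # Searching only every second vector should lose roughly half the true neighbors.
    half = sampled_search(vectors, range(0, len(vectors), 2))
    print(mean_recall(vectors, queries, 10, half))
```

Uniform random vectors are convenient for a first check, but numbers measured 
this way may not carry over to real embeddings, whose distribution is usually 
far from uniform.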

> Vector Search in Solr (Umbrella Issue)
> --------------------------------------
>
>                 Key: SOLR-12890
>                 URL: https://issues.apache.org/jira/browse/SOLR-12890
>             Project: Solr
>          Issue Type: New Feature
>            Reporter: mosh
>            Priority: Major
>
> We have recently come across a need to index documents containing vectors 
> using Solr, and have even worked on a small POC. We used a URP to calculate 
> the LSH (we chose to use the superbit algorithm, but the code is designed in a 
> way that the algorithm picked can be easily changed), and stored the vector in 
> either sparse or dense form, in a binary field.
> Perhaps an addition of an LSH URP in conjunction with a query parser that 
> uses the same properties to calculate LSH (or maybe ktree, or some other 
> algorithm altogether) should be considered as a Solr feature?
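
For reference, the hashing step described in the issue can be sketched as plain 
sign-random-projection LSH: each signature bit records which side of a random 
hyperplane the vector falls on, so vectors at a small angle tend to share most 
bits. This is not the reporter's code. As I understand it, Super-Bit 
additionally orthogonalizes the projection vectors in batches, which this 
sketch omits, and all names here are hypothetical.

```python
import random


def make_hyperplanes(num_bits, dim, seed=0):
    """One Gaussian random normal vector per signature bit."""
    rng = random.Random(seed)
    return [[rng.gauss(0.0, 1.0) for _ in range(dim)] for _ in range(num_bits)]


def lsh_signature(vector, hyperplanes):
    """Pack the sign of each projection into an integer bit signature."""
    bits = 0
    for plane in hyperplanes:
        dot = sum(p * x for p, x in zip(plane, vector))
        bits = (bits << 1) | (1 if dot >= 0 else 0)
    return bits


def hamming(a, b):
    """Number of differing bits; grows with the angle between the hashed vectors."""
    return bin(a ^ b).count("1")
```

The indexing side and the query side must use the same hyperplanes (here, the 
same seed, num_bits and dim), which is the point of sharing properties between 
the URP and the query parser.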



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
