Re: [I] Should we explore DiskANN for aKNN vector search? [lucene]

via GitHub Fri, 01 Mar 2024 15:47:53 -0800


kevindrosendahl commented on issue #12615:
URL: https://github.com/apache/lucene/issues/12615#issuecomment-1974089752


   Think I agree with your points @benwtrent, will just jot down my thinking on 
HNSW vs Vamana vs DiskANN in case it's useful.
   
   HNSW and Vamana are "competing" proximity graphs, which differ mainly in the 
number of layers in the graph (`n` vs 1) and the pruning algorithm used. From a 
purely academic point of view I find Vamana more appealing due to its 
simplicity, namely not having to keep track of levels and there being a 1:1 
relationship between the number of nodes and the number of vectors being 
indexed, versus HNSW's M:1. Practically speaking they provide roughly the same 
interface, so given we have a working HNSW graph and nothing compelling enough 
to justify replacing it as of now, I'd agree there's no reason to switch.
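
   To make the node:vector point concrete, here is a rough sketch that counts graph nodes under each scheme. It assumes the level-assignment rule from the HNSW paper (`floor(-ln(U) * 1/ln(M))`); the function names and the default `m=16` are made up for illustration and are not Lucene's API.

```python
import math
import random

def hnsw_node_count(num_vectors, m=16, seed=0):
    """Count graph nodes when each vector appears on layers 0..its top level."""
    rng = random.Random(seed)
    level_mult = 1.0 / math.log(m)  # level normalization factor from the HNSW paper
    total = 0
    for _ in range(num_vectors):
        # 1.0 - random() lies in (0, 1], so the log is always defined
        top_level = int(-math.log(1.0 - rng.random()) * level_mult)
        total += top_level + 1  # one node per layer the vector appears on
    return total

def vamana_node_count(num_vectors):
    """A single-layer graph has exactly one node per vector."""
    return num_vectors
```

   With `m=16` roughly one vector in sixteen is promoted past the bottom layer, so the HNSW count ends up only slightly above the vector count, but it is no longer 1:1.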
   
   I think of DiskANN as the algorithm consisting of an initial ANN search 
using compressed vectors followed by a reranking phase on full fidelity 
vectors. There are a number of decisions that can be made for where to store 
the graph, compressed vectors, and full fidelity vectors. If you choose to 
store the full fidelity vectors in-line with the graph (as suggested by the 
original DiskANN paper), then Vamana may be more appealing than HNSW due to its 
1:1 node:vector relationship. However, the results above seem to show that this 
implementation didn't benefit much from placing vectors inline with the graph. 
Given all other benefits of storing vectors in a flat file in ordinal order 
(including the potential for asynchronous I/O) that would seem like the 
pragmatic choice, in which case you could pretty easily use an HNSW graph as 
the proximity graph for the DiskANN algorithm.
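
   A minimal sketch of that two-phase shape, under loud assumptions: a brute-force scan over the compressed vectors stands in for the graph traversal (HNSW or Vamana), crude scalar quantization stands in for whatever compression is actually used, and an in-memory list indexed by ordinal stands in for the flat file of full fidelity vectors. All names and the `oversample` parameter are hypothetical.

```python
import heapq

def quantize(vec, lo=-1.0, hi=1.0, levels=255):
    """Crude scalar quantization: clamp each float to [lo, hi], map to an int bucket."""
    scale = levels / (hi - lo)
    return [round((min(max(x, lo), hi) - lo) * scale) for x in vec]

def sq_dist(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def two_phase_search(query, compressed, full_vectors, k, oversample=4):
    # Phase 1: approximate search on compressed vectors only.
    # (Assumption: a linear scan replaces the proximity-graph traversal.)
    q = quantize(query)
    candidates = heapq.nsmallest(
        k * oversample,
        range(len(compressed)),
        key=lambda ordinal: sq_dist(q, compressed[ordinal]),
    )
    # Phase 2: rerank the candidates on full fidelity vectors, looked up by
    # ordinal. Sorting the ordinals first mimics reading a flat file in order.
    candidates.sort()
    reranked = sorted(candidates, key=lambda o: sq_dist(query, full_vectors[o]))
    return reranked[:k]

full = [[0.0, 0.0], [0.5, 0.5], [0.9, 0.9], [-0.8, 0.3], [0.1, -0.1]]
compressed = [quantize(v) for v in full]
top = two_phase_search([0.45, 0.5], compressed, full, k=2, oversample=2)
```

   Nothing in phase 2 depends on which graph produced the candidates, which is why swapping HNSW in for Vamana looks straightforward once the full vectors live outside the graph.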
   
   @jmazanec15 the constants used were taken from JVector, whose 
performance/behavior I was initially trying to emulate in Lucene. I didn't 
spend much time fiddling with them.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org

