kevindrosendahl commented on issue #12615: URL: https://github.com/apache/lucene/issues/12615#issuecomment-1974089752
I think I agree with your points @benwtrent; I'll just jot down my thinking on HNSW vs Vamana vs DiskANN in case it's useful.

HNSW and Vamana are "competing" proximity graphs, which differ mainly in the number of layers in the graph (`n` vs 1) and the pruning algorithm used. From a purely academic point of view I find Vamana more appealing due to its simplicity: there are no levels to keep track of, and there is a 1:1 relationship between the number of nodes and the number of vectors being indexed, rather than M:1. Practically speaking they provide roughly the same interface, so given we have a working HNSW graph and nothing compelling enough to replace it as of now, I'd agree there's no reason to replace it.

I think of DiskANN as the algorithm consisting of an initial ANN search using compressed vectors, followed by a reranking phase on full-fidelity vectors. There are a number of decisions that can be made about where to store the graph, the compressed vectors, and the full-fidelity vectors. If you choose to store the full-fidelity vectors inline with the graph (as suggested by the original DiskANN paper), then Vamana may be more appealing than HNSW due to its 1:1 node:vector relationship.

However, the results above seem to show that this implementation didn't benefit much from placing vectors inline with the graph. Given all the other benefits of storing vectors in a flat file in ordinal order (including the potential for asynchronous I/O), that would seem like the pragmatic choice, in which case you could pretty easily use an HNSW graph as the proximity graph for the DiskANN algorithm.

@jmazanec15 the constants used were taken from JVector, whose performance/behavior I was initially trying to emulate in Lucene. I didn't spend much time fiddling with them.

--
This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment.
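As a rough illustration of the two-phase DiskANN flow described in the comment above (approximate search over compressed vectors, then reranking on full-fidelity vectors stored in ordinal order), here is a minimal Python sketch. Everything in it is an assumption made for illustration: the names `quantize`, `dequantize`, `two_phase_search`, and `oversample` are hypothetical, the compression is a simple scalar quantizer, and a brute-force scan stands in for the proximity-graph traversal (HNSW or Vamana). It is not how Lucene or JVector implement this.

```python
import heapq
import random


def quantize(vec, lo=-1.0, hi=1.0, levels=16):
    """Hypothetical lossy compression: clamp each component, snap it to a bucket index."""
    step = (hi - lo) / (levels - 1)
    return [round((min(max(x, lo), hi) - lo) / step) for x in vec]


def dequantize(codes, lo=-1.0, hi=1.0, levels=16):
    """Map bucket indices back to approximate component values."""
    step = (hi - lo) / (levels - 1)
    return [lo + c * step for c in codes]


def sq_dist(a, b):
    """Squared Euclidean distance."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def two_phase_search(query, compressed, full, k, oversample=4):
    # Phase 1: approximate search that only touches compressed vectors.
    # A real implementation would traverse a proximity graph here;
    # this sketch swaps in a brute-force scan to stay short.
    n_candidates = min(len(compressed), k * oversample)
    candidates = heapq.nsmallest(
        n_candidates,
        range(len(compressed)),
        key=lambda i: sq_dist(query, dequantize(compressed[i])),
    )
    # Phase 2: rerank the candidates using full-fidelity vectors,
    # looked up by ordinal as if read from a flat file.
    reranked = sorted(candidates, key=lambda i: sq_dist(query, full[i]))
    return reranked[:k]


random.seed(0)
full_vectors = [[random.uniform(-1, 1) for _ in range(8)] for _ in range(200)]
compressed_vectors = [quantize(v) for v in full_vectors]

query = full_vectors[17]
top = two_phase_search(query, compressed_vectors, full_vectors, k=5)
print(top)
```

Because phase 2 only needs ordinals, the candidate generator in phase 1 is interchangeable, which is the sense in which either graph type could serve as the proximity graph.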
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For queries about this service, please contact Infrastructure at: us...@infra.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org