vigyasharma commented on issue #15612:
URL: https://github.com/apache/lucene/issues/15612#issuecomment-3823350019

   Hi @Vikasht34, thanks for taking a look! I had considered DiskANN, but it 
has the same fundamental problems as other graph-based algos. A nearest 
neighbor search can touch any part of the graph, so the whole graph has to be 
accessible. That is memory intensive, and it makes the first fetch expensive 
for cold nodes in storage/compute separated setups. It also comes with all the 
same problems in segment merge.
   
   We should consider PQ or other types of quantization independent of the 
algorithm. On that note, Better Binary Quantization (BBQ) seems to report 
better results than PQ? The postings can be quantized, and we can use PQ if 
high dimensionality becomes a network bottleneck. While posting lookup is brute 
force, the key is to keep postings small so that tail latency stays bounded. 
We'll have to experiment and profile.
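
   To make the posting lookup concrete, here is a minimal sketch, assuming an 
IVF-style layout where each vector is assigned to its nearest centroid. The 
names (`build_postings`, `search`, `n_probe`) are hypothetical and not Lucene 
APIs, and the sign-bit encoding is only a crude stand-in for binary 
quantization, not BBQ itself.

```python
from typing import Dict, List, Tuple

Vector = List[float]


def sq_dist(a: Vector, b: Vector) -> float:
    """Squared Euclidean distance."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def sign_bits(v: Vector) -> int:
    """One bit per dimension, set when the component is positive.

    Assumption: a toy binary quantizer for illustration, not BBQ.
    """
    bits = 0
    for i, x in enumerate(v):
        if x > 0:
            bits |= 1 << i
    return bits


def build_postings(
    docs: Dict[int, Vector], centroids: List[Vector]
) -> List[List[Tuple[int, int]]]:
    """Assign each doc to its nearest centroid and store (doc_id, bits)."""
    postings: List[List[Tuple[int, int]]] = [[] for _ in centroids]
    for doc_id, vec in docs.items():
        nearest = min(range(len(centroids)), key=lambda c: sq_dist(vec, centroids[c]))
        postings[nearest].append((doc_id, sign_bits(vec)))
    return postings


def search(
    query: Vector,
    centroids: List[Vector],
    postings: List[List[Tuple[int, int]]],
    n_probe: int,
    k: int,
) -> List[int]:
    """Probe the n_probe closest postings and brute-force scan each one."""
    order = sorted(range(len(centroids)), key=lambda c: sq_dist(query, centroids[c]))
    q_bits = sign_bits(query)
    candidates = []
    for c in order[:n_probe]:
        # Cost of this loop is bounded by the posting size, hence small postings.
        for doc_id, bits in postings[c]:
            hamming = bin(q_bits ^ bits).count("1")
            candidates.append((hamming, doc_id))
    candidates.sort()
    return [doc_id for _, doc_id in candidates[:k]]
```

   Only the centroids and the probed postings are read per query, so the 
worst-case scan cost is set by the largest posting rather than by the index 
size.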
   
   I think there is space for these two families of vector search algos in 
Lucene. The graph-based algos, where we should try optimizations like HNSW + PQ 
(with disk access for full precision), and single-layer HNSW. And cluster-based 
algos that help with highly selective filters, remote store setups and 
billion-scale use-cases. Quantization applies to both approaches.
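
   As a rough illustration of "PQ with disk access for full precision", the 
sketch below scores every doc from its PQ codes, then rescores a short list 
with the original vectors. It assumes the codebooks are already trained (here 
they would be passed in by hand), and all names (`pq_encode`, `search`, 
`n_candidates`) are hypothetical, not Lucene APIs. The `full_vectors` dict 
stands in for whatever would be read from disk.

```python
from typing import Dict, List

Vector = List[float]
Codebooks = List[List[Vector]]  # one list of centroids per subspace


def sq_dist(a: Vector, b: Vector) -> float:
    """Squared Euclidean distance."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def split(v: Vector, m: int) -> List[Vector]:
    """Cut v into m equal-width subvectors (len(v) must be divisible by m)."""
    width = len(v) // m
    return [v[i * width:(i + 1) * width] for i in range(m)]


def pq_encode(v: Vector, codebooks: Codebooks) -> List[int]:
    """One code per subspace: the index of the nearest centroid there."""
    return [
        min(range(len(book)), key=lambda j: sq_dist(sub, book[j]))
        for sub, book in zip(split(v, len(codebooks)), codebooks)
    ]


def search(
    query: Vector,
    codes: Dict[int, List[int]],
    full_vectors: Dict[int, Vector],
    codebooks: Codebooks,
    n_candidates: int,
    k: int,
) -> List[int]:
    """Approximate pass over PQ codes, then full-precision rescoring."""
    # table[s][j]: distance from query subvector s to centroid j of subspace s.
    table = [
        [sq_dist(sub, centroid) for centroid in book]
        for sub, book in zip(split(query, len(codebooks)), codebooks)
    ]
    # Approximate distance is a sum of table lookups, one per subspace.
    approx = sorted(
        codes, key=lambda d: sum(table[s][j] for s, j in enumerate(codes[d]))
    )
    shortlist = approx[:n_candidates]
    # Only the shortlist needs the full-precision vectors.
    return sorted(shortlist, key=lambda d: sq_dist(query, full_vectors[d]))[:k]
```

   The size of the shortlist is the knob here: a larger `n_candidates` costs 
more full-precision reads but recovers neighbors that the quantized scores 
rank too low or cannot tell apart.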


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]


