jimczi commented on issue #12497:
URL: https://github.com/apache/lucene/issues/12497#issuecomment-1670739188

   > I am not sure we can create the HNSW graph until all vectors are 
quantized. Some experimentation will have to be done here. It may be that 
creating the graph in a streaming fashion and then quantizing the vectors later 
works fine.
   
   That seems like a sound approach. More generally, I think we can decouple graph construction from graph traversal. DiskANN is an example: it builds the graph from the original vectors and then traverses it at search time using vectors quantized with Product Quantization (PQ). This works because the graph only stores neighbor links, and those links are more accurate when derived from the original vectors; at search time we are free to use a different, cheaper strategy for computing similarity.
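   
   To make the decoupling concrete, here is a rough sketch (hypothetical types, not Lucene's actual API) of a scorer abstraction where neighbor selection at build time uses the full-precision vectors while traversal at search time scores against int8 codes:
   
   ```java
   // Hypothetical sketch: graph construction scores with original floats,
   // search-time traversal scores with a quantized copy of the same vectors.
   interface VectorScorer {
     float score(int candidateOrd);
   }
   
   final class FullPrecisionScorer implements VectorScorer {
     private final float[][] vectors; // original float vectors, per ordinal
     private final float[] query;     // vector being inserted (build) or searched
   
     FullPrecisionScorer(float[][] vectors, float[] query) {
       this.vectors = vectors;
       this.query = query;
     }
   
     @Override
     public float score(int ord) {
       // dot product on the original vectors: used when linking neighbors
       float dot = 0f;
       for (int i = 0; i < query.length; i++) {
         dot += query[i] * vectors[ord][i];
       }
       return dot;
     }
   }
   
   final class QuantizedScorer implements VectorScorer {
     private final byte[][] quantized;    // int8 codes, per ordinal
     private final byte[] quantizedQuery; // query quantized with the segment's boundaries
     private final float scale;           // segment-level dequantization factor
   
     QuantizedScorer(byte[][] quantized, byte[] quantizedQuery, float scale) {
       this.quantized = quantized;
       this.quantizedQuery = quantizedQuery;
       this.scale = scale;
     }
   
     @Override
     public float score(int ord) {
       // integer dot product on the codes: used while traversing the graph
       int dot = 0;
       for (int i = 0; i < quantizedQuery.length; i++) {
         dot += quantizedQuery[i] * quantized[ord][i];
       }
       return dot * scale;
     }
   }
   ```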
   
   The main challenge I foresee is if we apply distinct quantization boundaries to each segment. Merging the similarities produced by these different quantizations could be tricky, since the scores live on different scales. Perhaps we could mitigate this by rescoring with the original vectors at the segment level (sketched below)? That might be too costly, though. Another option would be to recompute the boundaries periodically rather than on every segment/merge, or to use half-float instead of bytes so that the same reduction applies to all segments?
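   
   As a rough illustration of the rescoring idea (again hypothetical types, not an existing Lucene API, and assuming Java 16+ for records), each segment's approximate top hits could be rescored against the stored float vectors so their scores become comparable before the cross-segment merge:
   
   ```java
   import java.util.ArrayList;
   import java.util.Comparator;
   import java.util.List;
   
   // One approximate hit from a per-segment HNSW search.
   record Hit(int segment, int doc, float score) {}
   
   final class SegmentRescorer {
     interface OriginalVectorReader {
       float[] vector(int segment, int doc); // reads the stored float vector
     }
   
     static List<Hit> rescoreAndMerge(
         List<List<Hit>> perSegmentHits, float[] query, OriginalVectorReader reader, int k) {
       List<Hit> all = new ArrayList<>();
       for (List<Hit> hits : perSegmentHits) {
         for (Hit h : hits) {
           // Recompute the similarity on the original vector; this replaces the
           // segment-local quantized score, which is not comparable across segments.
           float[] v = reader.vector(h.segment(), h.doc());
           float dot = 0f;
           for (int i = 0; i < query.length; i++) {
             dot += query[i] * v[i];
           }
           all.add(new Hit(h.segment(), h.doc(), dot));
         }
       }
       all.sort(Comparator.comparingDouble(Hit::score).reversed());
       return all.subList(0, Math.min(k, all.size()));
     }
   }
   ```
   
   The cost here is one extra full-precision distance computation per candidate kept from each segment, which is bounded by k times the number of segments rather than by the index size.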
   
   

