atris commented on issue #15612:
URL: https://github.com/apache/lucene/issues/15612#issuecomment-3824686268

   @benwtrent @vigyasharma
   
   Running the baselines on the single-segment flush now (using the PR branch).
   
   One thought while those churn: Ben is spot on about the quantization. We 
definitely need int8 (or 4-bit) for the posting lists to keep I/O in check. The 
trade-off is that once we quantize, we pretty much force a re-ranking phase 
unless we accept a lower recall ceiling. I'm keeping raw floats in the PR 
for now just to establish the graph structure, but we can drop a decoder into 
SpannVectorsReader pretty easily later.
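
   To make the quantize-then-re-rank trade-off concrete, here is a rough sketch. It is not the PR's code and not Lucene's API: the class `Int8PostingSketch`, its method names, and the simple min/max scalar scheme are all assumptions for illustration only.

```java
import java.util.Arrays;

/** Hypothetical sketch: min/max int8 scalar quantization plus exact re-ranking. */
final class Int8PostingSketch {

  /** Maps each value, clamped to [min, max], onto one of 256 byte levels. */
  static byte[] quantize(float[] v, float min, float max) {
    float scale = (max - min) / 255f;
    if (scale == 0f) scale = 1f; // degenerate range: avoid dividing by zero
    byte[] out = new byte[v.length];
    for (int i = 0; i < v.length; i++) {
      float clamped = Math.max(min, Math.min(max, v[i]));
      out[i] = (byte) (Math.round((clamped - min) / scale) - 128);
    }
    return out;
  }

  /** Approximate inverse of quantize; per-component error is about scale / 2. */
  static float[] dequantize(byte[] q, float min, float max) {
    float scale = (max - min) / 255f;
    if (scale == 0f) scale = 1f;
    float[] out = new float[q.length];
    for (int i = 0; i < q.length; i++) {
      out[i] = (q[i] + 128) * scale + min;
    }
    return out;
  }

  static float dot(float[] a, float[] b) {
    float sum = 0f;
    for (int i = 0; i < a.length; i++) sum += a[i] * b[i];
    return sum;
  }

  /**
   * Re-ranking phase: rescore candidate ids (found via the quantized scan)
   * against the raw float vectors and keep the k best by exact dot product.
   */
  static int[] rerank(float[] query, float[][] raw, int[] candidates, int k) {
    Integer[] ids = new Integer[candidates.length];
    for (int i = 0; i < candidates.length; i++) ids[i] = candidates[i];
    // Sort descending by exact score.
    Arrays.sort(ids, (a, b) -> Float.compare(dot(query, raw[b]), dot(query, raw[a])));
    int[] top = new int[Math.min(k, ids.length)];
    for (int i = 0; i < top.length; i++) top[i] = ids[i];
    return top;
  }
}
```

   The point of the sketch is the cost split: the posting-list scan reads one byte per dimension, and only the short candidate list touches raw floats. Skipping `rerank` saves that second read but caps recall at whatever the quantized scores can resolve.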
   
   Also, regarding the "Largest Centroid" merge idea: it’s smart for speed, but 
we need to watch out for centroid drift, where the HNSW node no longer 
represents the new combined cluster well. We might need a lightweight 
re-centering step during merge even if we don't fully re-cluster.
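
   A minimal sketch of what a lightweight re-centering could look like, assuming "Largest Centroid" means the merged cluster keeps the centroid of the larger input cluster and that each centroid is the mean of its members. `CentroidRecenterSketch` and its methods are hypothetical names, not anything in the PR.

```java
/** Hypothetical sketch: size-weighted re-centering of two merged clusters. */
final class CentroidRecenterSketch {

  /**
   * Returns the mean of the combined cluster without touching member vectors.
   * If cA and cB are the means of nA and nB vectors, the weighted average
   * (nA * cA + nB * cB) / (nA + nB) is the exact mean of the union.
   */
  static float[] recenter(float[] cA, int nA, float[] cB, int nB) {
    int total = nA + nB;
    if (total <= 0) throw new IllegalArgumentException("merged cluster is empty");
    float[] out = new float[cA.length];
    for (int i = 0; i < cA.length; i++) {
      out[i] = (nA * cA[i] + nB * cB[i]) / total;
    }
    return out;
  }

  /** Euclidean distance between the kept centroid and the re-centered one. */
  static float drift(float[] kept, float[] recentered) {
    double sum = 0;
    for (int i = 0; i < kept.length; i++) {
      double d = kept[i] - recentered[i];
      sum += d * d;
    }
    return (float) Math.sqrt(sum);
  }
}
```

   Since this needs only the two centroids and their posting-list sizes, it costs O(dim) per merged cluster. One possible policy would be to update the HNSW node only when `drift` exceeds some threshold; that threshold would have to be tuned, and nothing here says what it should be.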
   
   Anyway, let's see what the baseline numbers say first.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

