atris commented on issue #15612:
URL: https://github.com/apache/lucene/issues/15612#issuecomment-3824970598

   > > We need to watch out for centroid drift, where the HNSW node stops effectively representing the new combined cluster.
   >
   > Right, I believe we'll have to adjust based on the participating postings? Merging into the largest posting ensures we only need to reassign vectors from the smaller postings. With a new centroid, we'll have to run reassignment for vectors across all postings in the cluster to maintain the NPA property. It's a tradeoff between reusing structures from existing segments vs. rebuilding them entirely. If the postings in the cluster are far apart and similarly sized, it will likely be better to create a new centroid.
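Here is a minimal sketch of the two merge strategies described above. It assumes a simplified in-memory model in which "reassignment" just means returning the vectors that need a nearest-centroid (NPA) check. `Posting`, `make_posting`, `merge_into_largest` and `merge_with_new_centroid` are hypothetical names, not Lucene APIs. The sketch shows why reusing the largest posting's centroid limits the check to vectors from the smaller postings, while a fresh centroid puts every vector in the cluster back in play.

```python
from dataclasses import dataclass
from typing import List, Tuple

Vector = List[float]


@dataclass
class Posting:
    """Hypothetical posting: a centroid plus the vectors assigned to it."""
    centroid: Vector
    vectors: List[Vector]


def mean(vectors: List[Vector]) -> Vector:
    """Component-wise mean of a non-empty list of equal-length vectors."""
    dim = len(vectors[0])
    return [sum(v[d] for v in vectors) / len(vectors) for d in range(dim)]


def make_posting(vectors: List[Vector]) -> Posting:
    return Posting(mean(vectors), list(vectors))


def merge_into_largest(postings: List[Posting]) -> Tuple[Posting, List[Vector]]:
    """Reuse the largest posting's centroid.

    The largest posting's own vectors were already nearest to its centroid,
    and dropping the other centroids cannot bring any remaining centroid
    closer, so only vectors from the smaller postings need an NPA check.
    Those vectors are tentatively placed in the merged posting.
    """
    largest = max(postings, key=lambda p: len(p.vectors))
    to_check = [v for p in postings if p is not largest for v in p.vectors]
    merged = Posting(largest.centroid, [v for p in postings for v in p.vectors])
    return merged, to_check


def merge_with_new_centroid(postings: List[Posting]) -> Tuple[Posting, List[Vector]]:
    """Recompute the centroid as the mean of every vector in the cluster.

    The centroid moved, so every vector in the merged posting must be
    re-checked (neighboring postings may also be affected; not modeled here).
    """
    all_vectors = [v for p in postings for v in p.vectors]
    return make_posting(all_vectors), list(all_vectors)


a = make_posting([[0.0, 0.0], [2.0, 0.0], [0.0, 2.0], [2.0, 2.0]])
b = make_posting([[10.0, 10.0], [12.0, 10.0]])
merged, to_check = merge_into_largest([a, b])
print(merged.centroid, len(to_check))  # [1.0, 1.0] 2
merged, to_check = merge_with_new_centroid([a, b])
print(len(to_check))  # 6
```

The compute saving of the "largest" strategy grows with the size imbalance. It also holds only because the surviving centroid does not move.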
   
   
   Yeah, reusing the centroid saves compute but risks graph quality if the cluster shape shifts significantly.
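One way to make "the cluster shape shifts significantly" concrete is to measure how far the reused centroid sits from the true mean of the merged cluster. This is a sketch under stated assumptions:

- Each centroid is the exact mean of its posting.
- `centroid_drift`, `should_create_new_centroid` and the `max_drift` threshold are hypothetical, not Lucene APIs.
- A real policy would probably scale the threshold by posting radius rather than use an absolute distance.

```python
import math
from typing import Sequence, Tuple

# (centroid, posting size) pairs. Each centroid is assumed to be the exact
# mean of its posting, so the size-weighted mean equals the mean of all vectors.
Part = Tuple[Sequence[float], int]


def combined_mean(parts: Sequence[Part]) -> list:
    total = sum(n for _, n in parts)
    dim = len(parts[0][0])
    return [sum(c[d] * n for c, n in parts) / total for d in range(dim)]


def centroid_drift(parts: Sequence[Part]) -> float:
    """Distance from the reused (largest) centroid to the true combined mean."""
    largest_centroid = max(parts, key=lambda p: p[1])[0]
    return math.dist(largest_centroid, combined_mean(parts))  # Python 3.8+


def should_create_new_centroid(parts: Sequence[Part], max_drift: float) -> bool:
    """Hypothetical policy: fall back to a fresh centroid when drift is too large."""
    return centroid_drift(parts) > max_drift


# A tiny posting merged into a big one barely moves the mean...
small_merge = [([0.0, 0.0], 1000), ([10.0, 0.0], 10)]
# ...while two similarly sized, far-apart postings pull it halfway between them.
even_merge = [([0.0, 0.0], 500), ([10.0, 0.0], 500)]
print(round(centroid_drift(small_merge), 3))  # 0.099
print(centroid_drift(even_merge))  # 5.0
```

The second case is the "far apart and similarly sized" situation from the quoted comment. There, the reused centroid ends up half the inter-centroid distance away from where the merged vectors actually are.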
   
   Given Ben's point about batch building, we probably just eat the cost of full reassignment to keep the graph healthy.
   
   I'll start with the full rebuild for simplicity. If merge latency kills us, we can optimize with the "largest centroid" heuristic later.
   


-- 
This is an automated message from the Apache Git Service.