vigyasharma commented on issue #15612:
URL: https://github.com/apache/lucene/issues/15612#issuecomment-3824933968

   > Running the baselines on the single-segment flush now (using the PR branch).
   
   @atris Awesome, looking forward to the benchmark numbers.
   
   > we need to watch out for centroid drift where the HNSW node stops effectively representing the new combined cluster.
   
   Right, I think we'll have to adapt based on the participating postings. Merging into the largest posting means we only need to reassign vectors from the smaller postings. With a new centroid, we'd have to rerun reassignment for vectors across all postings in the cluster to maintain the NPA property. It's a tradeoff between reusing structures from existing segments and rebuilding them entirely. If the postings in the cluster are far apart and similarly sized, creating a new centroid will likely be the better choice.
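   The following minimal Java sketch (Java 16+ for records) illustrates the tradeoff above under stated assumptions. It is not Lucene code. `PostingMergeSketch`, `Posting`, `mergedCentroid`, `reassignKeepLargest`, `reassignNewCentroid`, `drift`, `preferNewCentroid` and the drift threshold are all hypothetical names. It assumes each posting is summarized by its centroid and vector count, so the size-weighted mean of the centroids equals the mean of all vectors in the merged cluster. It compares how many vectors each option reassigns and measures how far the reused (largest) centroid sits from that mean.

```java
import java.util.List;

/**
 * Hypothetical sketch (not a Lucene API): compares reusing the largest
 * posting's centroid against computing a fresh centroid when merging postings.
 */
public class PostingMergeSketch {

    /** Hypothetical summary of a posting: its centroid and the number of vectors assigned to it. */
    public record Posting(float[] centroid, int size) {}

    /** Size-weighted mean of the centroids, i.e. the mean of all vectors in the merged cluster. */
    public static float[] mergedCentroid(List<Posting> postings) {
        int dim = postings.get(0).centroid().length;
        double[] sum = new double[dim];
        long total = 0;
        for (Posting p : postings) {
            for (int i = 0; i < dim; i++) {
                sum[i] += (double) p.centroid()[i] * p.size();
            }
            total += p.size();
        }
        float[] merged = new float[dim];
        for (int i = 0; i < dim; i++) {
            merged[i] = (float) (sum[i] / total);
        }
        return merged;
    }

    /** The posting with the most vectors; its centroid is reused when merging "to the largest". */
    public static Posting largest(List<Posting> postings) {
        Posting best = postings.get(0);
        for (Posting p : postings) {
            if (p.size() > best.size()) {
                best = p;
            }
        }
        return best;
    }

    /** Total number of vectors in the cluster. */
    public static long totalSize(List<Posting> postings) {
        long total = 0;
        for (Posting p : postings) {
            total += p.size();
        }
        return total;
    }

    /** Keeping the largest posting's centroid: only vectors from the smaller postings are reassigned. */
    public static long reassignKeepLargest(List<Posting> postings) {
        return totalSize(postings) - largest(postings).size();
    }

    /** Fresh centroid: every vector in the cluster is reassigned. */
    public static long reassignNewCentroid(List<Posting> postings) {
        return totalSize(postings);
    }

    /** Euclidean distance between the reused centroid and the true merged centroid. */
    public static double drift(List<Posting> postings) {
        float[] reused = largest(postings).centroid();
        float[] merged = mergedCentroid(postings);
        double sq = 0;
        for (int i = 0; i < reused.length; i++) {
            double d = reused[i] - merged[i];
            sq += d * d;
        }
        return Math.sqrt(sq);
    }

    /** Hypothetical heuristic: pay for a new centroid only when the reused one drifts too far. */
    public static boolean preferNewCentroid(List<Posting> postings, double maxDrift) {
        return drift(postings) > maxDrift;
    }

    public static void main(String[] args) {
        // One dominant posting plus a small nearby one: reusing the big centroid is cheap and drifts little.
        List<Posting> lopsided = List.of(
                new Posting(new float[] {0f, 0f}, 900),
                new Posting(new float[] {1f, 0f}, 100));
        // Two equally sized, distant postings: the reused centroid sits far from the cluster mean.
        List<Posting> balancedFar = List.of(
                new Posting(new float[] {0f, 0f}, 500),
                new Posting(new float[] {10f, 0f}, 500));
        for (List<Posting> cluster : List.of(lopsided, balancedFar)) {
            System.out.printf("reassign(keep largest)=%d reassign(new centroid)=%d drift=%.2f newCentroid=%b%n",
                    reassignKeepLargest(cluster), reassignNewCentroid(cluster),
                    drift(cluster), preferNewCentroid(cluster, 1.0));
        }
    }
}
```

   In this sketch, drift stays small when one posting dominates and grows when the postings are far apart and similarly sized. That matches the case where a new centroid is likely worth the full reassignment. The `1.0` threshold is a placeholder, not a tuned value.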


-- 
This is an automated message from the Apache Git Service.