vigyasharma commented on issue #15612: URL: https://github.com/apache/lucene/issues/15612#issuecomment-3824933968
> Running the baselines on the single-segment flush now (using the PR branch).

@atris Awesome, looking forward to the benchmark numbers.

> we need to watch out for centroid drift where the HNSW node stops effectively representing the new combined cluster.

Right, I believe we'll have to adjust based on participating postings? Merging to the largest ensures we only need to reassign vectors from the smaller postings. With a new centroid, we'll have to run reassignment for vectors across all postings in the cluster, to maintain the NPA property.

It's a tradeoff between re-using structures from existing segments vs. rebuilding them entirely. If postings in the cluster are far apart and similarly sized, it'll likely be better to create a new centroid.
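To make the tradeoff concrete, here is a rough sketch of how that decision could look. Nothing in it is Lucene API or code from the PR: `Posting`, `choose`, `maxSpread` and `minSizeRatio` are hypothetical names, and the thresholds are placeholders that would need to come from benchmarks. It only encodes the two points above: merging into the largest posting reassigns just the vectors from the smaller postings, while a new centroid reassigns every vector in the cluster.

```java
import java.util.List;

final class PostingMergeSketch {
    enum Strategy { MERGE_INTO_LARGEST, NEW_CENTROID }

    /** Hypothetical stand-in for a posting list: its centroid and vector count. */
    static final class Posting {
        final float[] centroid;
        final int size;
        Posting(float[] centroid, int size) { this.centroid = centroid; this.size = size; }
    }

    static double distance(float[] a, float[] b) {
        double sum = 0;
        for (int i = 0; i < a.length; i++) {
            double d = a[i] - b[i];
            sum += d * d;
        }
        return Math.sqrt(sum);
    }

    static Posting largest(List<Posting> cluster) {
        if (cluster.isEmpty()) throw new IllegalArgumentException("empty cluster");
        Posting best = cluster.get(0);
        for (Posting p : cluster) if (p.size > best.size) best = p;
        return best;
    }

    /** Size-weighted mean of the participating centroids (candidate new centroid). */
    static float[] weightedCentroid(List<Posting> cluster) {
        int dim = largest(cluster).centroid.length;
        double[] acc = new double[dim];
        long total = 0;
        for (Posting p : cluster) {
            for (int i = 0; i < dim; i++) acc[i] += (double) p.centroid[i] * p.size;
            total += p.size;
        }
        float[] out = new float[dim];
        for (int i = 0; i < dim; i++) out[i] = (float) (acc[i] / total);
        return out;
    }

    /** Assumed heuristic: new centroid only if postings are far apart AND similarly sized. */
    static Strategy choose(List<Posting> cluster, double maxSpread, double minSizeRatio) {
        Posting big = largest(cluster);
        double spread = 0;
        int smallest = Integer.MAX_VALUE;
        for (Posting p : cluster) {
            spread = Math.max(spread, distance(p.centroid, big.centroid));
            smallest = Math.min(smallest, p.size);
        }
        boolean farApart = spread > maxSpread;
        boolean similarlySized = (double) smallest / big.size >= minSizeRatio;
        return farApart && similarlySized ? Strategy.NEW_CENTROID : Strategy.MERGE_INTO_LARGEST;
    }

    /** Vectors that must be reassigned under each strategy. */
    static int vectorsToReassign(List<Posting> cluster, Strategy strategy) {
        int total = 0;
        for (Posting p : cluster) total += p.size;
        // Merging into the largest posting leaves that posting's vectors in place.
        return strategy == Strategy.NEW_CENTROID ? total : total - largest(cluster).size;
    }

    public static void main(String[] args) {
        List<Posting> cluster = List.of(
            new Posting(new float[] {0f, 0f}, 100),
            new Posting(new float[] {10f, 0f}, 90));
        Strategy s = choose(cluster, 1.0, 0.5);
        System.out.println(s + ", vectors to reassign: " + vectorsToReassign(cluster, s));
    }
}
```

The reassignment count is the cost side of the tradeoff. The drift side is approximated here only by the distance between centroids, which is an assumption and may not be the right proxy for how well the HNSW node still represents the combined cluster.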
