atris commented on issue #15612: URL: https://github.com/apache/lucene/issues/15612#issuecomment-3825003504
> So, the merging and splitting of centroids should fit in nicely with graph updates. > > - Merge vectors according to their centroids nearest centroid. Possibly first finding a candidate number of centroids given the vector centroids (centroid to centroid search). Then refine by the vectors and append to the correct centroid > - Then if the centroid gets too large, split it. > - On split, you can use the initial centroid's graph connections as a bootstrapped set of neighbors to search in the graph and update its connections. > - Rinse and repeat > > That is likely way faster than doing this one vector at a time and just attempting to add to the graph directly. Bootstrapping from the parent avoids the global entry search, so definitely faster. The main thing to watch is that the split nodes don't inherit identical neighbor sets, or the graph navigation degrades (we lose the small-world shortcuts). A quick local beam search starting from those parent candidates should solve that. I'll target that Centroid-first flow for the merge logic. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: [email protected] For queries about this service, please contact Infrastructure at: [email protected] --------------------------------------------------------------------- To unsubscribe, e-mail: [email protected] For additional commands, e-mail: [email protected]
