atris commented on issue #15612:
URL: https://github.com/apache/lucene/issues/15612#issuecomment-3825003504

   > So, the merging and splitting of centroids should fit in nicely with graph 
updates. 
   > 
   > - Merge vectors according to their centroids nearest centroid. Possibly 
first finding a candidate number of centroids given the vector centroids 
(centroid to centroid search). Then refine by the vectors and append to the 
correct centroid
   > - Then if the centroid gets too large, split it. 
   > - On split, you can use the initial centroid's graph connections as a 
bootstrapped set of neighbors to search in the graph and update its connections.
   > - Rinse and repeat
   > 
   > That is likely way faster than doing this one vector at a time and just 
attempting to add to the graph directly.
   
   Bootstrapping from the parent avoids the global entry search, so definitely 
faster.
   
   The main thing to watch is that the split nodes don't inherit identical 
neighbor sets, or the graph navigation degrades (we lose the small-world 
shortcuts). A quick local beam search starting from those parent candidates 
should solve that.
   
   I'll target that Centroid-first flow for the merge logic.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]


---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]

Reply via email to