Re: [PR] Feature/collaborative hnsw search [lucene]

via GitHub Fri, 20 Feb 2026 13:15:10 -0800


benwtrent commented on PR #15676:
URL: https://github.com/apache/lucene/pull/15676#issuecomment-3937191855


   > Vastly increase index size (e.g. 10–20×) so the system is actually 
stressed; we expect much larger collaborative gains, consistent with the 
savings seen on higher-latency (2.5 Gbit) setups.
   
   I don't see how this would show anything different. What is the size of your 
data set now?
   
   I would assume about 1M vectors per shard should be plenty to give an 
indication of whether this would prove useful.
   
   However, I am not sure a naive sharing like this will actually work without 
other orchestration (e.g. routing certain clusters of vectors to shards, which 
Lucene just won't do because Lucene is the shard).
   
   I do think there is something to searching multiple graphs in parallel (e.g. 
optimistic searching, as Lucene does with segments). But this would require much 
more communication, orchestration, and work than simply sharing the 
min_competitive score.
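
   For illustration only, here is a minimal sketch of what "simply sharing the 
min_competitive score" between concurrent searchers could look like. The names 
`SharedFloor` and `LocalTopK` are hypothetical and are not Lucene classes or the 
PR's code. The sketch assumes higher scores are better, and it only shows the 
score exchange, not HNSW graph traversal.

```java
import java.util.PriorityQueue;
import java.util.concurrent.atomic.AtomicInteger;

/** Hypothetical helper (not a Lucene class): a score floor shared by concurrent searchers. */
final class SharedFloor {
    // The float is stored as its int bits so the update can be a lock-free CAS loop.
    private final AtomicInteger bits =
        new AtomicInteger(Float.floatToIntBits(Float.NEGATIVE_INFINITY));

    float get() {
        return Float.intBitsToFloat(bits.get());
    }

    /** Raises the floor to {@code candidate} if it is larger; never lowers it. */
    void raise(float candidate) {
        int cur;
        do {
            cur = bits.get();
            if (Float.intBitsToFloat(cur) >= candidate) {
                return; // another searcher already published an equal or higher floor
            }
        } while (!bits.compareAndSet(cur, Float.floatToIntBits(candidate)));
    }
}

/** Hypothetical per-graph collector that keeps the k best scores it has seen. */
final class LocalTopK {
    private final int k;
    private final SharedFloor floor;
    // Min-heap: once it holds k entries, its head is the local k-th best score.
    private final PriorityQueue<Float> heap = new PriorityQueue<>();
    int skipped = 0;

    LocalTopK(int k, SharedFloor floor) {
        this.k = k;
        this.floor = floor;
    }

    void offer(float score) {
        // Some searcher already holds k results at or above the floor, so a score
        // strictly below it cannot be part of the global top k.
        if (score < floor.get()) {
            skipped++;
            return;
        }
        heap.add(score);
        if (heap.size() > k) {
            heap.poll(); // evict the current minimum
        }
        if (heap.size() == k) {
            floor.raise(heap.peek()); // publish the local k-th best score
        }
    }

    int size() {
        return heap.size();
    }
}
```

   With `k = 2`, if one collector is offered 0.9 and 0.8, the shared floor becomes 
0.8, and a second collector then skips a candidate scoring 0.5 but keeps one 
scoring 0.85. The sketch also shows the limitation: the floor only helps a 
searcher whose candidates fall below what another searcher has already found.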
   
   
   > and/or higher-latency setting would show a clearer separation
   
   How would that show any improvement? If sharing information doesn't help 
when the latency of communication is near zero, how would it improve when the 
latency of communication increases significantly? That just means sharing 
(the key point of this algorithm) is now more expensive.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

