benwtrent commented on PR #15676: URL: https://github.com/apache/lucene/pull/15676#issuecomment-3937191855
> Vastly increase index size (e.g. 10–20×) so the system is actually stressed; we expect much larger collaborative gains, consistent with the savings seen on higher-latency (2.5 Gbit) setups.

I don't see how this will show anything different. What is the size of your data set now? I would assume about 1M vectors per shard should be plenty to give an indication of whether this would prove useful.

However, I am not sure naive sharing like this will actually work without other orchestration (e.g. routing certain clusters of vectors to shards, which Lucene just won't do because Lucene is the shard).

I do think there is something to searching multiple graphs in parallel (e.g. optimistic searching like Lucene does with segments). But this would require much more communication, orchestration, and work than simply sharing the min_competitive score.

> and/or higher-latency setting would show a clearer separation

How would that show any improvement? If sharing information doesn't help when the latency of communication is near zero, how would it help when the latency of communication increases significantly? That just means sharing (the key point of this algorithm) is now more expensive.
