benwtrent commented on PR #15676: URL: https://github.com/apache/lucene/pull/15676#issuecomment-3877384930
FYI, I suspect any collaborative search across shards will affect recall at the same parameters (unless finely tuned). The key thing is the visited/recall curve: can we get the same recall with fewer visited vectors? I suspect real-world Lucene indices (just like Lucene segments) are effectively a random sample of the entire corpus, so relevant vectors should be expected to be evenly distributed across all indices. This is the assumption that Lucene makes with segments and its "optimistic search" pattern, and the same assumption will be required by this idea. Anything that searches one shard and then only sets competitiveness, without taking this into account, will be useless.
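To make the even-distribution assumption and the visited/recall comparison concrete, here is a small standalone sketch. It is not Lucene code: `top_k_counts_per_shard` and `recall` are hypothetical helper names, and the scores and shard assignments are synthetic. It assumes each document is assigned to a shard uniformly at random, in which case each shard is expected to hold roughly `k / num_shards` of the global top-k.

```python
import random


def top_k_counts_per_shard(num_docs, num_shards, k, seed=0):
    """Count how many of the global top-k docs land in each shard.

    Assumption (for illustration only): every doc gets a random score and is
    assigned to a shard uniformly at random, i.e. each shard is a random
    sample of the corpus.
    """
    rng = random.Random(seed)
    docs = [(rng.random(), rng.randrange(num_shards)) for _ in range(num_docs)]
    top_k = sorted(docs, reverse=True)[:k]  # highest scores first
    counts = [0] * num_shards
    for _score, shard in top_k:
        counts[shard] += 1
    return counts


def recall(found_ids, true_ids):
    """Fraction of the true nearest neighbors that the search returned."""
    truth = set(true_ids)
    if not truth:
        raise ValueError("true_ids must not be empty")
    return len(truth & set(found_ids)) / len(truth)


if __name__ == "__main__":
    # With 4 shards and k=100, each shard should hold roughly 25 of the top 100.
    print(top_k_counts_per_shard(num_docs=100_000, num_shards=4, k=100))
```

To compare two strategies on the visited/recall curve, one would record the number of visited vectors alongside `recall(...)` for each run and check whether the collaborative variant reaches the same recall with fewer visits.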
