benwtrent commented on PR #15676:
URL: https://github.com/apache/lucene/pull/15676#issuecomment-3927772689

   @krickert
   
   Your final numbers don't report recall. Please show what the 
Pareto frontier (how recall changes with increasing efSearch) looks like for 
the following scenarios, which correspond to optimal, baseline, and candidate:
   
    - All vectors within a single shard (this is baseline optimal)
    - Vectors searched independently across shards (baseline multi-index/shard)
    - Vectors searched with your collaborative search (candidate 
multi-index/shard).
   
   Last I saw from your benchmarks, at k:100 collaborative was much worse.
   
   In most real-world data, each index/shard will hold a random subset of the 
entire dataset, so your tests should reflect this as well.
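
A minimal sketch of what a random split could look like, in Python for brevity. The function name `assign_random_shards` and its parameters are hypothetical and not part of the PR or of Lucene; it only illustrates dealing vector ids to shards uniformly at random rather than by any ordering in the data.

```python
import random


def assign_random_shards(num_vectors, num_shards, seed=0):
    """Split vector ids 0..num_vectors-1 into num_shards random, disjoint subsets.

    Hypothetical helper for illustration only; a fixed seed keeps the split
    reproducible across benchmark runs.
    """
    rng = random.Random(seed)
    ids = list(range(num_vectors))
    rng.shuffle(ids)
    # Deal the shuffled ids round-robin so shard sizes differ by at most one.
    return [sorted(ids[s::num_shards]) for s in range(num_shards)]


if __name__ == "__main__":
    for shard_no, shard in enumerate(assign_random_shards(10, 3)):
        print(shard_no, shard)
```

Each shard's id list would then decide which vectors are indexed into that shard.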
   
   The Pareto frontier should likely be "recall vs. total vectors compared". 
The latter two benchmarks should be run against the exact same 
graphs/indices, since reindexing isn't necessary to test the collaborative 
searcher.
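
As a rough sketch of the measurement described above, assuming each benchmark run yields a pair of (total vectors compared, recall): the Python below computes recall against exact nearest neighbors and keeps only the non-dominated points. The names `recall_at_k` and `pareto_frontier` are hypothetical helpers, not part of Lucene or the PR's benchmark code.

```python
def recall_at_k(retrieved, ground_truth, k):
    """Fraction of the true top-k ids that appear in the retrieved top-k."""
    truth = set(ground_truth[:k])
    return len(truth.intersection(retrieved[:k])) / k


def pareto_frontier(points):
    """Return the non-dominated (vectors_compared, recall) points.

    A point is dominated when another point reaches at least the same recall
    with no more vector comparisons.
    """
    frontier = []
    best_recall = -1.0
    # Sort by cost ascending; on equal cost, visit the higher recall first.
    for cost, recall in sorted(points, key=lambda p: (p[0], -p[1])):
        if recall > best_recall:
            frontier.append((cost, recall))
            best_recall = recall
    return frontier
```

Running one efSearch sweep per scenario and passing each scenario's points through `pareto_frontier` would give three curves that can be compared on the same axes.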
   
   
   I don't think benchmarking between machines is necessary. If 
collaborative searching isn't useful when all shards are on the same machine 
(where the overhead of sharing information is at its lowest), I doubt it will be 
helpful at all once overheads increase.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

