javanna opened a new issue, #13791:
URL: https://github.com/apache/lucene/issues/13791

   We recently made a change to `TotalHitCountCollectorManager`, to make it 
take the leaf slices in its constructor, so that we can automatically decide 
what type of collectors are needed. That is because there is additional 
handling needed in case the search targets segment partitions, which causes a 
bit of overhead and we want to be able to avoid that overhead when not 
necessary.
   
   There is a somewhat similar mechanism in the `TopScoreDocCollectorManager` 
and `TopFieldCollectorManager` constructors that take a `boolean` as last 
argument. That flag tells the manager if it has to create collectors that 
support concurrency or not. That does the job, but it can lead to errors in 
that users may not know exactly what to pass in and use the wrong value, which 
lead them to getting non deterministic results if they provide `false` and 
their searcher has an executor set to it. The other way around, there is a bit 
of overhead caused by the need for concurrency that perhaps can be 
automatically avoided when the search targets a single slice. As far as I can 
see the concurrent path is currently the default so far, and we mostly use the 
flag to disable concurrency when we are sure there is not executor set.
   
   I wonder if we should replace that boolean argument with the array of leaf 
slices instead. That is not perfect but better than a boolean, I think. It is 
consistent with the current solution for `TotalHitCountCollectorManager`, and 
it allows the manager to make an accurate decision with lower likelihood of 
making a mistake (you may take slices from the wrong searcher instance perhaps 
if you use multiple searchers with different settings, but that's about it?). 
There are two main implied question in this:
   1. should we use LeafSlice[] as an argument to automatically detect if 
concurrency needs to be supported or not?
   2. while we are at it, should we also make that argument mandatory so that 
we may even avoid overhead caused by concurrency when the search targets a 
single slice, despite an executor has been set to the searcher (more adaptive 
to what we do today)?
   
   I would like to collect opinions on this, I think it could be an additional 
small breaking change to make in Lucene 10 if there is appeal for it.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org

Reply via email to