For what it's worth, I believe that in OpenSearch we took the approach of disabling concurrent segment search when a request has a sort that is likely to benefit significantly from pruning. (Either we're sorting on a field that matches the index sort or we're sorting by the special @timestamp field.) Basically, we assume that the work saved by pruning non-competitive segments (and front-loading the segments most likely to be competitive) outweighs the latency gain from concurrency.
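
To make the rule concrete, here is a rough sketch of the kind of check I mean. It is illustrative only and is not the actual OpenSearch code: the class name, method name and parameters are made up, and it simplifies by comparing only the primary sort field while ignoring sort direction.

```java
/** Hypothetical helper for illustration; not an OpenSearch or Lucene API. */
final class SortPruningHeuristic {
    // The special time field mentioned above.
    static final String TIMESTAMP_FIELD = "@timestamp";

    private SortPruningHeuristic() {}

    /**
     * Decides whether to skip concurrent segment search for a request.
     *
     * @param primarySortField field of the request's primary sort, or null when sorting by score
     * @param indexSortField   field of the index's primary sort, or null when the index is unsorted
     * @return true if sequential search is assumed to win because pruning will skip most work
     */
    static boolean preferSequentialSearch(String primarySortField, String indexSortField) {
        if (primarySortField == null) {
            // Score-based sort: this heuristic does not apply.
            return false;
        }
        // Assumption: a sort on the index-sort field prunes well (direction ignored here).
        boolean matchesIndexSort = primarySortField.equals(indexSortField);
        boolean isTimestamp = TIMESTAMP_FIELD.equals(primarySortField);
        return matchesIndexSort || isTimestamp;
    }
}
```

The obvious downside is that this is all-or-nothing: if competitive field values were shared between tasks, a check like this might not be needed at all.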

On Thu, Jan 15, 2026 at 7:21 AM Alan Woodward <[email protected]> wrote:

> Hi Adrien!
>
> That makes sense, I’ll see if I can put something together for
> TopFieldCollectorManager.
>
> On 15 Jan 2026, at 13:47, Adrien Grand <[email protected]> wrote:
>
> Hi Alan!
>
> It's exciting that you're looking into this, it's long overdue! I suspect
> that fully merging skipping by score or field is not going to work as they
> behave very differently (skipping by score skips on the postings of the
> very field that is used by the Query, while skipping by field needs to
> introduce a new conjunctive clause (the competitive iterator) that works by
> looking at the index structures of the field that is used for sorting) and
> that it'd be easier to introduce sharing of the competitive value
> thresholds in a somewhat independent way.
>
> On Thu, Jan 15, 2026 at 1:33 PM Alan Woodward <[email protected]>
> wrote:
>
>> Hi all,
>>
>> I've been working on improving Sort skipping by using segment
>> re-ordering[1] for sorted segments, which on benchmarking was working
>> brilliantly on the wikimedium10k dataset, but then showing no improvements
>> at all on wikimediumall - very confusing! On digging further, I found the
>> issue to be with search concurrency. The segment re-ordering happens
>> within a single unit of work, but for sufficiently large segments we
>> generate large numbers of tasks even if the concurrency is low - e.g., with
>> an ExecutorService with 2 threads, we still get 17 tasks.
>>
>> For score-based searches we share competitive scores between tasks using
>> a MaxScoreAccumulator, but we don’t have anything similar for
>> field-sort-based searches. So these 17 tasks all run without any knowledge
>> of what has happened before, and even if we re-ordered the tasks so that
>> the first ones contained the best hits, the subsequent tasks wouldn’t be
>> able to early terminate.
>>
>> I can look into adding some support for sharing competitive value
>> thresholds between threads, but I wonder if it’s worth considering merging
>> the skipping infrastructure between score-based sorting and field-based
>> sorting?
>>
>> - Alan
>>
>> [1] https://github.com/apache/lucene/pull/15436
>
> --
> Adrien
