Re: Sort-based skipping and concurrency

Michael Froh Thu, 15 Jan 2026 10:22:26 -0800

For what it's worth, I believe that for OpenSearch we took the approach of
disabling concurrent segment search when a request has a sort that is
likely to benefit significantly from pruning (either we're sorting on a
field that matches the index sort, or we're sorting by the special
@timestamp field). Basically, we assume that the work saved by pruning
non-competitive segments (and by front-loading the segments most likely to
be competitive) outweighs the latency gained from concurrency.
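To make that heuristic concrete, here is a minimal Java sketch of the decision. It is a hypothetical model, not OpenSearch's code: sorts are reduced to lists of field names (sort direction and field types are ignored), `ConcurrencyHeuristic` and `shouldDisableConcurrency` are made-up names, and "matches the index sort" is assumed to mean that the request sort is a prefix of the index sort.

```java
import java.util.List;

/**
 * Hypothetical sketch of the heuristic: run a request on a single thread when
 * its sort is likely to prune heavily. Sorts are modeled as lists of field
 * names (primary sort first); sort direction is ignored for simplicity.
 */
final class ConcurrencyHeuristic {
    // The special timestamp field mentioned above.
    static final String TIMESTAMP_FIELD = "@timestamp";

    private ConcurrencyHeuristic() {}

    static boolean shouldDisableConcurrency(List<String> requestSort, List<String> indexSort) {
        if (requestSort.isEmpty()) {
            return false; // modeled as sorting by relevance: keep concurrency
        }
        // Case 1: the primary sort is the special @timestamp field.
        if (TIMESTAMP_FIELD.equals(requestSort.get(0))) {
            return true;
        }
        // Case 2 (assumed meaning of "matches the index sort"): the request
        // sort is a prefix of the index sort.
        return requestSort.size() <= indexSort.size()
                && indexSort.subList(0, requestSort.size()).equals(requestSort);
    }
}
```

The trade-off is deliberately coarse: it gives up intra-request parallelism entirely whenever pruning is expected to dominate, rather than trying to get both.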


On Thu, Jan 15, 2026 at 7:21 AM Alan Woodward wrote:

> Hi Adrien!
>
> That makes sense; I’ll see if I can put something together for
> TopFieldCollectorManager.
>
> On 15 Jan 2026, at 13:47, Adrien Grand wrote:
>
> Hi Alan!
>
> It's exciting that you're looking into this; it's long overdue! I suspect
> that fully merging skipping by score and skipping by field is not going to
> work, as they behave very differently: skipping by score skips on the
> postings of the very field used by the Query, while skipping by field has
> to introduce a new conjunctive clause (the competitive iterator) that works
> by looking at the index structures of the field used for sorting. I think
> it'd be easier to introduce sharing of the competitive value thresholds in
> a somewhat independent way.
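A minimal Java sketch of what an independently shared threshold could look like, assuming an ascending sort on a single long field. `SharedBottomValue` is a hypothetical name, not a Lucene class; it only mirrors the idea behind MaxScoreAccumulator (a monotonic bound that every task reads and tightens) and leaves each task's own competitive iterator untouched.

```java
import java.util.concurrent.atomic.AtomicLong;

/**
 * Hypothetical sketch: a thread-safe, monotonically tightening bottom value
 * for an ascending sort on a long field, shared by all search tasks.
 * Not a Lucene class.
 */
final class SharedBottomValue {
    // Largest value that may still be competitive; starts with no bound.
    private final AtomicLong bottom = new AtomicLong(Long.MAX_VALUE);

    /** Called by a task once its local top-k queue is full. */
    void accumulate(long localBottom) {
        // For an ascending sort a smaller bottom is a tighter bound; never loosen it.
        bottom.accumulateAndGet(localBottom, Math::min);
    }

    long get() {
        return bottom.get();
    }

    /**
     * Ties are treated as competitive, because doc-id tie-breaking across
     * tasks may still favor a document whose value equals the bottom.
     */
    boolean isCompetitive(long value) {
        return value <= bottom.get();
    }
}
```

Each task would publish its local bottom once its queue fills and check (or feed into its own competitive iterator) the shared value before collecting, which keeps threshold sharing separate from how each task actually skips.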
>
> On Thu, Jan 15, 2026 at 1:33 PM Alan Woodward wrote:
>
>> Hi all,
>>
>> I've been working on improving sort-based skipping by using segment
>> re-ordering[1] for sorted segments. In benchmarks this worked
>> brilliantly on the wikimedium10k dataset, but showed no improvement
>> at all on wikimediumall, which was very confusing! On digging further, I
>> found the issue to be search concurrency. The segment re-ordering happens
>> within a single unit of work, but for sufficiently large segments we
>> generate large numbers of tasks even if the concurrency is low: e.g., with
>> an ExecutorService with 2 threads, we still get 17 tasks.
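The re-ordering idea within a single unit of work can be sketched as follows. This is a simplified model, not the code from [1]: each segment is a plain `long[]` of sort values, the per-segment minimum is computed directly rather than read from index structures, and `ReorderedTopK` is a hypothetical name.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;
import java.util.PriorityQueue;

/**
 * Hypothetical model of segment re-ordering in one unit of work, for an
 * ascending sort: visit segments by increasing minimum value and stop once a
 * segment's minimum cannot beat the current k-th best value.
 */
final class ReorderedTopK {
    private ReorderedTopK() {}

    /** Top-k values in ascending order, plus how many segments were scanned. */
    static final class Result {
        final long[] topK;
        final int segmentsVisited;

        Result(long[] topK, int segmentsVisited) {
            this.topK = topK;
            this.segmentsVisited = segmentsVisited;
        }
    }

    static Result search(List<long[]> segments, int k) {
        // Most likely competitive segments (smallest minimum) go first.
        List<long[]> ordered = new ArrayList<>(segments);
        ordered.sort(Comparator.comparingLong(ReorderedTopK::min));

        // Max-heap holding the k smallest values seen; its head is the bottom.
        PriorityQueue<Long> heap = new PriorityQueue<>(Comparator.reverseOrder());
        int visited = 0;
        for (long[] segment : ordered) {
            if (heap.size() == k && min(segment) >= heap.peek()) {
                break; // this and every later segment is non-competitive
            }
            visited++;
            for (long v : segment) {
                if (heap.size() < k) {
                    heap.add(v);
                } else if (v < heap.peek()) {
                    heap.poll();
                    heap.add(v);
                }
            }
        }
        long[] top = heap.stream().mapToLong(Long::longValue).sorted().toArray();
        return new Result(top, visited);
    }

    private static long min(long[] segment) {
        return Arrays.stream(segment).min().orElse(Long.MAX_VALUE);
    }
}
```

With three segments whose minimums are 1, 10 and 50 and `k = 3`, the first segment fills the queue with 1, 2 and 3 and the loop stops before touching the other two. Once those segments are split across separate tasks, though, each task starts with an empty queue of its own.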
>>
>> For score-based searches we share competitive scores between tasks using
>> a MaxScoreAccumulator, but we don’t have anything similar for
>> field-sort-based searches. So these 17 tasks all run without any knowledge
>> of what the other tasks have already collected, and even if we re-ordered
>> the tasks so that the first ones contained the best hits, the subsequent
>> tasks wouldn’t be able to terminate early.
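Here's a small hypothetical model of that effect, assuming index-sorted slices (values ascending within each task), an ascending sort, and one task per slice on a fixed-size pool. `SlicedSearch` is a made-up name and the shared bound is a plain `AtomicLong`, not an existing Lucene API. With `share = false` every task has to fill its own top-k before it can stop; with `share = true` a task stops as soon as its next value exceeds the smallest bottom published by any task.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.concurrent.atomic.AtomicLong;

/**
 * Hypothetical model: one task per index-sorted slice, searching for the k
 * smallest values (ascending sort). Not a Lucene API.
 */
final class SlicedSearch {
    private SlicedSearch() {}

    /**
     * Returns the global top-k values in ascending order.
     *
     * @param share       whether tasks publish and honor a shared bottom value
     * @param docsScanned incremented once per value a task actually reads
     */
    static long[] search(List<long[]> slices, int k, int threads, boolean share,
                         AtomicInteger docsScanned) {
        // Smallest local bottom published so far; MAX_VALUE means "no bound yet".
        AtomicLong globalBottom = new AtomicLong(Long.MAX_VALUE);
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        try {
            List<Future<List<Long>>> futures = new ArrayList<>();
            for (long[] slice : slices) {
                futures.add(pool.submit(() -> {
                    List<Long> local = new ArrayList<>();
                    for (long v : slice) {
                        // Values are ascending, so once v exceeds the shared bottom
                        // nothing later in this slice can be competitive either.
                        if (share && v > globalBottom.get()) {
                            break;
                        }
                        docsScanned.incrementAndGet();
                        local.add(v);
                        if (local.size() == k) {
                            // Local queue is full and v is its bottom (k-th value).
                            if (share) {
                                globalBottom.accumulateAndGet(v, Math::min);
                            }
                            break; // the rest of this sorted slice is non-competitive
                        }
                    }
                    return local;
                }));
            }
            // Merge per-task results and keep the k smallest values.
            List<Long> merged = new ArrayList<>();
            for (Future<List<Long>> f : futures) {
                try {
                    merged.addAll(f.get());
                } catch (InterruptedException | ExecutionException e) {
                    throw new IllegalStateException(e);
                }
            }
            return merged.stream().mapToLong(Long::longValue).sorted().limit(k).toArray();
        } finally {
            pool.shutdown();
        }
    }
}
```

With 17 slices, `k = 10` and 2 threads, the unshared run always reads 170 values. The shared run returns the same hits and reads at most that many; how many it actually skips depends on scheduling, since a task only benefits from bottoms published before it checks.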
>>
>> I can look into adding some support for sharing competitive value
>> thresholds between threads, but I wonder if it’s worth considering merging
>> the skipping infrastructure between score-based sorting and field-based
>> sorting?
>>
>> - Alan
>>
>> [1] https://github.com/apache/lucene/pull/15436
>>
>>
>
> --
> Adrien
>
>
>
