Hi Alan! It's exciting that you're looking into this; it's long overdue!

I suspect that fully merging skipping by score and skipping by field is not going to work, as they behave very differently. Skipping by score skips on the postings of the very field that is used by the Query, while skipping by field needs to introduce a new conjunctive clause (the competitive iterator) that works by looking at the index structures of the field that is used for sorting. I think it'd be easier to introduce sharing of the competitive value thresholds in a somewhat independent way.
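
To make the "independent" sharing idea a bit more concrete, here is a rough sketch of what a shared threshold could look like. SharedCompetitiveValue and its methods are hypothetical names, not existing Lucene API, and the sketch assumes an ascending sort on a long-valued field, so smaller values are more competitive. The real thing would also need to handle descending sorts, missing values and tie-breaking on doc ID.

```java
import java.util.concurrent.atomic.LongAccumulator;

/**
 * Hypothetical helper (not part of Lucene): shares the tightest known
 * "bottom" sort value between concurrent search tasks.
 * Assumes an ascending sort on a long field, so smaller is better.
 */
final class SharedCompetitiveValue {
  // The tightest bound is the minimum of all bottoms published so far.
  // Long.MAX_VALUE means "no task has filled its queue yet".
  private final LongAccumulator bound = new LongAccumulator(Math::min, Long.MAX_VALUE);

  /** Called by a task once its top-k queue is full, with the queue's worst value. */
  void publish(long localBottom) {
    bound.accumulate(localBottom);
  }

  /** Returns true if a document with this sort value could still enter the global top-k. */
  boolean isCompetitive(long value) {
    // Ties are kept as competitive to stay on the safe side.
    return value <= bound.get();
  }
}
```

Each task would publish its local bottom value once its queue is full, and could periodically read the shared bound to tighten the range that its competitive iterator has to visit.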

On Thu, Jan 15, 2026 at 1:33 PM Alan Woodward <[email protected]> wrote:

> Hi all,
>
> I've been working on improving Sort skipping by using segment
> re-ordering[1] for sorted segments, which on benchmarking was working
> brilliantly on the wikimedium10k dataset, but then showing no improvements
> at all on wikimediumall - very confusing! On digging further, I found the
> issue to be with search concurrency. The segment re-ordering happens
> within a single unit of work, but for sufficiently large segments we
> generate large numbers of tasks even if the concurrency is low - e.g., with
> an ExecutorService with 2 threads, we still get 17 tasks.
>
> For score-based searches we share competitive scores between tasks using a
> MaxScoreAccumulator, but we don’t have anything similar for
> field-sort-based searches. So these 17 tasks all run without any knowledge
> of what has happened before, and even if we re-ordered the tasks so that
> the first ones contained the best hits, the subsequent tasks wouldn’t be
> able to early terminate.
>
> I can look into adding some support for sharing competitive value
> thresholds between threads, but I wonder if it’s worth considering merging
> the skipping infrastructure between score-based sorting and field-based
> sorting?
>
> - Alan
>
> [1] https://github.com/apache/lucene/pull/15436

--
Adrien
