Hi Adrien!

That makes sense. I’ll see if I can put something together for
TopFieldCollectorManager.
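
As a rough starting point, here is a minimal sketch of how tasks might share a competitive threshold for an ascending sort on a long-valued field. The class SharedLongThreshold and its methods are hypothetical names for illustration, not existing Lucene API. The sketch assumes each task only publishes its bottom value once its own top-k queue is full, and it ignores tie-breaking on doc ID by treating equal values as still competitive.

```java
import java.util.concurrent.atomic.LongAccumulator;

/**
 * Hypothetical helper (not part of Lucene): shares the best known "bottom"
 * value of a full top-k queue between concurrent search tasks, for an
 * ascending sort on a long field.
 */
final class SharedLongThreshold {
  // Smallest bottom value published so far. Long.MAX_VALUE means no task has
  // filled its queue yet, so every value is still competitive.
  private final LongAccumulator bound = new LongAccumulator(Long::min, Long.MAX_VALUE);

  /** Called by a task whose top-k queue is full, with its current worst value. */
  void offerBottom(long bottom) {
    bound.accumulate(bottom);
  }

  /** Returns the current shared bound. */
  long get() {
    return bound.get();
  }

  /**
   * A value can only enter the global top-k if it is no worse than the shared
   * bound. Equal values are kept, as a conservative assumption about ties.
   */
  boolean isCompetitive(long value) {
    return value <= bound.get();
  }
}
```

A task would call offerBottom whenever its local bottom improves, and check isCompetitive (or feed get() into its competitive iterator) to skip documents that can no longer make the global top-k.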

> On 15 Jan 2026, at 13:47, Adrien Grand <[email protected]> wrote:
> 
> Hi Alan!
> 
> It's exciting that you're looking into this; it's long overdue! I suspect
> that fully merging skipping by score and skipping by field is not going to
> work, as they behave very differently: skipping by score skips on the
> postings of the very field that is used by the Query, while skipping by
> field needs to introduce a new conjunctive clause (the competitive
> iterator) that works by looking at the index structures of the field that
> is used for sorting. I think it'd be easier to introduce sharing of the
> competitive value thresholds in a somewhat independent way.
> 
> On Thu, Jan 15, 2026 at 1:33 PM Alan Woodward <[email protected]> wrote:
>> Hi all,
>> 
>> I've been working on improving sort skipping by using segment re-ordering[1]
>> for sorted segments. In benchmarks this worked brilliantly on the
>> wikimedium10k dataset, but showed no improvement at all on wikimediumall,
>> which was very confusing! On digging further, I found the issue to be with
>> search concurrency. The segment re-ordering happens within a single
>> unit of work, but for sufficiently large segments we generate large numbers 
>> of tasks even if the concurrency is low - e.g., with an ExecutorService with 
>> 2 threads, we still get 17 tasks.
>> 
>> For score-based searches we share competitive scores between tasks using a 
>> MaxScoreAccumulator, but we don’t have anything similar for field-sort-based 
>> searches.  So these 17 tasks all run without any knowledge of what has 
>> happened before, and even if we re-ordered the tasks so that the first ones 
>> contained the best hits, the subsequent tasks wouldn’t be able to early 
>> terminate.
>> 
>> I can look into adding some support for sharing competitive value thresholds 
>> between threads, but I wonder if it’s worth considering merging the skipping 
>> infrastructure between score-based sorting and field-based sorting?
>> 
>> - Alan
>> 
>> [1] https://github.com/apache/lucene/pull/15436
>> 
> 
> 
> 
> --
> Adrien
