Hi all,

I've been working on improving sort skipping by using segment re-ordering[1]
for sorted segments. In benchmarks this worked brilliantly on the
wikimedium10k dataset, but showed no improvement at all on wikimediumall
 - very confusing!  On digging further, I found the issue to be with search
concurrency.  The segment re-ordering happens within a single unit of work, but
for sufficiently large segments we generate large numbers of tasks even if the
concurrency is low - e.g., with an ExecutorService with 2 threads, we still get
17 tasks.

For score-based searches we share competitive scores between tasks using a 
MaxScoreAccumulator, but we don’t have anything similar for field-sort-based 
searches.  So these 17 tasks all run without any knowledge of what has happened 
before, and even if we re-ordered the tasks so that the first ones contained 
the best hits, the subsequent tasks wouldn’t be able to terminate early.
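
To make the idea concrete, here is a minimal sketch of what sharing a
competitive bound between tasks could look like for an ascending sort on a long
field.  Everything in it is hypothetical: SharedSortThreshold and SliceTask are
illustrative names, not Lucene classes, and the per-value check stands in for
whatever skipping mechanism the real collectors would use.

```java
import java.util.Collections;
import java.util.PriorityQueue;
import java.util.concurrent.atomic.LongAccumulator;

/** Hypothetical helper (not a Lucene class): shared bound for an ascending long sort. */
final class SharedSortThreshold {
  // Smallest "k-th best value" published by any task; Long.MAX_VALUE means no bound yet.
  private final LongAccumulator bound = new LongAccumulator(Math::min, Long.MAX_VALUE);

  /** Called by a task whose local top-k queue is full, with the worst value in it. */
  void publish(long localWorst) {
    bound.accumulate(localWorst);
  }

  /** A value strictly greater than the shared bound cannot enter the global top-k. */
  boolean isCompetitive(long value) {
    return value <= bound.get();
  }
}

/** Hypothetical task: collects the k smallest values from one slice. */
final class SliceTask {
  /** Returns how many values were actually visited rather than skipped. */
  static int collect(long[] values, int k, SharedSortThreshold shared) {
    // Max-heap, so peek() is the worst value currently held.
    PriorityQueue<Long> topK = new PriorityQueue<>(Collections.reverseOrder());
    int visited = 0;
    for (long v : values) {
      if (!shared.isCompetitive(v)) {
        continue; // skipped thanks to a bound published by this or another task
      }
      visited++;
      if (topK.size() < k) {
        topK.add(v);
      } else if (v < topK.peek()) {
        topK.poll();
        topK.add(v);
      }
      if (topK.size() == k) {
        shared.publish(topK.peek());
      }
    }
    return visited;
  }
}
```

The bound is only published once a task holds k hits, and the comparison is
strict, so ties are never dropped.  A task that starts after a good bound has
been published can then skip most of its slice, which is the behaviour the 17
independent tasks currently lack.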

I can look into adding some support for sharing competitive value thresholds 
between threads, but I wonder if it’s worth considering merging the skipping 
infrastructure between score-based sorting and field-based sorting?

- Alan

[1] https://github.com/apache/lucene/pull/15436
