Hi all,

I've been working on improving sort skipping by using segment re-ordering [1] for sorted segments. In benchmarks this worked brilliantly on the wikimedium10k dataset but showed no improvement at all on wikimediumall, which was very confusing! On digging further, I found the issue to be with search concurrency. The segment re-ordering happens within a single unit of work, but for sufficiently large segments we generate a large number of tasks even if the concurrency is low: e.g., with an ExecutorService with 2 threads, we still get 17 tasks.

For score-based searches we share competitive scores between tasks using a MaxScoreAccumulator, but we don’t have anything similar for field-sort-based searches. So these 17 tasks all run without any knowledge of what has happened before, and even if we re-ordered the tasks so that the first ones contained the best hits, the subsequent tasks wouldn’t be able to early terminate. I can look into adding some support for sharing competitive value thresholds between threads, but I wonder if it’s worth considering merging the skipping infrastructure between score-based sorting and field-based sorting?

- Alan

[1] https://github.com/apache/lucene/pull/15436
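
To make the threshold-sharing idea a bit more concrete, here is a rough sketch of what sharing a competitive value between tasks could look like for an ascending sort on a long field. This is not Lucene code: SharedCompetitiveBound and TaskCollector are hypothetical names invented for illustration, and the only real API used is java.util.concurrent.atomic.LongAccumulator. Each task publishes its k-th best value once its local queue is full, and every task skips values that are strictly worse than the best bound published so far.

```java
import java.util.Comparator;
import java.util.PriorityQueue;
import java.util.concurrent.atomic.LongAccumulator;

/** Hypothetical holder for a bound shared by all tasks (ascending sort: lower is better). */
final class SharedCompetitiveBound {
  // Smallest "k-th best" value any task has published; starts at "no bound yet".
  private final LongAccumulator bound = new LongAccumulator(Math::min, Long.MAX_VALUE);

  /** Called by a task once it holds k hits; kthBest is its worst retained value. */
  void publish(long kthBest) {
    bound.accumulate(kthBest);
  }

  /** Ties are kept (<=) so that tie-breaking on doc ID is still possible. */
  boolean isCompetitive(long value) {
    return value <= bound.get();
  }

  long current() {
    return bound.get();
  }
}

/** Hypothetical per-task collector keeping the k smallest values it sees. */
final class TaskCollector {
  private final int k;
  private final SharedCompetitiveBound shared;
  // Max-heap: the head is the worst value currently retained by this task.
  private final PriorityQueue<Long> topK = new PriorityQueue<>(Comparator.reverseOrder());
  int skipped; // how many values were rejected using the shared bound

  TaskCollector(int k, SharedCompetitiveBound shared) {
    this.k = k;
    this.shared = shared;
  }

  void collect(long value) {
    if (!shared.isCompetitive(value)) {
      skipped++; // some task already holds k strictly better hits
      return;
    }
    if (topK.size() < k) {
      topK.add(value);
    } else if (value < topK.peek()) {
      topK.poll();
      topK.add(value);
    } else {
      return; // not better than this task's own k-th best
    }
    if (topK.size() == k) {
      shared.publish(topK.peek()); // let other tasks see our k-th best
    }
  }
}
```

With k = 3, a first task that collects 1, 2, 3 publishes 3 as the bound, and a later task can then reject values such as 10 or 7 without queueing them. A real implementation would also have to handle descending sorts, missing values, and multi-field sorts, and would want to turn the bound into actual doc skipping rather than a per-hit check.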
