gf2121 opened a new pull request, #13221: URL: https://github.com/apache/lucene/pull/13221
This PR proposes a new approach to numeric dynamic pruning, with the following changes:

* Instead of relying on complex sampling and point-count estimation to decide whether to build the competitive iterator, this patch finds a threshold value. That is, we find the value that is 'N docs away from' the top value, relying on [the fact that the top value should be final in LeafComparators](https://github.com/apache/lucene/blob/99b9636fd8c383c80d06c8815cfdb49b1b77dcdb/lucene/core/src/java/org/apache/lucene/search/FieldComparator.java#L75), as the picture in the PR description shows.
* Instead of building and rebuilding the competitive iterator whenever the bottom value becomes more competitive, this patch builds the competitive iterator as a disjunction of small DISIs. Each small DISI tracks its most competitive value and is discarded once that value is no longer competitive, as we did in `TermOrdValComparator`. This lets us intersect the tree only once and update the competitive iterator more frequently.
* For simplicity, I changed the byte-based values to comparable long values, e.g. `maxValueAsBytes` -> `maxValueAsLong`.

#### Benchmark

Here are the results on wikimedium10m (the baseline contains https://github.com/apache/lucene/pull/13199):

```
Task                   QPS baseline  StdDev  QPS my_modified_version  StdDev  Pct diff            p-value
TermDTSort             176.84        (5.0%)  303.31                   (6.3%)  71.5% ( 57% - 87%)  0.000
HighTermDayOfYearSort  454.72        (3.2%)  791.09                   (7.9%)  74.0% ( 60% - 87%)  0.000
```
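The two ideas in the change list above can be illustrated with a minimal, self-contained sketch. It is not the code in this PR: `CompetitiveBlocksSketch`, `Block`, `thresholdValue`, `updateBottom`, and `competitiveDocs` are hypothetical names, a sorted `long[]` stands in for the points tree, an ascending sort is assumed, and 'N docs away from the top value' is read as the value of the N-th doc at or after the top value.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Deque;
import java.util.List;

/** Hypothetical illustration; none of these names come from Lucene or from the PR. */
final class CompetitiveBlocksSketch {

  /** A small doc ID set plus the most competitive (here: smallest) value among its docs. */
  static final class Block {
    final long mostCompetitiveValue;
    final int[] docs;

    Block(long mostCompetitiveValue, int... docs) {
      this.mostCompetitiveValue = mostCompetitiveValue;
      this.docs = docs;
    }
  }

  // Least competitive block first, so stale blocks can be dropped from the head.
  private final Deque<Block> blocks = new ArrayDeque<>();

  CompetitiveBlocksSketch(List<Block> input) {
    List<Block> sorted = new ArrayList<>(input);
    sorted.sort((a, b) -> Long.compare(b.mostCompetitiveValue, a.mostCompetitiveValue));
    blocks.addAll(sorted);
  }

  /**
   * Value of the n-th doc at or after topValue in ascending order, or Long.MAX_VALUE when
   * fewer than n such docs exist. Assumption: a sorted array replaces the tree traversal.
   */
  static long thresholdValue(long[] sortedValues, long topValue, int n) {
    if (n < 1) {
      throw new IllegalArgumentException("n must be >= 1");
    }
    int start = 0;
    while (start < sortedValues.length && sortedValues[start] < topValue) {
      start++; // skip values that sort before topValue
    }
    int idx = start + n - 1;
    return idx < sortedValues.length ? sortedValues[idx] : Long.MAX_VALUE;
  }

  /** Ascending sort assumed: drop every block whose smallest value is already above bottom. */
  void updateBottom(long bottom) {
    while (!blocks.isEmpty() && blocks.peekFirst().mostCompetitiveValue > bottom) {
      blocks.pollFirst();
    }
  }

  /** Disjunction of the remaining blocks, returned as a sorted doc ID array. */
  int[] competitiveDocs() {
    return blocks.stream().flatMapToInt(b -> Arrays.stream(b.docs)).sorted().distinct().toArray();
  }

  public static void main(String[] args) {
    long[] values = {1, 3, 3, 5, 8, 13, 21};
    System.out.println(thresholdValue(values, 3, 3)); // prints 5

    CompetitiveBlocksSketch sketch = new CompetitiveBlocksSketch(Arrays.asList(
        new Block(1, 0, 4), new Block(5, 2, 7), new Block(13, 1, 9)));
    sketch.updateBottom(5); // the block whose smallest value is 13 is discarded
    System.out.println(Arrays.toString(sketch.competitiveDocs())); // prints [0, 2, 4, 7]
  }
}
```

In this sketch, tightening the bottom value only pops blocks from the head of the queue, so nothing is rebuilt; that is the property the second bullet describes for the disjunction of small DISIs.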