gf2121 opened a new pull request, #13221:
URL: https://github.com/apache/lucene/pull/13221

   This PR proposes a new way to do numeric dynamic pruning, with the following 
changes:
   
   * Instead of complex sampling and point-count estimation to decide whether to 
build the competitive iterator, this patch finds the threshold value: the value 
that is 'N docs away from' the top value. This relies on [the fact that the top 
value should be final in LeafComparators](https://github.com/apache/lucene/blob/99b9636fd8c383c80d06c8815cfdb49b1b77dcdb/lucene/core/src/java/org/apache/lucene/search/FieldComparator.java#L75), 
as the following picture shows.
 
   
   
![image](https://github.com/apache/lucene/assets/52390227/b7125122-194a-4167-a821-57c47479915d)
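
A minimal sketch of the threshold idea, assuming an ascending sort and using a sorted `long[]` as a stand-in for the values held in the points tree (the patch itself works against the tree, not an array). The names `ThresholdSketch`, `lowerBound` and `findThreshold` are hypothetical and only illustrate what 'N docs away from the top value' could mean:

```java
public class ThresholdSketch {

  /** Index of the first element >= key in an ascending-sorted array. */
  public static int lowerBound(long[] sorted, long key) {
    int lo = 0, hi = sorted.length;
    while (lo < hi) {
      int mid = (lo + hi) >>> 1;
      if (sorted[mid] < key) {
        lo = mid + 1;
      } else {
        hi = mid;
      }
    }
    return lo;
  }

  /**
   * Returns the n-th value that is >= topValue (ascending order), i.e. the
   * value that is n docs away from the top value. If fewer than n such values
   * exist, the maximum value is returned (an assumption of this sketch).
   */
  public static long findThreshold(long[] sortedValues, long topValue, int n) {
    if (sortedValues.length == 0 || n < 1) {
      throw new IllegalArgumentException("need at least one value and n >= 1");
    }
    int start = lowerBound(sortedValues, topValue);
    int idx = Math.min(start + n - 1, sortedValues.length - 1);
    return sortedValues[idx];
  }

  public static void main(String[] args) {
    long[] values = {3, 5, 5, 8, 13, 21, 34, 55};
    // Top value 5, N = 3: the three closest values are 5, 5, 8.
    System.out.println(findThreshold(values, 5, 3)); // prints 8
  }
}
```

With such a threshold in hand, only docs whose value lies between the top value and the threshold would need to go into the competitive iterator.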
   
   * Instead of building and rebuilding the competitive iterator whenever the 
bottom value gets more competitive, this patch builds the competitive iterator 
as a disjunction of small DISIs. Each small DISI tracks its most competitive 
value and is discarded once that value is no longer competitive, similar to what 
we did in `TermOrdValComparator`. This lets us intersect the tree only once and 
update the competitive iterator more frequently.
   
   * For simplicity, I changed the byte-array values to comparable long values, 
e.g. `maxValueAsBytes` -> `maxValueAsLong`.
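
The disjunction of small DISIs described above can be sketched roughly as follows. This is an illustration in plain Java, not the code in the patch: `SmallDisiDisjunctionSketch`, `SmallSet`, `updateBottom` and `advance` are hypothetical names, the sort is assumed to be ascending, and ties with the bottom value are assumed to stay competitive.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class SmallDisiDisjunctionSketch {

  /** A small sorted doc-id set plus the most competitive (smallest) value it holds. */
  public static final class SmallSet {
    final int[] sortedDocs;
    final long mostCompetitiveValue;

    public SmallSet(int[] sortedDocs, long mostCompetitiveValue) {
      this.sortedDocs = sortedDocs;
      this.mostCompetitiveValue = mostCompetitiveValue;
    }
  }

  private final List<SmallSet> sets = new ArrayList<>();

  public void add(SmallSet set) {
    sets.add(set);
  }

  public int size() {
    return sets.size();
  }

  /** Drops every set whose best value can no longer beat the new bottom value. */
  public void updateBottom(long bottomValue) {
    sets.removeIf(s -> s.mostCompetitiveValue > bottomValue);
  }

  /** Smallest doc id >= target among the remaining sets, or Integer.MAX_VALUE if none. */
  public int advance(int target) {
    int best = Integer.MAX_VALUE;
    for (SmallSet s : sets) {
      int i = Arrays.binarySearch(s.sortedDocs, target);
      if (i < 0) {
        i = -i - 1; // insertion point: first doc greater than target
      }
      if (i < s.sortedDocs.length) {
        best = Math.min(best, s.sortedDocs[i]);
      }
    }
    return best;
  }

  public static void main(String[] args) {
    SmallDisiDisjunctionSketch disjunction = new SmallDisiDisjunctionSketch();
    disjunction.add(new SmallSet(new int[] {1, 4, 9}, 10L));
    disjunction.add(new SmallSet(new int[] {2, 6}, 50L));
    disjunction.add(new SmallSet(new int[] {3, 7}, 90L));

    disjunction.updateBottom(60L); // the set whose best value is 90 is discarded
    System.out.println(disjunction.size());     // prints 2
    System.out.println(disjunction.advance(5)); // prints 6
  }
}
```

The point of the design is that a tighter bottom value only removes sets from the list; the underlying tree never has to be intersected again.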
   
   #### Benchmark
   
   Here are the results on wikimedium10m (the baseline includes 
https://github.com/apache/lucene/pull/13199):
   
   ```
                    Task  QPS baseline    StdDev  QPS my_modified_version    StdDev                Pct diff p-value
              TermDTSort        176.84    (5.0%)                   303.31    (6.3%)   71.5% (  57% -   87%)   0.000
   HighTermDayOfYearSort        454.72    (3.2%)                   791.09    (7.9%)   74.0% (  60% -   87%)   0.000
   ```
   
   
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org

