mayya-sharipova commented on issue #1351: LUCENE-9280: Collectors to skip noncompetitive documents
URL: https://github.com/apache/lucene-solr/pull/1351#issuecomment-605327672
 
 
   @msokolov Thanks for suggesting additional benchmarks that we can use.
   Below are the results on the `wikimedium10m` dataset.
   
   First, I will repeat the results from the previous round of benchmarking:
   
   topN=10, taskRepeatCount = 20, concurrentSearchers = False
   
   | TaskQPS               | baseline QPS | StdDevQPS | my_modified_version QPS | StdDevQPS |
   | --------------------- | -----------: | --------: | ----------------------: | --------: |
   | **TermDTSort**        |       147.64 |   (11.5%) |                  547.80 |    (6.6%) |
   | HighTermMonthSort     |       147.85 |   (12.2%) |                  239.28 |    (7.3%) |
   | HighTermDayOfYearSort |        74.44 |    (7.7%) |                   42.56 |   (12.1%) |
   
   
   
   ---
   topN=10, **taskRepeatCount = 500**, concurrentSearchers = False

   | TaskQPS               | baseline QPS | StdDevQPS | my_modified_version QPS | StdDevQPS |
   | --------------------- | -----------: | --------: | ----------------------: | --------: |
   | **TermDTSort**        |       184.60 |    (8.2%) |                 3046.19 |    (4.4%) |
   | HighTermMonthSort     |       209.43 |    (6.5%) |                  253.90 |   (10.5%) |
   | HighTermDayOfYearSort |       130.97 |    (5.8%) |                   73.25 |   (11.8%) |
   
   Increasing `taskRepeatCount` seems to speed up all tasks, and the speedup for `TermDTSort` is even bigger here: 16.5x. There also seems to be a larger regression for `HighTermDayOfYearSort`.
   
   ---
   **topN=500**, taskRepeatCount = 20, concurrentSearchers = False

   | TaskQPS               | baseline QPS | StdDevQPS | my_modified_version QPS | StdDevQPS |
   | --------------------- | -----------: | --------: | ----------------------: | --------: |
   | **TermDTSort**        |       210.24 |    (9.7%) |                  537.65 |    (6.7%) |
   | HighTermMonthSort     |       116.02 |    (8.9%) |                  189.96 |   (13.5%) |
   | HighTermDayOfYearSort |        42.33 |    (7.6%) |                   67.93 |    (9.3%) |
   
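   With a larger `topN`, the speedups from the sort optimization are smaller (about 2.6x for `TermDTSort`, compared with 3.7x at topN=10). This is expected, because the optimization can only start skipping documents after the first `topN` documents have been collected.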
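
A minimal, hypothetical Java sketch of why the optimization can only start after `topN` documents: for an ascending sort on a long value, there is no bottom value until the queue is full, so nothing can be ruled out before that point. It uses a plain `java.util.PriorityQueue`, not Lucene's `TopFieldCollector` or comparator API, and the class name `TopNSketch` is made up. A real implementation would avoid visiting non-competitive documents altogether; this sketch still visits every document and only counts the ones it would treat as non-competitive.

```java
import java.util.Collections;
import java.util.PriorityQueue;

/** Hypothetical illustration of top-N collection; not Lucene's TopFieldCollector. */
public class TopNSketch {
    private final int topN;
    // Max-heap for an ascending sort: the head is the current bottom (worst competitive value)
    private final PriorityQueue<Long> queue;
    private long skipped = 0;

    TopNSketch(int topN) {
        this.topN = topN;
        this.queue = new PriorityQueue<>(topN, Collections.reverseOrder());
    }

    void collect(long sortValue) {
        if (queue.size() < topN) {
            // Queue not full yet: there is no bottom value, so every doc must be collected
            queue.add(sortValue);
            return;
        }
        if (sortValue >= queue.peek()) {
            // Cannot beat the bottom (equal values lose to earlier docs here): non-competitive
            skipped++;
            return;
        }
        // Competitive: evict the current bottom and insert the new value
        queue.poll();
        queue.add(sortValue);
    }

    long skipped() {
        return skipped;
    }

    public static void main(String[] args) {
        for (int n : new int[] {10, 500}) {
            TopNSketch collector = new TopNSketch(n);
            for (long v = 0; v < 10_000; v++) {
                collector.collect(v);
            }
            System.out.println("topN=" + n + " skipped=" + collector.skipped());
        }
    }
}
```

With 10,000 documents arriving in ascending order (a best case chosen for clarity), this prints `topN=10 skipped=9990` and `topN=500 skipped=9500`: the first `topN` documents always have to be collected, so a larger `topN` leaves fewer documents to skip.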
   
   ---
   topN=10, taskRepeatCount = 20, **concurrentSearchers = True**

   | TaskQPS               | baseline QPS | StdDevQPS | my_modified_version QPS | StdDevQPS |
   | --------------------- | -----------: | --------: | ----------------------: | --------: |
   | **TermDTSort**        |       132.09 |   (14.3%) |                  287.93 |   (11.8%) |
   | HighTermMonthSort     |       211.01 |   (12.2%) |                  116.46 |    (7.1%) |
   | HighTermDayOfYearSort |        72.28 |    (6.1%) |                   68.21 |   (11.4%) |
   
   With concurrent searchers the speedups are also smaller (about 2.2x for `TermDTSort`). This is expected, since segments are now spread across several `TopFieldCollector`s/comparators that don't exchange their bottom values. As a follow-up to this PR, we could look into a global bottom value, similar to how `MaxScoreAccumulator` is used to share a global minimum competitive score.
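
One possible shape for such a global bottom value, sketched in plain Java under stated assumptions: each slice publishes its local bottom into a shared `java.util.concurrent.atomic.LongAccumulator` that keeps the minimum. The names `GlobalBottom`, `publish` and `collectSlice` are hypothetical and do not correspond to Lucene's `MaxScoreAccumulator` API. The reasoning: once any slice holds `topN` documents with values at most b, no document with a value strictly greater than b, in any slice, can reach the global top N. Ties are kept (strict comparison) because this sketch does not model tie-breaking by doc id across slices.

```java
import java.util.Collections;
import java.util.PriorityQueue;
import java.util.concurrent.atomic.LongAccumulator;

public class GlobalBottomSketch {
    /** Hypothetical shared state: the smallest bottom published by any slice with a full queue. */
    static final class GlobalBottom {
        private final LongAccumulator min = new LongAccumulator(Math::min, Long.MAX_VALUE);

        void publish(long localBottom) {
            min.accumulate(localBottom);
        }

        long get() {
            return min.get();
        }
    }

    /** Collects one slice for an ascending sort on a long value; returns how many docs were skipped. */
    static long collectSlice(long[] values, int topN, GlobalBottom global) {
        // Max-heap: the head is this slice's local bottom (worst value still in its top N)
        PriorityQueue<Long> queue = new PriorityQueue<>(topN, Collections.reverseOrder());
        long skipped = 0;
        for (long v : values) {
            // Some slice already holds topN values <= global.get(), so a strictly larger v
            // cannot reach the global top N
            if (v > global.get()) {
                skipped++;
                continue;
            }
            if (queue.size() < topN) {
                queue.add(v);
            } else if (v < queue.peek()) {
                queue.poll();
                queue.add(v);
            } else {
                // Not competitive against this slice's own queue
                skipped++;
                continue;
            }
            if (queue.size() == topN) {
                // Publish the local bottom so other slices can use it
                global.publish(queue.peek());
            }
        }
        return skipped;
    }

    public static void main(String[] args) {
        long[] sliceA = new long[1000];
        long[] sliceB = new long[1000];
        for (int i = 0; i < 1000; i++) {
            sliceA[i] = i;
            sliceB[i] = 1000 + i;
        }
        GlobalBottom global = new GlobalBottom();
        // Run sequentially for deterministic output; real slices would run on separate threads
        System.out.println("slice A skipped " + collectSlice(sliceA, 10, global));
        System.out.println("slice B skipped " + collectSlice(sliceB, 10, global));
    }
}
```

With `topN = 10`, slice A (values 0 to 999) skips 990 documents and slice B (values 1000 to 1999) skips all 1000, because the bottom of 9 published by A already rules them out; without the shared value, B would first have had to fill its own queue with 10 documents. `LongAccumulator` is safe to update and read from several threads, which is what a concurrent search would need.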
   

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services
