mayya-sharipova edited a comment on issue #1351: LUCENE-9280: Collectors to 
skip noncompetitive documents
URL: https://github.com/apache/lucene-solr/pull/1351#issuecomment-604173071
 
 
   I have run some benchmarking using `luceneutil`.
   As the new sort optimization uses a new `LongDocValuesPointSortField` that 
is not present in `luceneutil`, I had to hack `luceneutil` as follows:
   
   1. I added a sort task on a long field, `TermDateTimeSort`, to 
`wikimedium.1M.nostopwords.tasks`. This task was present in 
`wikinightly.tasks`, but was not available for the wikimedium 1M and 10M tasks.
   2. I indexed the corresponding field `lastModNDV` as `LongPoint` as well. It 
was only indexed as `NumericDocValuesField` before, but for the sort 
optimization we need long values to be indexed both as docValues and as points.
   3. I modified `SearchTask.java` to create the `TopFieldCollector` with 
`totalHitsThreshold` set to `topN`: `final TopFieldCollector c = 
TopFieldCollector.create(s, topN, null, topN);` The sort optimization only 
works when a total hits threshold is set.
   4. For the patch version, I modified the sort in `TaskParser.java`. Instead of 
`lastModNDVSort = new Sort(new SortField("lastModNDV", SortField.Type.LONG));` 
I used the optimized sort: `lastModNDVSort = new Sort(new 
LongDocValuesPointSortField("lastModNDV"));`
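
Step 3 matters because a collector may only skip documents once it no longer has to count hits exactly. The sketch below is a toy model of that idea under stated assumptions, not Lucene code and not the patch's implementation: `ToyTopNCollector` and all its members are hypothetical names, plain `long` values stand in for the `lastModNDV` field, the sort is ascending, and `topN` is assumed to be at least 1.

```java
import java.util.Comparator;
import java.util.PriorityQueue;

// Toy model only: NOT Lucene code and not the patch's implementation.
final class ToyTopNCollector {
    private final int topN;
    private final int totalHitsThreshold;
    // Max-heap: the head is the worst (largest) value kept for an ascending sort.
    private final PriorityQueue<Long> kept = new PriorityQueue<>(Comparator.reverseOrder());
    private int hitsCounted = 0;
    private int skipped = 0;

    ToyTopNCollector(int topN, int totalHitsThreshold) {
        this.topN = topN;
        this.totalHitsThreshold = totalHitsThreshold;
    }

    void collect(long value) {
        boolean queueFull = kept.size() == topN;
        boolean mayStopCounting = hitsCounted >= totalHitsThreshold;
        // A value that cannot enter the top N is non-competitive. It may be
        // skipped only after the exact hit count is no longer required.
        if (queueFull && mayStopCounting && value >= kept.peek()) {
            skipped++;
            return;
        }
        hitsCounted++;
        if (!queueFull) {
            kept.add(value);
        } else if (value < kept.peek()) {
            kept.poll();
            kept.add(value);
        }
    }

    int hitsCounted() { return hitsCounted; }

    int skipped() { return skipped; }

    long[] topValues() {
        return kept.stream().mapToLong(Long::longValue).sorted().toArray();
    }

    public static void main(String[] args) {
        long[] values = {5, 3, 9, 1, 7};
        ToyTopNCollector withThreshold = new ToyTopNCollector(2, 2);
        ToyTopNCollector exactCount = new ToyTopNCollector(2, Integer.MAX_VALUE);
        for (long v : values) {
            withThreshold.collect(v);
            exactCount.collect(v);
        }
        System.out.println("threshold=2:   skipped " + withThreshold.skipped());
        System.out.println("threshold=max: skipped " + exactCount.skipped());
    }
}
```

Both collectors end up with the same top values; only the one with a finite threshold is allowed to skip. In the toy the skipped values are still inspected one by one, whereas the point of indexing the field as points is presumably to avoid visiting such documents at all.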
   
   Here the main point of comparison is `TermDTSort`, as it is the only sort on 
a long field. The other sorts are included to show whether they regress.
   
   ---
   wikimedium1m
   | Task                  | baseline QPS | StdDev    | my_modified_version QPS | StdDev    |
   | --------------------- | -----------: | --------: | ----------------------: | --------: |
   | **TermDTSort**        |       507.20 |   (11.2%) |                  550.02 |   (16.1%) |
   | HighTermMonthSort     |       550.06 |   (10.4%) |                  443.69 |   (16.1%) |
   | HighTermDayOfYearSort |       105.62 |   (24.9%) |                   91.93 |   (22.1%) |
   ---
   wikimedium10m
   | Task                  | baseline QPS | StdDev    | my_modified_version QPS | StdDev    |
   | --------------------- | -----------: | --------: | ----------------------: | --------: |
   | **TermDTSort**        |       147.64 |   (11.5%) |                  547.80 |    (6.6%) |
   | HighTermMonthSort     |       147.85 |   (12.2%) |                  239.28 |    (7.3%) |
   | HighTermDayOfYearSort |        74.44 |    (7.7%) |                   42.56 |   (12.1%) |
   
   For wikimedium1m, using `LongDocValuesPointSortField` for TermDTSort doesn't 
seem to have much effect (507.20 vs. 550.02 QPS, within the reported standard 
deviation). Probably the segments in this index are smaller, and the 
optimization was completely skipped on them.
   For wikimedium10m, using `LongDocValuesPointSortField` for TermDTSort instead 
of the usual `SortField.Type.LONG` **brings about a 3.7x speedup** (147.64 vs. 
547.80 QPS).
   HighTermMonthSort and HighTermDayOfYearSort show some regressions and 
speedups. I don't know why, as they should not be affected.
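
For reference, the relative changes can be recomputed from the QPS columns of the two tables. This is only a small arithmetic sketch; the class and method names (`QpsRatio`, `ratio`) are made up for illustration, and the numbers are copied from the tables above.

```java
import java.util.Locale;

// Recomputes patched/baseline QPS ratios from the benchmark tables.
final class QpsRatio {
    // A ratio above 1.0 is a speedup, below 1.0 a regression.
    static double ratio(double baselineQps, double patchedQps) {
        return patchedQps / baselineQps;
    }

    public static void main(String[] args) {
        String[] tasks = {"TermDTSort", "HighTermMonthSort", "HighTermDayOfYearSort"};
        // Each row is {baseline QPS, my_modified_version QPS}.
        double[][] wikimedium1m = {{507.20, 550.02}, {550.06, 443.69}, {105.62, 91.93}};
        double[][] wikimedium10m = {{147.64, 547.80}, {147.85, 239.28}, {74.44, 42.56}};
        for (int i = 0; i < tasks.length; i++) {
            System.out.printf(Locale.ROOT, "%-22s 1m: %.2fx  10m: %.2fx%n",
                tasks[i],
                ratio(wikimedium1m[i][0], wikimedium1m[i][1]),
                ratio(wikimedium10m[i][0], wikimedium10m[i][1]));
        }
    }
}
```

Given standard deviations between roughly 7% and 25%, the smaller ratios should be read with caution.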
   
   
   
   
   
   
   
   
