mayya-sharipova commented on issue #1351: LUCENE-9280: Collectors to skip 
noncompetitive documents
URL: https://github.com/apache/lucene-solr/pull/1351#issuecomment-604173071
 
 
   I have run some benchmarks using `luceneutil`.
   As the new sort optimization uses a new `LongDocValuesPointSortField` that 
is not present in `luceneutil`, I had to hack `luceneutil` as follows:
   
   1. I added a sort task on a long field, `TermDateTimeSort`, to 
`wikimedium.1M.nostopwords.tasks`. This task was present in 
`wikinightly.tasks`, but was not available for the wikimedium 1M and 10M tasks.
   2. I indexed the corresponding field `lastModNDV` as `LongPoint` as well. It 
was only indexed as `NumericDocValuesField` before, but for the sort 
optimization we need long values to be indexed both as docValues and as points.
   3. I modified `SearchTask.java` to create the `TopFieldCollector` with 
`totalHitsThreshold` set to `topN`: 
`final TopFieldCollector c = TopFieldCollector.create(s, topN, null, topN);` 
The sort optimization only works when a total hits threshold is set.
   4. For the patch version, I modified the sort in `TaskParser.java`. Instead of 
`lastModNDVSort = new Sort(new SortField("lastModNDV", SortField.Type.LONG));` 
I used the optimized sort: 
`lastModNDVSort = new Sort(new LongDocValuesPointSortField("lastModNDV"));`
   
   Here the main point of comparison is `TermDTSort`, as it is the only sort on 
a long field. The other sorts are included to show whether they regress or 
are unaffected.
   
   ---
   wikimedium1m
   | Task                  | baseline QPS | StdDev    | my_modified_version QPS | StdDev    |
   | --------------------- | -----------: | --------: | ----------------------: | --------: |
   | **TermDTSort**        |       507.20 |   (11.2%) |                  550.02 |   (16.1%) |
   | HighTermMonthSort     |       550.06 |   (10.4%) |                  443.69 |   (16.1%) |
   | HighTermDayOfYearSort |       105.62 |   (24.9%) |                   91.93 |   (22.1%) |
   ---
   wikimedium10m
   | Task                  | baseline QPS | StdDev    | my_modified_version QPS | StdDev    |
   | --------------------- | -----------: | --------: | ----------------------: | --------: |
   | **TermDTSort**        |       147.64 |   (11.5%) |                  547.80 |    (6.6%) |
   | HighTermMonthSort     |       147.85 |   (12.2%) |                  239.28 |    (7.3%) |
   | HighTermDayOfYearSort |        74.44 |    (7.7%) |                   42.56 |   (12.1%) |
   
   For wikimedium1m, using `LongDocValuesPointSortField` doesn't seem to have 
much effect, probably because the segments in this index are smaller and the 
optimization was skipped entirely on them.
   For wikimedium10m, using `LongDocValuesPointSortField` instead of the usual 
`SortField.Type.LONG` **brings a speedup of about 3.7x** on `TermDTSort` 
(147.64 to 547.80 QPS).
   There are some regressions/speedups for the HighTermMonthSort and 
HighTermDayOfYearSort tasks. I don't know the reason for these, as they should 
not be affected.
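
   As a quick sanity check of the ratios implied by the tables above, here is a 
minimal sketch that divides the patched QPS by the baseline QPS for each task. 
The class name `SpeedupCheck` and the helper `speedup` are hypothetical and only 
for illustration; the QPS values are copied from the two tables, and the 
standard deviations are not taken into account.

```java
import java.util.LinkedHashMap;
import java.util.Map;

public class SpeedupCheck {

    // Ratio of patched QPS to baseline QPS; a value above 1 means the patch is faster.
    static double speedup(double baselineQps, double patchedQps) {
        return patchedQps / baselineQps;
    }

    // Prints one line per task: task name and patched/baseline ratio.
    static void report(String index, Map<String, double[]> rows) {
        System.out.println(index);
        for (Map.Entry<String, double[]> e : rows.entrySet()) {
            double[] qps = e.getValue(); // {baseline QPS, patched QPS}
            System.out.printf("  %-22s %.2fx%n", e.getKey(), speedup(qps[0], qps[1]));
        }
    }

    public static void main(String[] args) {
        // Values copied from the wikimedium1m table.
        Map<String, double[]> wiki1m = new LinkedHashMap<>();
        wiki1m.put("TermDTSort", new double[] {507.20, 550.02});
        wiki1m.put("HighTermMonthSort", new double[] {550.06, 443.69});
        wiki1m.put("HighTermDayOfYearSort", new double[] {105.62, 91.93});

        // Values copied from the wikimedium10m table.
        Map<String, double[]> wiki10m = new LinkedHashMap<>();
        wiki10m.put("TermDTSort", new double[] {147.64, 547.80});
        wiki10m.put("HighTermMonthSort", new double[] {147.85, 239.28});
        wiki10m.put("HighTermDayOfYearSort", new double[] {74.44, 42.56});

        report("wikimedium1m", wiki1m);
        report("wikimedium10m", wiki10m);
    }
}
```

   Running it lists the per-task ratios for both indices, which makes it easy to 
see that only `TermDTSort` on wikimedium10m moves by a large factor in the 
expected direction.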
   
   
   
   
   
   
   
   
