[
https://issues.apache.org/jira/browse/LUCENE-5702?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Adrien Grand updated LUCENE-5702:
---------------------------------
Attachment: SortBench.java
LUCENE-5702.patch
Updated the patch to current trunk. I also did some benchmarking: removing the
one-comparator specialization had a bad impact on performance, so I added it
back. We could discuss the over-specialization of top-field collectors in a
different issue.
You can find attached the (dummy) benchmark that I used to check the
performance impact of this patch. Times are in milliseconds (the smaller the
better).
|| sort || trunk (ms) || patch (ms) || difference ||
| long asc | 100 | 108 | +8% |
| long desc | 101 | 110 | +9% |
| double asc | 107 | 114 | +7% |
| double desc | 113 | 118 | +4% |
| string asc | 119 | 123 | +3% |
| string desc | 120 | 124 | +3% |
| long asc, double asc | 98 | 87 | -11% |
| long desc, double desc | 102 | 89 | -13% |
Some cases are slightly faster, others are slightly slower. This benchmark only
runs a sort to find the top 50 hits on a {{MatchAllDocsQuery}}, so the
differences would be even smaller if you ran an actual query and/or had other
collectors (e.g. if you also wanted to compute facets).
This patch is **only** about API. It just splits FieldComparator into
* FieldComparator:
** int compare(int slot1, int slot2)
** void setTopValue(T value)
** T value(int slot)
** LeafFieldComparator getLeafComparator(LeafReaderContext context)
* and LeafFieldComparator:
** int compareBottom(int doc)
** int compareTop(int doc)
** void copy(int slot, int doc)
** void setScorer(Scorer scorer)
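To make the split concrete, here is a minimal, self-contained sketch of how the two halves could fit together for an ascending sort on a long field. It is not the code from the patch: {{Segment}} and {{Scorer}} are simplified stand-ins for LeafReaderContext and Scorer, the interface names are shortened, and {{setBottom(int slot)}} is an assumption of this sketch, since the list above does not say where the bottom value gets set.

```java
// Simplified stand-ins for illustration only, NOT the real Lucene classes.
class ComparatorSplitSketch {

    /** Stand-in for LeafReaderContext: the sort values of one segment, indexed by doc id. */
    static final class Segment {
        final long[] values;
        Segment(long... values) { this.values = values; }
    }

    /** Stand-in for Scorer. */
    interface Scorer {
        float score();
    }

    /** Per-segment half: every method takes a segment-local doc id. */
    interface LeafComparator {
        void setBottom(int slot); // assumption of this sketch, not in the list above
        int compareBottom(int doc);
        int compareTop(int doc);
        void copy(int slot, int doc);
        void setScorer(Scorer scorer);
    }

    /** Top-level half: works on queue slots, which outlive any single segment. */
    interface TopComparator<T> {
        int compare(int slot1, int slot2);
        void setTopValue(T value);
        T value(int slot);
        LeafComparator getLeafComparator(Segment segment);
    }

    /** Ascending sort on a long field. */
    static final class LongComparator implements TopComparator<Long> {
        private final long[] slots; // one value per queue slot, shared by all segments
        private long bottom;
        private long top;

        LongComparator(int numHits) { slots = new long[numHits]; }

        @Override public int compare(int slot1, int slot2) {
            return Long.compare(slots[slot1], slots[slot2]);
        }
        @Override public void setTopValue(Long value) { top = value; }
        @Override public Long value(int slot) { return slots[slot]; }

        @Override public LeafComparator getLeafComparator(Segment segment) {
            final long[] values = segment.values; // resolved once per segment
            return new LeafComparator() {
                @Override public void setBottom(int slot) { bottom = slots[slot]; }
                @Override public int compareBottom(int doc) { return Long.compare(bottom, values[doc]); }
                @Override public int compareTop(int doc) { return Long.compare(top, values[doc]); }
                @Override public void copy(int slot, int doc) { slots[slot] = values[doc]; }
                @Override public void setScorer(Scorer scorer) { /* not needed for a field sort */ }
            };
        }
    }

    public static void main(String[] args) {
        LongComparator cmp = new LongComparator(2);
        // A leaf comparator has to be obtained before setScorer can be called.
        LeafComparator first = cmp.getLeafComparator(new Segment(5L, 9L));
        first.setScorer(() -> 1.0f);
        first.copy(0, 1); // slot 0 takes doc 1 of the first segment
        LeafComparator second = cmp.getLeafComparator(new Segment(3L));
        second.copy(1, 0); // slot 1 takes doc 0 of the second segment
        System.out.println(cmp.value(0) + " vs " + cmp.value(1) + " -> " + cmp.compare(0, 1));
    }
}
```

The slot array lives in the top-level object and the leaf objects only write into it, which matches the point that the top-level priority queue stays shared across segments.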
All the logic about top-field collection is left unchanged. So there is still a
single top-level priority queue that all leaf collectors update. I think
changing the API is important for several reasons:
* it makes the FieldComparator API aligned with the Collector API
(LeafCollector <-> LeafFieldComparator)
* it makes the workflow easier to understand: you need to get a
LeafFieldComparator before you can call setScorer
* even though the patch itself does not contain any optimization, it would make
per-segment optimizations easier. For instance, if all documents in a segment
have the same value, we could ignore this sort field in comparisons. Or if an
index has a single segment, we could decide to only use ordinals for
comparisons and avoid copying values on each competitive hit.
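As a rough illustration of the first optimization idea (none of this is in the patch), a per-segment factory could return a specialized leaf comparator when every document in the segment has the same value. The names below are hypothetical, {{setBottom}} takes the value directly rather than a slot to keep the sketch short, and the constant-value check scans the array, whereas a real implementation would presumably rely on index statistics.

```java
// Hypothetical sketch of a per-segment specialization, NOT Lucene code.
class ConstantSegmentSketch {

    /** Reduced leaf comparator: just enough to show the specialization. */
    interface LeafComparator {
        void setBottom(long bottom);
        int compareBottom(int doc);
    }

    /** General case: look up the value of each document. */
    static LeafComparator perDoc(long[] values) {
        return new LeafComparator() {
            private long bottom;
            @Override public void setBottom(long bottom) { this.bottom = bottom; }
            @Override public int compareBottom(int doc) { return Long.compare(bottom, values[doc]); }
        };
    }

    /** Specialized case: one comparison per bottom change, no per-document lookup. */
    static LeafComparator constant(long value) {
        return new LeafComparator() {
            private int cached;
            @Override public void setBottom(long bottom) { cached = Long.compare(bottom, value); }
            @Override public int compareBottom(int doc) { return cached; }
        };
    }

    /** Picks the implementation once per segment. */
    static LeafComparator forSegment(long[] values) {
        if (values.length == 0) {
            return perDoc(values);
        }
        // Assumption: a scan stands in for whatever statistics the index exposes.
        for (long v : values) {
            if (v != values[0]) {
                return perDoc(values);
            }
        }
        return constant(values[0]);
    }

    public static void main(String[] args) {
        LeafComparator leaf = forSegment(new long[] {7L, 7L, 7L});
        leaf.setBottom(10L);
        System.out.println(leaf.compareBottom(0) > 0); // bottom (10) sorts after every doc (7)
    }
}
```

The decision is taken when the leaf comparator is created, which is the hook the new getLeafComparator method provides.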
> Per-segment comparator API
> --------------------------
>
> Key: LUCENE-5702
> URL: https://issues.apache.org/jira/browse/LUCENE-5702
> Project: Lucene - Core
> Issue Type: New Feature
> Reporter: Adrien Grand
> Assignee: Adrien Grand
> Priority: Minor
> Fix For: Trunk
>
> Attachments: LUCENE-5702.patch, LUCENE-5702.patch, SortBench.java
>
>
> As a next step of LUCENE-5527, it would be nice to have per-segment
> comparators, and maybe even change the default behavior of our top*
> comparators so that they merge top hits in the end.