[
https://issues.apache.org/jira/browse/LUCENE-6179?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Adrien Grand updated LUCENE-6179:
---------------------------------
Attachment: bool_or.tasks
LUCENE-6179.patch
Here is a patch. It removes out-of-order scoring support and makes
BooleanScorer (aka BS1) score documents in order. The new version is slower
than current trunk
{code}
TaskQPS baseline StdDev QPS patch StdDev
Pct diff
OrLowMed 93.19 (5.9%) 78.33 (4.8%)
-16.0% ( -25% - -5%)
OrLowLow 282.29 (5.0%) 237.59 (4.1%)
-15.8% ( -23% - -7%)
OrLowLowLow 203.94 (7.6%) 172.27 (4.8%)
-15.5% ( -25% - -3%)
OrMedMed 57.48 (6.4%) 49.40 (5.3%)
-14.1% ( -24% - -2%)
OrLowHigh 15.42 (6.4%) 13.63 (5.4%)
-11.6% ( -21% - 0%)
OrMedMedMed 37.51 (6.3%) 33.71 (5.6%)
-10.1% ( -20% - 1%)
OrMedHigh 18.73 (6.0%) 16.89 (5.4%)
-9.8% ( -20% - 1%)
OrHighHigh 10.94 (5.8%) 10.19 (5.7%)
-6.8% ( -17% - 5%)
PKLookup 259.62 (0.9%) 258.13 (1.0%)
-0.6% ( -2% - 1%)
OrHighHighHigh 7.85 (5.4%) 7.86 (6.6%)
0.1% ( -11% - 12%)
LowTerm 687.79 (3.5%) 694.17 (3.2%)
0.9% ( -5% - 7%)
HighTerm 22.81 (5.6%) 23.11 (1.3%)
1.3% ( -5% - 8%)
MedTerm 154.48 (5.2%) 162.79 (2.6%)
5.4% ( -2% - 13%)
{code}
but still much faster than the Scorer-based BulkScorer:
{code}
TaskQPS baseline StdDev QPS patch StdDev
Pct diff
LowTerm 671.14 (5.0%) 665.62 (3.2%)
-0.8% ( -8% - 7%)
OrLowLow 232.81 (6.2%) 231.60 (3.8%)
-0.5% ( -9% - 10%)
PKLookup 254.28 (1.2%) 253.31 (0.9%)
-0.4% ( -2% - 1%)
HighTerm 19.47 (3.0%) 20.31 (1.1%)
4.3% ( 0% - 8%)
MedTerm 149.88 (4.7%) 162.49 (1.8%)
8.4% ( 1% - 15%)
OrLowLowLow 151.55 (5.2%) 165.27 (5.9%)
9.1% ( -2% - 21%)
OrLowMed 67.87 (6.2%) 80.68 (5.6%)
18.9% ( 6% - 32%)
OrLowHigh 7.97 (5.8%) 10.36 (7.7%)
29.9% ( 15% - 46%)
OrMedMed 35.97 (5.1%) 48.45 (7.5%)
34.7% ( 20% - 49%)
OrMedHigh 12.44 (5.6%) 17.01 (8.0%)
36.7% ( 21% - 53%)
OrHighHigh 5.38 (5.2%) 8.16 (10.2%)
51.7% ( 34% - 70%)
OrMedMedMed 21.63 (4.8%) 34.07 (8.8%)
57.5% ( 41% - 74%)
OrHighHighHigh 3.96 (4.5%) 6.78 (12.0%)
71.2% ( 52% - 91%)
{code}
The new BooleanScorer uses a bit set in order to sort documents that matched
within the window. As Robert pointed out, I don't have CTZ/NTZ support on my
JVM, so this run of the benchmark is actually a worst-case. It should perform
better on a newer machine:
{code}
$ java -XX:+PrintFlagsFinal -version | grep Instruction
intx FenceInstruction = 0
{ARCH product}
bool UseBMI1Instructions = false
{ARCH product}
bool UseCountLeadingZerosInstruction = false
{ARCH product}
bool UseCountTrailingZerosInstruction = false
{ARCH product}
bool UsePopCountInstruction = true
{product}
java version "1.8.0_25"
Java(TM) SE Runtime Environment (build 1.8.0_25-b17)
Java HotSpot(TM) 64-Bit Server VM (build 25.25-b02, mixed mode)
{code}
Some final notes about the patch:
- it helps remove significant code: {{97 files changed, 411 insertions(+),
2131 deletions(-)}}
- since Solr always forces in-order scoring, it would actually make Solr
faster since it would now use BS1
- some queries that were lazy to implement scoresDocsOutOfOrder correctly
might be faster because we won't pick a collector that supports out-of-order
scoring while documents are actually collected in order (see eg. FilteredQuery)
If someone is interested in reproducing the benchmark, I uploaded the tasks
file that I used.
> Remove out-of-order scoring
> ---------------------------
>
> Key: LUCENE-6179
> URL: https://issues.apache.org/jira/browse/LUCENE-6179
> Project: Lucene - Core
> Issue Type: Improvement
> Reporter: Adrien Grand
> Assignee: Adrien Grand
> Fix For: 5.0, Trunk
>
> Attachments: LUCENE-6179.patch, bool_or.tasks
>
>
> Out-of-order currently adds complexity that I would like to remove. Here is a
> selection of issues that come from out-of-order scoring.
> - lots of specializations with collectors: we have two versions of every top
> score/field collector depending on whether it should support out-of-order
> collection or not
> - it feels like it should be an implementation detail of our bulk scorers
> but it also makes our APIs more complicated, eg.
> LeafCollector.acceptsDocsOutOfOrder
> - if you create a TopFieldCollector, how do you know if you should pass
> docsScoredInOrder=true or false? To make the decision, you actually need to
> know whether your query supports out-of-order scoring while the API is on
> Weight.
> I initially wanted to keep it and improve the decision process in LUCENE-6172
> but I'm not sure it's the right approach as it would require to make the API
> even more complicated... hence the suggestion to remove out-of-order scoring
> completely.
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]