Adrien Grand created LUCENE-6919:
------------------------------------
Summary: Change the Scorer API to expose an iterator instead of
extending DocIdSetIterator
Key: LUCENE-6919
URL: https://issues.apache.org/jira/browse/LUCENE-6919
Project: Lucene - Core
Issue Type: Improvement
Reporter: Adrien Grand
Assignee: Adrien Grand
Priority: Minor
I was trying to address the performance regression on LUCENE-6815, but this
is hard to do without specializing DisjunctionScorer, which I'd like to avoid
at all costs.
I think the performance regression would be easy to address without
specialization if Scorers were changed to return an iterator instead of
extending DocIdSetIterator. So conceptually the API would move from
{code}
abstract class Scorer extends DocIdSetIterator {
}
{code}
to
{code}
abstract class Scorer {
  // Returns the iterator over the documents matched by this scorer
  abstract DocIdSetIterator iterator();
}
{code}
This would help me because then if none of the sub clauses support two-phase
iteration, DisjunctionScorer could directly return the approximation as an
iterator instead of having to check if twoPhase == null at every iteration.
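
A minimal sketch of that idea, using simplified stand-in classes rather than the real Lucene types. The names DocIterator, ArrayIterator, ConfirmingIterator and SketchScorer are hypothetical, and the IntPredicate stands in for the confirmation phase of two-phase iteration. The point it illustrates is that the null check happens once, when iterator() is called, instead of on every nextDoc() call:

```java
import java.util.function.IntPredicate;

// Hypothetical, simplified stand-in for DocIdSetIterator (not the real Lucene class).
abstract class DocIterator {
    static final int NO_MORE_DOCS = Integer.MAX_VALUE;
    abstract int docID();
    abstract int nextDoc();
}

// Iterates over a sorted array of doc IDs.
final class ArrayIterator extends DocIterator {
    private final int[] docs;
    private int idx = -1;

    ArrayIterator(int... docs) { this.docs = docs; }

    @Override int docID() {
        if (idx < 0) return -1;
        return idx < docs.length ? docs[idx] : NO_MORE_DOCS;
    }

    @Override int nextDoc() {
        if (idx < docs.length) idx++;
        return docID();
    }
}

// Wraps an approximation and keeps only the docs confirmed by `matches`.
final class ConfirmingIterator extends DocIterator {
    private final DocIterator approximation;
    private final IntPredicate matches;

    ConfirmingIterator(DocIterator approximation, IntPredicate matches) {
        this.approximation = approximation;
        this.matches = matches;
    }

    @Override int docID() { return approximation.docID(); }

    @Override int nextDoc() {
        int doc = approximation.nextDoc();
        while (doc != NO_MORE_DOCS && !matches.test(doc)) {
            doc = approximation.nextDoc();
        }
        return doc;
    }
}

// Hypothetical scorer that exposes an iterator instead of being one.
final class SketchScorer {
    private final DocIterator approximation;
    private final IntPredicate matches; // null when no confirmation phase is needed

    SketchScorer(DocIterator approximation, IntPredicate matches) {
        this.approximation = approximation;
        this.matches = matches;
    }

    DocIterator iterator() {
        // Decided once here rather than checked on each nextDoc() call.
        return matches == null
            ? approximation
            : new ConfirmingIterator(approximation, matches);
    }
}
```

With no confirmation phase, the caller iterates the approximation itself and no wrapper sits in the hot loop.
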
Such an approach could also help remove some method calls. For instance,
TermScorer.nextDoc calls PostingsEnum.nextDoc, but with this change
TermScorer.iterator() could return the PostingsEnum directly, and TermScorer
would not even appear in stack traces when scoring. I hacked up a patch to see
how much that would help, and luceneutil seems to like the change:
{noformat}
Task              QPS baseline    StdDev  QPS patch    StdDev  Pct diff
Fuzzy1                   88.54   (15.7%)      86.73   (16.6%)     -2.0%  (-29% -  35%)
AndHighLow              698.98    (4.1%)     691.11    (5.1%)     -1.1%  ( -9% -   8%)
Fuzzy2                   26.47   (11.2%)      26.28   (10.3%)     -0.7%  (-19% -  23%)
MedSpanNear             141.03    (3.3%)     140.51    (3.2%)     -0.4%  ( -6% -   6%)
HighPhrase               60.66    (2.6%)      60.48    (3.3%)     -0.3%  ( -5% -   5%)
LowSpanNear              29.25    (2.4%)      29.21    (2.1%)     -0.1%  ( -4% -   4%)
MedPhrase                28.32    (1.9%)      28.28    (2.0%)     -0.1%  ( -3% -   3%)
LowPhrase                17.31    (2.1%)      17.29    (2.6%)     -0.1%  ( -4% -   4%)
HighSloppyPhrase         10.93    (6.0%)      10.92    (6.0%)     -0.1%  (-11% -  12%)
MedSloppyPhrase          72.21    (2.2%)      72.27    (1.8%)      0.1%  ( -3% -   4%)
Respell                  57.35    (3.2%)      57.41    (3.4%)      0.1%  ( -6% -   6%)
HighSpanNear             26.71    (3.0%)      26.75    (2.5%)      0.1%  ( -5% -   5%)
OrNotHighLow            803.46    (3.4%)     807.03    (4.2%)      0.4%  ( -6% -   8%)
LowSloppyPhrase          88.02    (3.4%)      88.77    (2.5%)      0.8%  ( -4% -   7%)
OrNotHighMed            200.45    (2.7%)     203.83    (2.5%)      1.7%  ( -3% -   7%)
OrHighHigh               38.98    (7.9%)      40.30    (6.6%)      3.4%  (-10% -  19%)
HighTerm                 92.53    (5.3%)      95.94    (5.8%)      3.7%  ( -7% -  15%)
OrHighMed                53.80    (7.7%)      55.79    (6.6%)      3.7%  ( -9% -  19%)
AndHighMed              266.69    (1.7%)     277.15    (2.5%)      3.9%  (  0% -   8%)
Prefix3                  44.68    (5.4%)      46.60    (7.0%)      4.3%  ( -7% -  17%)
MedTerm                 261.52    (4.9%)     273.52    (5.4%)      4.6%  ( -5% -  15%)
Wildcard                 42.39    (6.1%)      44.35    (7.8%)      4.6%  ( -8% -  19%)
IntNRQ                   10.46    (7.0%)      10.99    (9.5%)      5.0%  (-10% -  23%)
OrNotHighHigh            67.15    (4.6%)      70.65    (4.5%)      5.2%  ( -3% -  15%)
OrHighNotHigh            43.07    (5.1%)      45.36    (5.4%)      5.3%  ( -4% -  16%)
OrHighLow                64.19    (6.4%)      67.72    (5.5%)      5.5%  ( -6% -  18%)
AndHighHigh              64.17    (2.3%)      67.87    (2.1%)      5.8%  (  1% -  10%)
LowTerm                 642.94   (10.9%)     681.48    (8.5%)      6.0%  (-12% -  28%)
OrHighNotMed             12.68    (6.9%)      13.51    (6.6%)      6.5%  ( -6% -  21%)
OrHighNotLow             54.69    (6.8%)      58.25    (7.0%)      6.5%  ( -6% -  21%)
{noformat}
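
The TermScorer point made before the benchmark can be sketched the same way. This is only an illustration under stated assumptions: DocCursor, PostingsSketch, ForwardingTermScorer and ExposingTermScorer are hypothetical stand-ins, not the real PostingsEnum or TermScorer. It contrasts a scorer that forwards every nextDoc() call with one that hands out the underlying postings iterator:

```java
// Hypothetical, simplified stand-in for DocIdSetIterator (not the real Lucene class).
interface DocCursor {
    int NO_MORE_DOCS = Integer.MAX_VALUE;
    int nextDoc();
}

// Stand-in for PostingsEnum: walks a sorted array of doc IDs.
final class PostingsSketch implements DocCursor {
    private final int[] docs;
    private int idx = -1;

    PostingsSketch(int... docs) { this.docs = docs; }

    @Override public int nextDoc() {
        if (idx < docs.length) idx++;
        return idx < docs.length ? docs[idx] : NO_MORE_DOCS;
    }
}

// Current shape: the scorer is the iterator, so each step passes through it.
final class ForwardingTermScorer implements DocCursor {
    private final PostingsSketch postings;

    ForwardingTermScorer(PostingsSketch postings) { this.postings = postings; }

    @Override public int nextDoc() { return postings.nextDoc(); }
}

// Proposed shape: the scorer returns the postings, so iteration bypasses it.
final class ExposingTermScorer {
    private final PostingsSketch postings;

    ExposingTermScorer(PostingsSketch postings) { this.postings = postings; }

    DocCursor iterator() { return postings; }
}
```

In the second shape the caller holds the postings object itself, which is why the scorer would drop out of the iteration call path.
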