[
https://issues.apache.org/jira/browse/LUCENE-6919?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Adrien Grand updated LUCENE-6919:
---------------------------------
Attachment: LUCENE-6919.patch
Good point about Scorer.docID(). I think it's also better to not require
Collectors to go through the iterator since they are not supposed to move the
iterator. Once this change is in, maybe in the future we can expose a smaller
interface in Collector.setScorer (that only exposes a doc ID, a freq and a
score, as suggested in LUCENE-6228).
Here is a patch with the proposed changes. It makes some things slightly more
complicated due to the additional level of indirection between a Weight and a
DocIdSetIterator (e.g. for delete-by-query) but it also makes some Scorer
implementations simpler since they can now just return the iterator instead of
reimplementing the whole DISI API and forwarding it to an existing
DocIdSetIterator (for instance this is what conjunctions did).
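To illustrate the simplification, here is a rough, self-contained sketch of the
two shapes. It does not use the real Lucene classes: SimpleDocIdIterator,
ArrayDocIdIterator, ForwardingScorer, ExposingScorer and ScorerShapes are
hypothetical stand-ins, and the iterator is reduced to docID() and nextDoc()
(the real DISI API has more methods to forward, such as advance and cost).

```java
import java.util.ArrayList;
import java.util.List;

// Simplified stand-in for a doc ID iterator (NOT Lucene's DocIdSetIterator).
abstract class SimpleDocIdIterator {
    static final int NO_MORE_DOCS = Integer.MAX_VALUE;
    abstract int docID();
    abstract int nextDoc();
}

// Hypothetical iterator over a sorted array of doc IDs.
final class ArrayDocIdIterator extends SimpleDocIdIterator {
    private final int[] docs;
    private int idx = -1;
    ArrayDocIdIterator(int... docs) { this.docs = docs; }
    @Override int docID() {
        if (idx < 0) return -1; // not positioned yet
        return idx < docs.length ? docs[idx] : NO_MORE_DOCS;
    }
    @Override int nextDoc() {
        if (idx < docs.length) idx++;
        return docID();
    }
}

// Old shape: the scorer IS an iterator, so it forwards every iterator method.
final class ForwardingScorer extends SimpleDocIdIterator {
    private final SimpleDocIdIterator in;
    ForwardingScorer(SimpleDocIdIterator in) { this.in = in; }
    @Override int docID() { return in.docID(); }
    @Override int nextDoc() { return in.nextDoc(); }
    float score() { return 1f; } // constant score, for illustration only
}

// New shape: the scorer HAS an iterator and simply hands it out.
final class ExposingScorer {
    private final SimpleDocIdIterator in;
    ExposingScorer(SimpleDocIdIterator in) { this.in = in; }
    SimpleDocIdIterator iterator() { return in; }
    int docID() { return in.docID(); } // readable without touching the iterator
    float score() { return 1f; }
}

final class ScorerShapes {
    // Collects every doc ID produced by the given iterator.
    static List<Integer> drain(SimpleDocIdIterator it) {
        List<Integer> out = new ArrayList<>();
        for (int doc = it.nextDoc(); doc != SimpleDocIdIterator.NO_MORE_DOCS; doc = it.nextDoc()) {
            out.add(doc);
        }
        return out;
    }

    public static void main(String[] args) {
        // Both lines print [1, 4, 9]
        System.out.println(drain(new ForwardingScorer(new ArrayDocIdIterator(1, 4, 9))));
        System.out.println(drain(new ExposingScorer(new ArrayDocIdIterator(1, 4, 9)).iterator()));
    }
}
```

In the second shape the caller iterates on the underlying iterator directly, so
the scorer adds no extra call per document, while docID() stays available on
the scorer for callers that must not move the iterator.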
For consistency, Scorer.asTwoPhaseIterator has been renamed to
Scorer.twoPhaseIterator. So Scorer now has Scorer.iterator(), which is a
required method and returns a DocIdSetIterator, and Scorer.twoPhaseIterator(),
which is an optional method and returns a TwoPhaseIterator.
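As a rough sketch of how a consumer might use the two methods, the example
below checks twoPhaseIterator() once up front and then runs a loop with no
per-document null check, which is the kind of saving the issue description is
after. Everything here is a simplified stand-in and not the real Lucene API:
DocIterator, TwoPhase, SketchScorer, ArrayIterator, SimpleScorer,
VerifyingScorer and TwoPhaseDemo are hypothetical names, and returning null
from the optional method is an assumption of this sketch.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.IntPredicate;

interface DocIterator {                       // stand-in, not Lucene's class
    int NO_MORE_DOCS = Integer.MAX_VALUE;
    int docID();
    int nextDoc();
}

interface TwoPhase {                          // stand-in, not Lucene's class
    DocIterator approximation();              // cheap superset of the matches
    boolean matches();                        // costly check of the current doc
}

interface SketchScorer {
    DocIterator iterator();                               // required
    default TwoPhase twoPhaseIterator() { return null; }  // optional
}

final class ArrayIterator implements DocIterator {
    private final int[] docs;
    private int idx = -1;
    ArrayIterator(int... docs) { this.docs = docs; }
    @Override public int docID() {
        if (idx < 0) return -1;
        return idx < docs.length ? docs[idx] : NO_MORE_DOCS;
    }
    @Override public int nextDoc() {
        if (idx < docs.length) idx++;
        return docID();
    }
}

// Scorer without two-phase support: only the required method is implemented.
final class SimpleScorer implements SketchScorer {
    private final DocIterator it;
    SimpleScorer(int... docs) { this.it = new ArrayIterator(docs); }
    @Override public DocIterator iterator() { return it; }
}

// Scorer with two-phase support: candidates are confirmed by a predicate.
final class VerifyingScorer implements SketchScorer {
    private final TwoPhase twoPhase;
    VerifyingScorer(IntPredicate check, int... candidates) {
        final DocIterator approx = new ArrayIterator(candidates);
        this.twoPhase = new TwoPhase() {
            @Override public DocIterator approximation() { return approx; }
            @Override public boolean matches() { return check.test(approx.docID()); }
        };
    }
    @Override public TwoPhase twoPhaseIterator() { return twoPhase; }
    @Override public DocIterator iterator() {
        // Wraps the two phases into a plain iterator that skips unconfirmed docs.
        return new DocIterator() {
            @Override public int docID() { return twoPhase.approximation().docID(); }
            @Override public int nextDoc() {
                DocIterator approx = twoPhase.approximation();
                int doc = approx.nextDoc();
                while (doc != NO_MORE_DOCS && !twoPhase.matches()) doc = approx.nextDoc();
                return doc;
            }
        };
    }
}

final class TwoPhaseDemo {
    static List<Integer> collect(SketchScorer scorer) {
        List<Integer> hits = new ArrayList<>();
        TwoPhase twoPhase = scorer.twoPhaseIterator();   // decided once, not per doc
        if (twoPhase == null) {
            DocIterator it = scorer.iterator();
            for (int d = it.nextDoc(); d != DocIterator.NO_MORE_DOCS; d = it.nextDoc()) hits.add(d);
        } else {
            DocIterator approx = twoPhase.approximation();
            for (int d = approx.nextDoc(); d != DocIterator.NO_MORE_DOCS; d = approx.nextDoc()) {
                if (twoPhase.matches()) hits.add(d);
            }
        }
        return hits;
    }

    public static void main(String[] args) {
        System.out.println(collect(new SimpleScorer(2, 3, 8)));                        // prints [2, 3, 8]
        System.out.println(collect(new VerifyingScorer(d -> d % 2 == 0, 1, 2, 3, 8))); // prints [2, 8]
    }
}
```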
The luceneutil report is consistent with what I got with the hacky patch:
{noformat}
Task             QPS baseline  StdDev QPS patch  StdDev  Pct diff
OrNotHighLow          1119.21  (3.9%)   1078.59  (5.3%)  -3.6% ( -12% - 5%)
AndHighLow             796.76  (4.9%)    774.43  (5.6%)  -2.8% ( -12% - 8%)
MedSloppyPhrase         58.28  (4.0%)     57.67  (4.9%)  -1.0% ( -9% - 8%)
LowSpanNear            122.71  (2.0%)    121.65  (2.8%)  -0.9% ( -5% - 4%)
HighSpanNear             3.25  (2.3%)      3.22  (2.0%)  -0.8% ( -4% - 3%)
LowPhrase              113.56  (1.5%)    112.73  (2.1%)  -0.7% ( -4% - 2%)
MedSpanNear            109.63  (2.4%)    109.15  (1.7%)  -0.4% ( -4% - 3%)
HighSloppyPhrase         6.23  (5.0%)      6.22  (5.4%)  -0.2% ( -10% - 10%)
OrHighNotLow           102.42  (3.8%)    102.32  (3.5%)  -0.1% ( -7% - 7%)
LowSloppyPhrase         22.01  (2.3%)     22.01  (2.8%)   0.0% ( -4% - 5%)
MedPhrase               15.92  (1.6%)     15.94  (1.8%)   0.1% ( -3% - 3%)
HighPhrase              34.63  (3.1%)     34.75  (3.2%)   0.3% ( -5% - 6%)
OrNotHighMed           141.69  (3.3%)    142.75  (3.3%)   0.7% ( -5% - 7%)
OrNotHighHigh           50.74  (2.1%)     51.15  (2.7%)   0.8% ( -3% - 5%)
Respell                 63.24  (3.2%)     64.05  (3.1%)   1.3% ( -4% - 7%)
OrHighNotHigh           42.37  (2.8%)     42.92  (3.0%)   1.3% ( -4% - 7%)
OrHighNotMed            80.74  (2.9%)     82.18  (2.9%)   1.8% ( -3% - 7%)
Prefix3                151.13  (4.7%)    155.37  (6.7%)   2.8% ( -8% - 14%)
AndHighHigh             36.96  (2.3%)     38.37  (2.3%)   3.8% ( 0% - 8%)
Fuzzy1                  25.95  (5.9%)     27.00  (5.7%)   4.0% ( -7% - 16%)
OrHighMed               50.05  (5.0%)     52.10  (5.7%)   4.1% ( -6% - 15%)
OrHighHigh              33.64  (5.2%)     35.16  (4.7%)   4.5% ( -5% - 15%)
IntNRQ                  10.93  (6.9%)     11.46  (6.2%)   4.8% ( -7% - 19%)
MedTerm                179.51  (3.8%)    188.22  (3.9%)   4.9% ( -2% - 13%)
OrHighLow               79.55  (2.9%)     83.56  (2.8%)   5.0% ( 0% - 11%)
LowTerm                682.13  (8.0%)    716.84  (6.4%)   5.1% ( -8% - 21%)
AndHighMed             114.21  (2.4%)    120.06  (2.4%)   5.1% ( 0% - 10%)
Wildcard                29.31  (6.4%)     31.07  (5.8%)   6.0% ( -5% - 19%)
HighTerm               118.05  (3.5%)    125.83  (4.4%)   6.6% ( -1% - 14%)
Fuzzy2                  61.23 (20.9%)     67.19 (21.3%)   9.7% ( -26% - 65%)
{noformat}
> Change the Scorer API to expose an iterator instead of extending
> DocIdSetIterator
> ---------------------------------------------------------------------------------
>
> Key: LUCENE-6919
> URL: https://issues.apache.org/jira/browse/LUCENE-6919
> Project: Lucene - Core
> Issue Type: Improvement
> Reporter: Adrien Grand
> Assignee: Adrien Grand
> Priority: Minor
> Attachments: LUCENE-6919.patch, LUCENE-6919.patch
>
>
> I was working on trying to address the performance regression on LUCENE-6815
> but this is hard to do without introducing specialization of
> DisjunctionScorer which I'd like to avoid at all costs.
> I think the performance regression would be easy to address without
> specialization if Scorers were changed to return an iterator instead of
> extending DocIdSetIterator. So conceptually the API would move from
> {code}
> class Scorer extends DocIdSetIterator {
> }
> {code}
> to
> {code}
> class Scorer {
> DocIdSetIterator iterator();
> }
> {code}
> This would help me because then if none of the sub clauses support two-phase
> iteration, DisjunctionScorer could directly return the approximation as an
> iterator instead of having to check if twoPhase == null at every iteration.
> Such an approach could also help remove some method calls. For instance
> TermScorer.nextDoc calls PostingsEnum.nextDoc but with this change
> TermScorer.iterator() could return the PostingsEnum and TermScorer would not
> even appear in stack traces when scoring. I hacked a patch to see how much
> that would help and luceneutil seems to like the change:
> {noformat}
> Task             QPS baseline  StdDev QPS patch  StdDev  Pct diff
> Fuzzy1                  88.54 (15.7%)     86.73 (16.6%)  -2.0% ( -29% - 35%)
> AndHighLow             698.98  (4.1%)    691.11  (5.1%)  -1.1% ( -9% - 8%)
> Fuzzy2                  26.47 (11.2%)     26.28 (10.3%)  -0.7% ( -19% - 23%)
> MedSpanNear            141.03  (3.3%)    140.51  (3.2%)  -0.4% ( -6% - 6%)
> HighPhrase              60.66  (2.6%)     60.48  (3.3%)  -0.3% ( -5% - 5%)
> LowSpanNear             29.25  (2.4%)     29.21  (2.1%)  -0.1% ( -4% - 4%)
> MedPhrase               28.32  (1.9%)     28.28  (2.0%)  -0.1% ( -3% - 3%)
> LowPhrase               17.31  (2.1%)     17.29  (2.6%)  -0.1% ( -4% - 4%)
> HighSloppyPhrase        10.93  (6.0%)     10.92  (6.0%)  -0.1% ( -11% - 12%)
> MedSloppyPhrase         72.21  (2.2%)     72.27  (1.8%)   0.1% ( -3% - 4%)
> Respell                 57.35  (3.2%)     57.41  (3.4%)   0.1% ( -6% - 6%)
> HighSpanNear            26.71  (3.0%)     26.75  (2.5%)   0.1% ( -5% - 5%)
> OrNotHighLow           803.46  (3.4%)    807.03  (4.2%)   0.4% ( -6% - 8%)
> LowSloppyPhrase         88.02  (3.4%)     88.77  (2.5%)   0.8% ( -4% - 7%)
> OrNotHighMed           200.45  (2.7%)    203.83  (2.5%)   1.7% ( -3% - 7%)
> OrHighHigh              38.98  (7.9%)     40.30  (6.6%)   3.4% ( -10% - 19%)
> HighTerm                92.53  (5.3%)     95.94  (5.8%)   3.7% ( -7% - 15%)
> OrHighMed               53.80  (7.7%)     55.79  (6.6%)   3.7% ( -9% - 19%)
> AndHighMed             266.69  (1.7%)    277.15  (2.5%)   3.9% ( 0% - 8%)
> Prefix3                 44.68  (5.4%)     46.60  (7.0%)   4.3% ( -7% - 17%)
> MedTerm                261.52  (4.9%)    273.52  (5.4%)   4.6% ( -5% - 15%)
> Wildcard                42.39  (6.1%)     44.35  (7.8%)   4.6% ( -8% - 19%)
> IntNRQ                  10.46  (7.0%)     10.99  (9.5%)   5.0% ( -10% - 23%)
> OrNotHighHigh           67.15  (4.6%)     70.65  (4.5%)   5.2% ( -3% - 15%)
> OrHighNotHigh           43.07  (5.1%)     45.36  (5.4%)   5.3% ( -4% - 16%)
> OrHighLow               64.19  (6.4%)     67.72  (5.5%)   5.5% ( -6% - 18%)
> AndHighHigh             64.17  (2.3%)     67.87  (2.1%)   5.8% ( 1% - 10%)
> LowTerm                642.94 (10.9%)    681.48  (8.5%)   6.0% ( -12% - 28%)
> OrHighNotMed            12.68  (6.9%)     13.51  (6.6%)   6.5% ( -6% - 21%)
> OrHighNotLow            54.69  (6.8%)     58.25  (7.0%)   6.5% ( -6% - 21%)
> {noformat}