[ https://issues.apache.org/jira/browse/LUCENE-5809?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14056852#comment-14056852 ]
Michael McCandless commented on LUCENE-5809:
--------------------------------------------
This patch makes ExactPhraseScorer MUCH simpler; I like it.
I tested on wikimedium (~1 KB sized docs):
{noformat}
Report after iter 19:
Task             QPS base  StdDev QPS comp  StdDev Pct diff
OrHighNotMed        35.53 (14.2%)    33.51 (17.4%)  -5.7% ( -32% - 30%)
HighPhrase           4.24 (11.9%)     4.01 (14.0%)  -5.4% ( -27% - 23%)
HighSloppyPhrase     3.38 (13.7%)     3.24 (15.1%)  -4.3% ( -29% - 28%)
MedPhrase          187.91 (16.4%)   180.25 (14.5%)  -4.1% ( -30% - 32%)
LowSloppyPhrase     41.14 (16.1%)    39.58 (17.9%)  -3.8% ( -32% - 36%)
LowPhrase           13.00  (8.1%)    12.63 (15.1%)  -2.8% ( -24% - 22%)
HighSpanNear         8.89 (18.6%)     8.67 (23.6%)  -2.5% ( -37% - 48%)
MedSpanNear         30.90 (13.7%)    30.16 (18.1%)  -2.4% ( -30% - 34%)
OrHighMed           30.97 (14.8%)    30.24 (17.0%)  -2.4% ( -29% - 34%)
LowTerm            312.88 (18.0%)   306.15 (19.7%)  -2.1% ( -33% - 43%)
Fuzzy2              40.30 (13.6%)    39.61 (15.8%)  -1.7% ( -27% - 32%)
MedTerm            102.62 (16.5%)   101.02 (19.4%)  -1.6% ( -32% - 41%)
OrNotHighMed        22.54 (14.2%)    22.20 (15.0%)  -1.5% ( -26% - 32%)
Fuzzy1              53.81 (14.3%)    53.00 (15.0%)  -1.5% ( -26% - 32%)
OrNotHighHigh       10.62 (12.9%)    10.57 (11.2%)  -0.5% ( -21% - 27%)
HighTerm            66.04 (24.3%)    65.94 (20.8%)  -0.1% ( -36% - 59%)
IntNRQ               3.09 (19.1%)     3.08 (16.8%)  -0.1% ( -30% - 44%)
Prefix3             84.61 (12.9%)    84.52 (14.2%)  -0.1% ( -24% - 31%)
MedSloppyPhrase      3.22 (16.4%)     3.23 (17.5%)   0.0% ( -29% - 40%)
OrHighLow           22.09 (16.4%)    22.21 (13.1%)   0.5% ( -24% - 35%)
LowSpanNear         10.10 (17.7%)    10.20 (16.8%)   1.0% ( -28% - 43%)
AndHighMed          32.50 (12.6%)    32.92 (10.9%)   1.3% ( -19% - 28%)
Respell             44.07 (13.4%)    44.85 (14.4%)   1.8% ( -22% - 34%)
OrHighNotHigh       13.10 (12.4%)    13.42 (11.5%)   2.4% ( -19% - 30%)
OrHighNotLow        27.70 (18.9%)    28.42 (18.3%)   2.6% ( -29% - 49%)
AndHighLow         335.76 (17.4%)   344.72 (16.4%)   2.7% ( -26% - 44%)
AndHighHigh         26.54 (12.7%)    27.48 (10.2%)   3.5% ( -17% - 30%)
OrNotHighLow        22.53 (19.2%)    23.36 (14.5%)   3.7% ( -25% - 46%)
OrHighHigh           9.62 (14.2%)    10.04 (11.1%)   4.3% ( -18% - 34%)
Wildcard            17.96 (18.3%)    18.92 (13.4%)   5.3% ( -22% - 45%)
{noformat}
And also on wikibig (full-sized docs, average ~4 KB):
{noformat}
Report after iter 19:
Task             QPS base  StdDev QPS comp  StdDev Pct diff
AndHighHigh        418.84 (11.5%)   401.17 (16.9%)  -4.2% ( -29% - 27%)
OrNotHighHigh       98.86 (12.4%)    95.04 (14.1%)  -3.9% ( -27% - 25%)
Respell             69.21 (11.5%)    66.53 (15.2%)  -3.9% ( -27% - 25%)
LowTerm           1338.88  (9.5%)  1288.80  (9.8%)  -3.7% ( -21% - 17%)
AndHighMed         104.48  (5.9%)   100.86 (11.8%)  -3.5% ( -19% - 15%)
OrHighNotLow       200.80 (14.6%)   193.85 (17.7%)  -3.5% ( -31% - 33%)
MedSloppyPhrase      5.44 (12.0%)     5.25 (15.6%)  -3.4% ( -27% - 27%)
HighTerm           154.05 (23.3%)   148.81 (18.5%)  -3.4% ( -36% - 49%)
OrHighNotMed       181.16 (15.7%)   175.41 (16.4%)  -3.2% ( -30% - 34%)
OrHighHigh         141.36 (12.1%)   137.07 (14.7%)  -3.0% ( -26% - 27%)
Fuzzy2             135.77 (14.3%)   131.98 (12.0%)  -2.8% ( -25% - 27%)
Prefix3             34.23 (15.5%)    33.28 (20.7%)  -2.8% ( -33% - 39%)
Wildcard           129.89 (14.4%)   126.29 (17.7%)  -2.8% ( -30% - 34%)
MedPhrase            7.74 (15.8%)     7.54 (17.2%)  -2.5% ( -30% - 36%)
OrNotHighMed        58.84 (16.3%)    57.47 (13.0%)  -2.3% ( -27% - 32%)
MedTerm            350.57 (18.1%)   344.41 (14.9%)  -1.8% ( -29% - 38%)
Fuzzy1             101.74 (13.1%)   100.00 (14.5%)  -1.7% ( -25% - 29%)
OrHighNotHigh       44.20 (12.9%)    43.45 (13.9%)  -1.7% ( -25% - 28%)
HighSloppyPhrase    16.81 (16.2%)    16.63 (16.3%)  -1.1% ( -28% - 37%)
OrHighLow          135.97 (17.9%)   134.71 (17.0%)  -0.9% ( -30% - 41%)
HighPhrase          23.69 (11.9%)    23.59 (13.9%)  -0.4% ( -23% - 28%)
IntNRQ              24.97 (18.5%)    24.88 (21.4%)  -0.4% ( -33% - 48%)
LowSloppyPhrase   5863.22 (12.3%)  5867.79 (12.0%)   0.1% ( -21% - 27%)
OrNotHighLow       183.47 (14.1%)   184.00 (11.9%)   0.3% ( -22% - 30%)
MedSpanNear          4.94 (11.3%)     4.97  (9.5%)   0.4% ( -18% - 23%)
OrHighMed           99.96 (15.6%)   100.52 (15.5%)   0.6% ( -26% - 37%)
LowSpanNear         40.66 (11.2%)    41.05 (13.0%)   1.0% ( -20% - 28%)
LowPhrase          135.03 (12.9%)   136.67 (16.5%)   1.2% ( -25% - 35%)
HighSpanNear        14.62 (12.5%)    14.85 (11.7%)   1.6% ( -20% - 29%)
AndHighLow         587.30 (15.4%)   602.79 (12.0%)   2.6% ( -21% - 35%)
{noformat}
> Simplify ExactPhraseScorer
> --------------------------
>
> Key: LUCENE-5809
> URL: https://issues.apache.org/jira/browse/LUCENE-5809
> Project: Lucene - Core
> Issue Type: Task
> Components: core/search
> Reporter: Robert Muir
> Attachments: LUCENE-5809.patch
>
>
> While looking at this scorer I see a few little things which are remnants of
> the past:
> * crazy heuristics to use next() over advance(): I think it should just use
> advance(), like ConjunctionScorer. These days advance() isn't stupid anymore.
> * incorrect leapfrogging: the lead scorer is never advanced if a subsequent
> scorer goes past it; it just falls into this nextDoc() loop.
> * pre-next()'ing: we are using the cost() API to sort, so there is no need to
> do that.
> * UnionDocsAndPositionsEnum doesn't follow the DocsEnum contract and set the
> initial doc to -1.
> * postings reader advance() doesn't need to check docFreq > BLOCK_SIZE on each
> advance call; that's easy to remove.
> So I think really this scorer should just look like "ConjunctionScorer that
> verifies positions on match".
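
For readers unfamiliar with the shape described in the issue, here is a minimal sketch of a "conjunction scorer that verifies positions on match". It is not the patch and does not use Lucene's classes: PhraseSketch, Postings, exactPhraseDocs and phraseMatches are hypothetical names, the postings are plain in-memory arrays, and advance() is a linear scan where a real postings reader would use skip data. It only illustrates the ideas listed above: sort by cost, start iterators at -1, always use advance(), and leapfrog the lead whenever another iterator goes past it.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;

class PhraseSketch {
    static final int NO_MORE_DOCS = Integer.MAX_VALUE;

    /** Hypothetical in-memory postings for one phrase term (not a Lucene class). */
    static final class Postings {
        final int offset;        // index of this term within the phrase
        final int[] docs;        // sorted doc IDs that contain the term
        final int[][] positions; // positions[i]: sorted positions of the term in docs[i]
        int idx = -1;            // cursor; unpositioned until the first advance()

        Postings(int offset, int[] docs, int[][] positions) {
            this.offset = offset;
            this.docs = docs;
            this.positions = positions;
        }

        /** Returns -1 before the first advance(), NO_MORE_DOCS when exhausted. */
        int docID() {
            if (idx < 0) return -1;
            return idx < docs.length ? docs[idx] : NO_MORE_DOCS;
        }

        /** Moves to the first doc >= target (linear scan stands in for skip data). */
        int advance(int target) {
            if (idx < 0) idx = 0;
            while (idx < docs.length && docs[idx] < target) idx++;
            return docID();
        }

        int[] currentPositions() {
            return positions[idx];
        }
    }

    /** Returns the doc IDs in which the terms occur as an exact phrase. */
    static List<Integer> exactPhraseDocs(Postings... terms) {
        Postings[] sorted = terms.clone();
        // Cheapest (rarest) term leads; no pre-next()'ing is needed to sort.
        Arrays.sort(sorted, Comparator.comparingInt((Postings p) -> p.docs.length));
        Postings lead = sorted[0];
        List<Integer> hits = new ArrayList<>();
        int doc = lead.advance(0);
        advanceLead:
        while (doc != NO_MORE_DOCS) {
            for (int i = 1; i < sorted.length; i++) {
                Postings other = sorted[i];
                if (other.docID() < doc) {
                    int next = other.advance(doc);
                    if (next > doc) {
                        // Leapfrog: the other iterator went past the lead,
                        // so advance the lead to it and start over.
                        doc = lead.advance(next);
                        continue advanceLead;
                    }
                }
            }
            // Every iterator is on doc: only now look at positions.
            if (phraseMatches(sorted)) hits.add(doc);
            doc = lead.advance(doc + 1);
        }
        return hits;
    }

    /** True if some phrase start puts every term at start + its offset. */
    static boolean phraseMatches(Postings[] terms) {
        Postings lead = terms[0];
        for (int pos : lead.currentPositions()) {
            int start = pos - lead.offset;
            boolean all = true;
            for (int i = 1; i < terms.length && all; i++) {
                int wanted = start + terms[i].offset;
                all = Arrays.binarySearch(terms[i].currentPositions(), wanted) >= 0;
            }
            if (all) return true;
        }
        return false;
    }

    public static void main(String[] args) {
        // Made-up postings for the phrase "quick brown fox".
        Postings quick = new Postings(0, new int[] {1, 3, 7, 9},
                new int[][] {{0}, {4}, {2, 10}, {5}});
        Postings brown = new Postings(1, new int[] {1, 2, 7, 9},
                new int[][] {{1}, {0}, {3}, {6}});
        Postings fox = new Postings(2, new int[] {1, 7, 8, 9},
                new int[][] {{2}, {12}, {0}, {7}});
        System.out.println(exactPhraseDocs(quick, brown, fox));
    }
}
```

In this toy data all three terms share docs 1, 7 and 9, but only docs 1 and 9 have the terms at consecutive positions, so the position check is what rejects doc 7. Position verification runs only after the doc-level conjunction agrees, which is the simplification the issue argues for.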