[ https://issues.apache.org/jira/browse/LUCENE-6260?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Adrien Grand updated LUCENE-6260:
---------------------------------
Attachment: LUCENE-6260.patch
Here is a patch that makes phrase intersection work essentially like
ConjunctionDISI, except that it operates on positions instead of doc IDs. I ran
luceneutil on wikibig1M and the performance loss looks quite small (about 5% on
HighPhrase, 1-2% on LowPhrase and MedPhrase, everything else within noise):
{noformat}
Task              QPS baseline   StdDev  QPS patch   StdDev  Pct diff
HighPhrase               33.54   (1.3%)      31.72   (1.9%)  -5.4% ( -8% -  -2%)
LowPhrase                48.76   (1.2%)      47.74   (2.1%)  -2.1% ( -5% -   1%)
OrNotHighLow           1167.83   (4.0%)    1153.63   (4.4%)  -1.2% ( -9% -   7%)
Fuzzy1                  112.76  (12.5%)     111.41  (11.9%)  -1.2% (-22% -  26%)
MedPhrase               126.21   (1.6%)     124.89   (2.8%)  -1.0% ( -5% -   3%)
LowTerm                2361.80   (5.3%)    2338.19   (5.0%)  -1.0% (-10% -   9%)
AndHighLow             1053.44   (2.6%)    1043.11   (5.6%)  -1.0% ( -8% -   7%)
OrHighNotMed            180.00   (1.8%)     179.10   (2.1%)  -0.5% ( -4% -   3%)
OrHighNotLow            139.58   (2.6%)     139.24   (3.1%)  -0.2% ( -5% -   5%)
IntNRQ                  126.93   (6.3%)     126.72   (5.5%)  -0.2% (-11% -  12%)
AndHighHigh             130.72   (3.1%)     130.58   (3.2%)  -0.1% ( -6% -   6%)
HighSpanNear             12.64   (1.2%)      12.63   (1.4%)  -0.1% ( -2% -   2%)
Prefix3                  92.94   (7.8%)      92.92   (7.6%)  -0.0% (-14% -  16%)
OrHighMed               155.49  (10.5%)     155.60  (10.0%)   0.1% (-18% -  22%)
AndHighMed              181.53   (3.0%)     181.74   (3.0%)   0.1% ( -5% -   6%)
OrNotHighHigh           137.81   (3.1%)     137.98   (2.2%)   0.1% ( -5% -   5%)
OrHighHigh              136.52  (10.5%)     136.71   (9.8%)   0.1% (-18% -  22%)
MedSloppyPhrase          44.59   (2.8%)      44.67   (3.3%)   0.2% ( -5% -   6%)
OrHighNotHigh           135.68   (1.6%)     135.93   (1.5%)   0.2% ( -2% -   3%)
MedTerm                 949.94   (3.1%)     951.88   (2.9%)   0.2% ( -5% -   6%)
LowSpanNear              26.02   (0.9%)      26.07   (1.3%)   0.2% ( -1% -   2%)
OrHighLow                97.01  (11.1%)      97.22  (10.6%)   0.2% (-19% -  24%)
MedSpanNear              27.98   (1.1%)      28.04   (1.0%)   0.2% ( -1% -   2%)
PKLookup                407.25   (2.2%)     408.17   (1.9%)   0.2% ( -3% -   4%)
OrNotHighMed            434.88   (2.8%)     435.99   (2.5%)   0.3% ( -4% -   5%)
Wildcard                166.20   (4.0%)     166.65   (4.6%)   0.3% ( -8% -   9%)
LowSloppyPhrase         107.31   (3.7%)     107.65   (3.9%)   0.3% ( -7% -   8%)
HighSloppyPhrase         13.76   (2.9%)      13.82   (2.9%)   0.4% ( -5% -   6%)
HighTerm                328.62   (2.3%)     330.24   (2.1%)   0.5% ( -3% -   4%)
Respell                  67.48   (4.8%)      67.84   (5.5%)   0.5% ( -9% -  11%)
Fuzzy2                   73.37  (15.2%)      78.35  (13.4%)   6.8% (-18% -  41%)
{noformat}
One advantage of this approach is that it lets phraseFreq return as soon as the
first match is found when scores are not needed, which helps most when there is
a match near the beginning of the document.
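
To make the idea concrete, here is a minimal sketch of a ConjunctionDISI-style
leapfrog intersection over positions. It is not the code from the attached
patch and it does not use Lucene's APIs: the class PhraseIntersectionSketch,
the PositionCursor helper, and the phraseFreq(int[][], boolean) signature are
all hypothetical names chosen for illustration. It assumes each term's
positions within the current document are available as a sorted int array, and
that term i of the phrase must occur at position (lead position + i).

```java
/** Illustrative sketch only; hypothetical names, not Lucene's ExactPhraseScorer. */
final class PhraseIntersectionSketch {

    /** Cursor over the sorted positions of one phrase term in a single document. */
    static final class PositionCursor {
        private final int[] positions; // sorted ascending
        private final int offset;      // index of this term within the phrase
        private int idx = 0;

        PositionCursor(int[] positions, int offset) {
            this.positions = positions;
            this.offset = offset;
        }

        /**
         * Advances to the first position whose normalized value
         * (position - offset) is >= target. Returns false when exhausted.
         */
        boolean advanceTo(int target) {
            while (idx < positions.length && positions[idx] - offset < target) {
                idx++;
            }
            return idx < positions.length;
        }

        /** Current position shifted so that all terms of a match agree. */
        int normalized() {
            return positions[idx] - offset;
        }
    }

    /**
     * Counts exact phrase matches in one document. termPositions[i] holds the
     * sorted positions of the i-th phrase term. When needsScores is false the
     * method returns right after the first match.
     */
    static int phraseFreq(int[][] termPositions, boolean needsScores) {
        if (termPositions.length == 0) {
            return 0;
        }
        PositionCursor[] cursors = new PositionCursor[termPositions.length];
        for (int i = 0; i < cursors.length; i++) {
            cursors[i] = new PositionCursor(termPositions[i], i);
        }
        PositionCursor lead = cursors[0];
        int freq = 0;
        if (!lead.advanceTo(0)) {
            return freq;
        }
        int candidate = lead.normalized();
        while (true) {
            boolean matched = true;
            for (int i = 1; i < cursors.length; i++) {
                PositionCursor other = cursors[i];
                if (!other.advanceTo(candidate)) {
                    return freq; // one term is exhausted: no more matches
                }
                if (other.normalized() > candidate) {
                    // Overshoot: let the lead leapfrog to the new target.
                    if (!lead.advanceTo(other.normalized())) {
                        return freq;
                    }
                    candidate = lead.normalized();
                    matched = false;
                    break;
                }
            }
            if (matched) {
                freq++;
                if (!needsScores) {
                    return freq; // early exit: existence of a match is enough
                }
                if (!lead.advanceTo(candidate + 1)) {
                    return freq;
                }
                candidate = lead.normalized();
            }
        }
    }
}
```

For example, with positions {0, 3}, {1, 4} and {2, 5} for the three terms of a
phrase, phraseFreq returns 2 when needsScores is true and 1 when it is false,
since in the latter case it stops at the first match.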
> Simplify ExactPhraseScorer
> --------------------------
>
> Key: LUCENE-6260
> URL: https://issues.apache.org/jira/browse/LUCENE-6260
> Project: Lucene - Core
> Issue Type: Improvement
> Reporter: Adrien Grand
> Priority: Minor
> Attachments: LUCENE-6260.patch
>
>
> ExactPhraseScorer tries to intersect positions using windows of 4096
> documents. In LUCENE-2410 it was reported that this helped a lot, but I tried
> again on wikibig with a simpler impl that advances one position at a time,
> and the performance difference was only a few percent. I'm guessing that
> other changes (e.g. the new postings format?) have made this behaviour
> less useful than it used to be?