[ https://issues.apache.org/jira/browse/LUCENE-8312?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16480331#comment-16480331 ]
Adrien Grand commented on LUCENE-8312:
--------------------------------------
Here is a patch which sums up term frequencies for each unique norm value in
the impacts. I also refactored the way TermScorer leverages impacts by
introducing a new {{ImpactsDISI}}, which abstracts how impacts are used to
efficiently skip non-competitive documents. It is used by TermQuery,
FeatureQuery and SynonymQuery, and maybe soon by PhraseQuery as well.
I hacked luceneutil to run disjunctions as synonym queries to check the impact
of this change when total hit counts are not tracked:
{noformat}
                 Task  QPS baseline   StdDev  QPS patch   StdDev  Pct diff
    HighTermMonthSort        158.74  (10.5%)     144.83  (10.2%)     -8.8%  ( -26% - 13%)
             HighTerm       1460.56   (5.3%)    1395.35   (3.5%)     -4.5%  ( -12% - 4%)
HighTermDayOfYearSort         66.81   (9.3%)      64.08  (11.7%)     -4.1%  ( -22% - 18%)
          AndHighHigh         33.33   (5.0%)      32.15   (3.5%)     -3.5%  ( -11% - 5%)
              MedTerm       1738.21   (4.9%)    1687.75   (3.2%)     -2.9%  ( -10% - 5%)
              LowTerm       3582.99   (3.4%)    3496.28   (3.9%)     -2.4%  ( -9% - 5%)
           AndHighMed        154.32   (3.7%)     151.61   (2.7%)     -1.8%  ( -7% - 4%)
              Prefix3         89.89   (5.0%)      89.15   (5.6%)     -0.8%  ( -10% - 10%)
               IntNRQ         34.35  (13.9%)      34.21  (15.0%)     -0.4%  ( -25% - 33%)
            LowPhrase       1815.14   (3.1%)    1809.71   (3.0%)     -0.3%  ( -6% - 6%)
            MedPhrase        163.59   (1.4%)     163.20   (1.3%)     -0.2%  ( -2% - 2%)
     HighSloppyPhrase         12.22   (4.8%)      12.19   (4.8%)     -0.2%  ( -9% - 9%)
              Respell        195.28   (2.4%)     194.94   (1.9%)     -0.2%  ( -4% - 4%)
             Wildcard        103.19   (2.7%)     103.02   (2.9%)     -0.2%  ( -5% - 5%)
               Fuzzy2        159.47   (4.9%)     159.23   (7.6%)     -0.2%  ( -12% - 13%)
      MedSloppyPhrase         58.26   (4.2%)      58.22   (4.5%)     -0.1%  ( -8% - 8%)
      LowSloppyPhrase         61.14   (2.4%)      61.19   (2.6%)      0.1%  ( -4% - 5%)
          LowSpanNear         92.96   (3.7%)      93.13   (3.4%)      0.2%  ( -6% - 7%)
          MedSpanNear         48.08   (3.4%)      48.22   (3.3%)      0.3%  ( -6% - 7%)
               Fuzzy1        312.46   (6.6%)     313.81  (11.1%)      0.4%  ( -16% - 19%)
         HighSpanNear          7.00   (5.5%)       7.03   (5.6%)      0.4%  ( -10% - 12%)
           HighPhrase         27.40   (2.6%)      27.53   (2.9%)      0.5%  ( -4% - 6%)
           AndHighLow       1219.32   (3.6%)    1233.33   (4.1%)      1.1%  ( -6% - 9%)
            OrHighMed         30.41   (7.7%)     141.92  (13.6%)    366.6%  ( 320% - 420%)
           OrHighHigh         23.02   (7.3%)     145.78  (16.6%)    533.4%  ( 474% - 601%)
            OrHighLow         35.95   (7.7%)     234.72  (19.9%)    552.9%  ( 488% - 628%)
{noformat}
> Leverage impacts for SynonymQuery
> ---------------------------------
>
> Key: LUCENE-8312
> URL: https://issues.apache.org/jira/browse/LUCENE-8312
> Project: Lucene - Core
> Issue Type: Improvement
> Reporter: Adrien Grand
> Priority: Minor
> Attachments: LUCENE-8312.patch
>
>
> Now that we expose raw impacts, we could leverage them for synonym queries.
> It would be a matter of summing up term frequencies for each unique norm
> value.