[ 
https://issues.apache.org/jira/browse/LUCENE-8312?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16480331#comment-16480331
 ] 

Adrien Grand commented on LUCENE-8312:
--------------------------------------

Here is a patch that sums up term frequencies for each unique norm value in 
the impacts. I also refactored the way TermScorer leverages impacts by 
introducing a new {{ImpactsDISI}}, which encapsulates the logic for using 
impacts to efficiently skip non-competitive documents. It is used by TermQuery, 
FeatureQuery and SynonymQuery, and maybe soon PhraseQuery as well.
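
To illustrate the idea of summing term frequencies per unique norm value, here 
is a minimal, self-contained sketch. It is not the code from the patch: the 
{{Impact}} class and the {{merge}} method are hypothetical stand-ins, and norms 
are compared as plain signed longs for simplicity. It assumes all synonyms are 
in the same field, so a document has the same norm for every term. For each 
unique norm, it takes from each term the highest frequency among impacts whose 
norm is less than or equal to that norm, and sums those frequencies.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.TreeSet;

class MergeImpactsSketch {

    /** Hypothetical stand-in for an impact: a (freq, norm) pair. */
    static final class Impact {
        final int freq;
        final long norm;

        Impact(int freq, long norm) {
            this.freq = freq;
            this.norm = norm;
        }

        @Override
        public String toString() {
            return "{freq=" + freq + ", norm=" + norm + "}";
        }
    }

    /**
     * Merges the per-term impacts of a synonym group into one list with a
     * single entry per unique norm value, sorted by increasing norm.
     */
    static List<Impact> merge(List<List<Impact>> perTermImpacts) {
        // Collect every norm value that appears for any term, in sorted order.
        TreeSet<Long> norms = new TreeSet<>();
        for (List<Impact> impacts : perTermImpacts) {
            for (Impact impact : impacts) {
                norms.add(impact.norm);
            }
        }

        List<Impact> merged = new ArrayList<>();
        for (long norm : norms) {
            int freqSum = 0;
            for (List<Impact> impacts : perTermImpacts) {
                // Upper bound on this term's freq for a document with this norm:
                // the highest freq among impacts whose norm is <= this norm.
                int best = 0;
                for (Impact impact : impacts) {
                    if (impact.norm <= norm) {
                        best = Math.max(best, impact.freq);
                    }
                }
                freqSum += best;
            }
            merged.add(new Impact(freqSum, norm));
        }
        return merged;
    }

    public static void main(String[] args) {
        List<Impact> termA = List.of(new Impact(2, 10), new Impact(5, 20));
        List<Impact> termB = List.of(new Impact(3, 15));
        System.out.println(merge(List.of(termA, termB)));
    }
}
```

The merged list can then be treated like the impacts of a single pseudo-term 
whose frequency is the sum of the synonyms' frequencies, which matches how a 
synonym query scores documents.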

I hacked luceneutil to run disjunctions as synonym queries to check the impact 
of this change when total hit counts are not tracked:
  
{noformat}
                    Task   QPS baseline      StdDev   QPS patch      StdDev   Pct diff
       HighTermMonthSort      158.74     (10.5%)      144.83     (10.2%)   -8.8% ( -26% -   13%)
                HighTerm     1460.56      (5.3%)     1395.35      (3.5%)   -4.5% ( -12% -    4%)
   HighTermDayOfYearSort       66.81      (9.3%)       64.08     (11.7%)   -4.1% ( -22% -   18%)
             AndHighHigh       33.33      (5.0%)       32.15      (3.5%)   -3.5% ( -11% -    5%)
                 MedTerm     1738.21      (4.9%)     1687.75      (3.2%)   -2.9% ( -10% -    5%)
                 LowTerm     3582.99      (3.4%)     3496.28      (3.9%)   -2.4% (  -9% -    5%)
              AndHighMed      154.32      (3.7%)      151.61      (2.7%)   -1.8% (  -7% -    4%)
                 Prefix3       89.89      (5.0%)       89.15      (5.6%)   -0.8% ( -10% -   10%)
                  IntNRQ       34.35     (13.9%)       34.21     (15.0%)   -0.4% ( -25% -   33%)
               LowPhrase     1815.14      (3.1%)     1809.71      (3.0%)   -0.3% (  -6% -    6%)
               MedPhrase      163.59      (1.4%)      163.20      (1.3%)   -0.2% (  -2% -    2%)
        HighSloppyPhrase       12.22      (4.8%)       12.19      (4.8%)   -0.2% (  -9% -    9%)
                 Respell      195.28      (2.4%)      194.94      (1.9%)   -0.2% (  -4% -    4%)
                Wildcard      103.19      (2.7%)      103.02      (2.9%)   -0.2% (  -5% -    5%)
                  Fuzzy2      159.47      (4.9%)      159.23      (7.6%)   -0.2% ( -12% -   13%)
         MedSloppyPhrase       58.26      (4.2%)       58.22      (4.5%)   -0.1% (  -8% -    8%)
         LowSloppyPhrase       61.14      (2.4%)       61.19      (2.6%)    0.1% (  -4% -    5%)
             LowSpanNear       92.96      (3.7%)       93.13      (3.4%)    0.2% (  -6% -    7%)
             MedSpanNear       48.08      (3.4%)       48.22      (3.3%)    0.3% (  -6% -    7%)
                  Fuzzy1      312.46      (6.6%)      313.81     (11.1%)    0.4% ( -16% -   19%)
            HighSpanNear        7.00      (5.5%)        7.03      (5.6%)    0.4% ( -10% -   12%)
              HighPhrase       27.40      (2.6%)       27.53      (2.9%)    0.5% (  -4% -    6%)
              AndHighLow     1219.32      (3.6%)     1233.33      (4.1%)    1.1% (  -6% -    9%)
               OrHighMed       30.41      (7.7%)      141.92     (13.6%)  366.6% ( 320% -  420%)
              OrHighHigh       23.02      (7.3%)      145.78     (16.6%)  533.4% ( 474% -  601%)
               OrHighLow       35.95      (7.7%)      234.72     (19.9%)  552.9% ( 488% -  628%)
{noformat}

> Leverage impacts for SynonymQuery
> ---------------------------------
>
>                 Key: LUCENE-8312
>                 URL: https://issues.apache.org/jira/browse/LUCENE-8312
>             Project: Lucene - Core
>          Issue Type: Improvement
>            Reporter: Adrien Grand
>            Priority: Minor
>         Attachments: LUCENE-8312.patch
>
>
> Now that we expose raw impacts, we could leverage them for synonym queries.
> It would be a matter of summing up term frequencies for each unique norm 
> value.



