[jira] [Commented] (LUCENE-8142) Should codecs expose raw impacts?

Adrien Grand (JIRA) Mon, 16 Apr 2018 11:10:28 -0700

    [ https://issues.apache.org/jira/browse/LUCENE-8142?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16439805#comment-16439805 ]


Adrien Grand commented on LUCENE-8142:
--------------------------------------

I gave this a try. {{ImpactsEnum}} has a new method {{getImpacts}} that returns 
impacts on multiple levels. This makes it natural to implement on top of a 
skip list. It might make it more challenging to back this information with 
another data structure, but it also has API benefits, like removing 
references to {{SimScorer}} from {{TermsEnum.impacts}}.
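
To make the multi-level idea concrete, here is a minimal, self-contained sketch 
of what "impacts on multiple levels" could look like from a caller's point of 
view. All names in it ({{ImpactsSketch}}, {{Impact}}, {{Level}}, 
{{ScoreFunction}}, {{maxScore}}) are hypothetical and are not the classes from 
the patch. It assumes that each level carries the last doc ID it covers plus a 
list of (freq, norm) pairs, and that scores are non-negative.

```java
import java.util.List;

/** Hypothetical sketch of multi-level impacts; these are not Lucene's classes. */
final class ImpactsSketch {

  /** One (freq, norm) pair, as described in the issue. */
  static final class Impact {
    final int freq;
    final long norm;

    Impact(int freq, long norm) {
      this.freq = freq;
      this.norm = norm;
    }
  }

  /** Impacts of one level, assumed valid for every doc ID up to docIdUpTo inclusive. */
  static final class Level {
    final int docIdUpTo;
    final List<Impact> impacts;

    Level(int docIdUpTo, List<Impact> impacts) {
      this.docIdUpTo = docIdUpTo;
      this.impacts = impacts;
    }
  }

  /** Caller-side scoring function, so the impacts themselves stay free of any scorer. */
  interface ScoreFunction {
    double score(int freq, long norm);
  }

  /**
   * Returns an upper bound of the score for doc IDs up to upTo, using the lowest
   * level that covers upTo. Levels are assumed to be ordered from lowest to highest.
   * Returns positive infinity when no level covers upTo, meaning "no bound known".
   */
  static double maxScore(List<Level> levels, int upTo, ScoreFunction scorer) {
    for (Level level : levels) {
      if (level.docIdUpTo >= upTo) {
        double max = 0; // assumes non-negative scores
        for (Impact impact : level.impacts) {
          max = Math.max(max, scorer.score(impact.freq, impact.norm));
        }
        return max;
      }
    }
    return Double.POSITIVE_INFINITY;
  }
}
```

The point of the design is that the scoring function is supplied by the caller 
at the time the bound is computed, so the structure that stores impacts never 
needs to know about it.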

On wikibigall, term queries improve since this change allows them to skip at 
any level, while they could only skip on the first level before. However, the 
fact that the API is a bit heavier seems to incur a slight slowdown for 
conjunctions/disjunctions. I don't think it is an issue, especially because 
this change improves testing by making it easier to compare impacts against 
indexed data. This API also means that we can now speed up queries that merge 
frequencies and norms rather than scores, like {{SynonymQuery}} and 
{{BlendedTermQuery}}, which was not possible before.

{noformat}
             AndHighHigh       83.36      (3.8%)       79.45      (3.1%)   -4.7% ( -11% -    2%)
              OrHighHigh       34.42      (2.7%)       32.93      (2.0%)   -4.3% (  -8% -    0%)
              AndHighMed      115.73      (3.3%)      111.67      (3.0%)   -3.5% (  -9% -    2%)
               OrHighMed       24.44      (3.3%)       23.74      (2.1%)   -2.9% (  -8% -    2%)
               OrHighLow     1952.31      (4.7%)     1912.93      (3.6%)   -2.0% (  -9% -    6%)
              AndHighLow     1837.61      (4.1%)     1802.22      (3.9%)   -1.9% (  -9% -    6%)
                  Fuzzy1      229.31      (9.8%)      226.03      (8.9%)   -1.4% ( -18% -   19%)
                  IntNRQ       31.75     (14.0%)       31.36     (12.5%)   -1.2% ( -24% -   29%)
                  Fuzzy2      194.10      (9.6%)      192.36     (11.6%)   -0.9% ( -20% -   22%)
         MedSloppyPhrase       54.96      (4.7%)       54.62      (4.2%)   -0.6% (  -9% -    8%)
        HighSloppyPhrase        6.21      (5.9%)        6.18      (5.7%)   -0.5% ( -11% -   11%)
         LowSloppyPhrase       19.26      (4.4%)       19.19      (4.3%)   -0.4% (  -8% -    8%)
       HighTermMonthSort      180.22      (9.8%)      179.53     (10.4%)   -0.4% ( -18% -   21%)
                Wildcard       60.86      (6.0%)       60.63      (6.3%)   -0.4% ( -11% -   12%)
                 Prefix3       88.19      (8.3%)       87.89      (8.5%)   -0.3% ( -15% -   17%)
                 Respell      195.14      (2.1%)      194.57      (2.5%)   -0.3% (  -4% -    4%)
              HighPhrase       54.69      (1.6%)       54.72      (1.6%)    0.1% (  -3% -    3%)
               MedPhrase       41.52      (1.8%)       41.56      (1.9%)    0.1% (  -3% -    3%)
               LowPhrase       55.59      (1.8%)       55.68      (1.9%)    0.2% (  -3% -    3%)
             MedSpanNear       28.55      (3.8%)       28.74      (3.8%)    0.7% (  -6% -    8%)
            HighSpanNear       16.88      (4.6%)       17.03      (4.6%)    0.9% (  -7% -   10%)
             LowSpanNear       14.50      (6.3%)       14.67      (6.2%)    1.1% ( -10% -   14%)
   HighTermDayOfYearSort       61.22     (12.3%)       62.04     (12.4%)    1.3% ( -20% -   29%)
                 LowTerm     2478.52      (4.1%)     2692.79      (4.0%)    8.6% (   0% -   17%)
                 MedTerm      835.85      (5.8%)     1323.83      (6.8%)   58.4% (  43% -   75%)
                HighTerm      472.60      (6.8%)     1718.45     (15.6%)  263.6% ( 225% -  306%)
{noformat}
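
As a rough illustration of why skipping at any level can help term queries, 
the sketch below picks the farthest doc ID a query could jump to, given a 
per-level score upper bound. It is a simplified model, not the code from the 
patch: {{SkipSketch}}, {{Level}}, {{nextCandidate}} and the 
{{minCompetitiveScore}} threshold are names assumed for the example, and each 
level is assumed to already carry its maximum score.

```java
import java.util.List;

/** Hypothetical sketch of skipping with per-level score bounds; not Lucene's code. */
final class SkipSketch {

  /** One level: an upper bound of the score for all doc IDs up to docIdUpTo inclusive. */
  static final class Level {
    final int docIdUpTo;
    final double maxScore;

    Level(int docIdUpTo, double maxScore) {
      this.docIdUpTo = docIdUpTo;
      this.maxScore = maxScore;
    }
  }

  /**
   * Returns the first doc ID at or after target that may still be competitive.
   * Every level whose bound is below minCompetitiveScore lets us jump past its
   * docIdUpTo, so higher levels allow longer jumps than the first level alone.
   * Levels are assumed to be ordered from lowest to highest.
   */
  static int nextCandidate(List<Level> levels, int target, double minCompetitiveScore) {
    int skipTo = target;
    for (Level level : levels) {
      if (level.docIdUpTo >= target && level.maxScore < minCompetitiveScore) {
        if (level.docIdUpTo == Integer.MAX_VALUE) {
          return Integer.MAX_VALUE; // nothing left to visit, and avoids int overflow
        }
        skipTo = Math.max(skipTo, level.docIdUpTo + 1);
      }
    }
    return skipTo;
  }
}
```

If only the first level were consulted, the jump could never go past that 
level's {{docIdUpTo}}, which is the restriction the change above removes for 
term queries.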

> Should codecs expose raw impacts?
> ---------------------------------
>
>                 Key: LUCENE-8142
>                 URL: https://issues.apache.org/jira/browse/LUCENE-8142
>             Project: Lucene - Core
>          Issue Type: Improvement
>            Reporter: Adrien Grand
>            Priority: Minor
>         Attachments: LUCENE-8142.patch
>
>
> Follow-up of LUCENE-4198. Currently, call-sites of TermsEnum.impacts provide 
> a SimScorer so that the maximum score for the block can be computed. Should 
> ImpactsEnum instead return the (freq,norm) pairs and let callers deal with 
> max score computation?



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org
