[jira] [Updated] (LUCENE-7839) Optimize the default NormsFormat for the case that all norms are in 0..16

Adrien Grand (JIRA) Wed, 24 May 2017 09:15:22 -0700

     [ https://issues.apache.org/jira/browse/LUCENE-7839?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]


Adrien Grand updated LUCENE-7839:
---------------------------------
    Attachment: LUCENE-7839.patch

I tried to leverage the iterator API similarly to what numeric doc values do, 
but luceneutil seems to notice a performance hit:

{noformat}
                    Task  QPS baseline      StdDev   QPS patch      StdDev  Pct diff
                HighTerm        569.71     (11.5%)      490.35      (9.0%)  -13.9% ( -30% -    7%)
              OrHighHigh        138.08     (11.6%)      123.27      (7.1%)  -10.7% ( -26% -    9%)
               OrHighMed        295.37     (11.2%)      269.99      (8.1%)   -8.6% ( -25% -   12%)
               OrHighLow        379.17      (9.1%)      351.63      (6.4%)   -7.3% ( -20% -    9%)
                 MedTerm       1518.29     (11.9%)     1421.77      (6.8%)   -6.4% ( -22% -   14%)
             AndHighHigh        386.22      (9.3%)      367.76      (9.0%)   -4.8% ( -21% -   14%)
                 LowTerm       3236.73      (8.3%)     3118.34      (8.3%)   -3.7% ( -18% -   14%)
         MedSloppyPhrase        555.94      (9.6%)      537.02      (6.3%)   -3.4% ( -17% -   13%)
   HighTermDayOfYearSort        330.62     (12.2%)      320.20      (9.8%)   -3.2% ( -22% -   21%)
               MedPhrase        635.77      (9.6%)      616.12      (8.1%)   -3.1% ( -18% -   16%)
        HighSloppyPhrase        147.02      (8.6%)      142.77      (7.9%)   -2.9% ( -17% -   14%)
                  IntNRQ        117.56      (9.8%)      114.43     (10.2%)   -2.7% ( -20% -   19%)
            HighSpanNear         57.73      (7.9%)       56.21      (7.4%)   -2.6% ( -16% -   13%)
         LowSloppyPhrase        385.52      (8.9%)      375.39      (6.5%)   -2.6% ( -16% -   13%)
               LowPhrase        653.67      (9.7%)      637.17      (7.4%)   -2.5% ( -17% -   16%)
                 Prefix3        287.63     (12.3%)      281.78     (10.3%)   -2.0% ( -21% -   23%)
                 Respell        144.41      (7.8%)      141.67      (6.7%)   -1.9% ( -15% -   13%)
              AndHighMed        676.46      (8.3%)      665.05      (9.8%)   -1.7% ( -18% -   17%)
                Wildcard        214.90      (8.5%)      211.57      (7.0%)   -1.5% ( -15% -   15%)
              HighPhrase         20.11      (9.7%)       20.03      (8.5%)   -0.4% ( -17% -   19%)
             MedSpanNear        476.40      (8.7%)      476.48      (7.7%)    0.0% ( -15% -   18%)
              AndHighLow        964.81      (9.8%)      965.18      (8.0%)    0.0% ( -16% -   19%)
       HighTermMonthSort       1190.72      (9.6%)     1194.44     (11.4%)    0.3% ( -18% -   23%)
             LowSpanNear        421.27      (7.8%)      423.97      (9.9%)    0.6% ( -15% -   19%)
                  Fuzzy2         49.17     (16.2%)       50.09     (19.1%)    1.9% ( -28% -   44%)
                  Fuzzy1        129.89     (12.6%)      132.32     (11.9%)    1.9% ( -20% -   30%)
{noformat}

The patch I experimented with is attached. It keeps the current levels of 
compression, but splits values into blocks of 2^14 values and decides on the 
number of bits per value on a per-block basis. Maybe there is a better way to 
do this...
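
For readers without the attachment, the per-block decision described above can 
be sketched roughly as follows. This is not the code from LUCENE-7839.patch: 
the class name BlockBitsSketch, the helper names and the tiny block size in 
main are assumptions made for illustration, and the sketch assumes 
non-negative values and allows any bit width, which the real format may not.

```java
import java.util.Arrays;

/** Hypothetical sketch: pick a bit width per block of values. Not the actual patch. */
final class BlockBitsSketch {

    /** Number of bits needed to represent every value in [0, max]; 0 when max == 0. */
    static int bitsRequired(long max) {
        return 64 - Long.numberOfLeadingZeros(max);
    }

    /**
     * Splits values into consecutive blocks of blockSize entries (the last block
     * may be shorter) and returns the bit width each block would need.
     * Assumes all values are non-negative.
     */
    static int[] bitsPerBlock(long[] values, int blockSize) {
        int numBlocks = (values.length + blockSize - 1) / blockSize;
        int[] bits = new int[numBlocks];
        for (int b = 0; b < numBlocks; b++) {
            int from = b * blockSize;
            int to = Math.min(from + blockSize, values.length);
            long max = 0;
            for (int i = from; i < to; i++) {
                max = Math.max(max, values[i]);
            }
            bits[b] = bitsRequired(max);
        }
        return bits;
    }

    public static void main(String[] args) {
        // The message uses blocks of 2^14 values; 4 is used here only to keep the demo small.
        long[] norms = {1, 3, 2, 0, 200, 7, 9, 15, 0, 0};
        System.out.println(Arrays.toString(bitsPerBlock(norms, 4))); // prints [2, 8, 0]
    }
}
```

A single large value only widens the block it falls in, which is the appeal of 
deciding per block rather than per field; the cost is an extra per-block lookup 
on the read path.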

> Optimize the default NormsFormat for the case that all norms are in 0..16
> -------------------------------------------------------------------------
>
>                 Key: LUCENE-7839
>                 URL: https://issues.apache.org/jira/browse/LUCENE-7839
>             Project: Lucene - Core
>          Issue Type: Task
>            Reporter: Adrien Grand
>            Priority: Minor
>         Attachments: LUCENE-7839.patch
>
>
> Given how we now store the length of the field in norms, we could optimize 
> the default norms format for the case that all norms are in 0..16 and store 
> them on 4 bits. This would be picked up for short fields that have fewer than 
> 16 terms (e.g. title fields) and would halve disk utilization.
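
To make the 4-bit idea concrete, here is a minimal sketch of packing two values 
per byte. It is not Lucene's NormsFormat code: the class NibblePackingSketch 
and its methods are hypothetical, and it assumes every value fits in 4 bits, 
i.e. lies in 0 through 15, reading the upper bound of the range as exclusive.

```java
/** Hypothetical sketch: store two 4-bit values per byte. Not Lucene's implementation. */
final class NibblePackingSketch {

    /** Packs values into nibbles: even indexes use the low 4 bits, odd indexes the high 4 bits. */
    static byte[] pack(int[] norms) {
        byte[] packed = new byte[(norms.length + 1) / 2];
        for (int i = 0; i < norms.length; i++) {
            int v = norms[i];
            if (v < 0 || v > 15) {
                throw new IllegalArgumentException("value does not fit in 4 bits: " + v);
            }
            int shift = (i & 1) == 0 ? 0 : 4;
            packed[i >>> 1] |= (byte) (v << shift);
        }
        return packed;
    }

    /** Reads back the value stored at the given index. */
    static int get(byte[] packed, int index) {
        int shift = (index & 1) == 0 ? 0 : 4;
        return (packed[index >>> 1] >>> shift) & 0x0F;
    }

    public static void main(String[] args) {
        int[] norms = {3, 15, 0, 7, 12};
        byte[] packed = pack(norms);
        // 5 values occupy 3 bytes instead of 5 at one byte per value.
        System.out.println(packed.length);
        System.out.println(get(packed, 1));
    }
}
```

Compared with one byte per value, this layout needs half the bytes (rounded 
up), which is where the factor of 2 in the description comes from.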



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)

