[ https://issues.apache.org/jira/browse/LUCENE-3892?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Michael McCandless updated LUCENE-3892:
---------------------------------------

    Attachment: LUCENE-3892-bulkVInt.patch

I tested BulkVInt again, i.e., to separate the effect of the cutover from Sep
to BlockPF from the effect of the vInt-to-FOR change.
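As a rough illustration of what a bulk vInt block means here, the sketch below writes a block of doc deltas as consecutive vInts (7 payload bits per byte, high bit set when more bytes follow) and decodes the whole block into an int[] in one loop. The class and method names are hypothetical and this is not the code from LUCENE-3892-bulkVInt.patch; it only shows the general technique.

```java
import java.io.ByteArrayOutputStream;
import java.util.Arrays;

/**
 * Hypothetical sketch (not the patch's code): a block of ints is written as
 * consecutive vInts and decoded in one tight loop into an int[].
 */
public final class BulkVIntSketch {

  /** Encodes each value as a vInt: 7 payload bits per byte, high bit set when more bytes follow. */
  public static byte[] encode(int[] values) {
    ByteArrayOutputStream out = new ByteArrayOutputStream();
    for (int v : values) {
      while ((v & ~0x7F) != 0) {
        out.write((v & 0x7F) | 0x80); // low 7 bits plus continuation flag
        v >>>= 7;                     // unsigned shift so the loop always ends
      }
      out.write(v);                   // last byte, continuation flag clear
    }
    return out.toByteArray();
  }

  /** Decodes count vInts from buf into a new int[] in a single pass. */
  public static int[] decode(byte[] buf, int count) {
    int[] result = new int[count];
    int pos = 0;
    for (int i = 0; i < count; i++) {
      int b = buf[pos++];
      int v = b & 0x7F;
      // Keep reading while the high bit of the previous byte was set.
      for (int shift = 7; (b & 0x80) != 0; shift += 7) {
        b = buf[pos++];
        v |= (b & 0x7F) << shift;
      }
      result[i] = v;
    }
    return result;
  }

  public static void main(String[] args) {
    int[] deltas = {1, 5, 127, 128, 300, 16384};
    byte[] encoded = encode(deltas);
    System.out.println(encoded.length + " bytes: " + Arrays.toString(decode(encoded, deltas.length)));
  }
}
```

Every value still needs a per-byte continuation check, which is the decode cost that FOR is meant to reduce.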

Base=Lucene40, comp=BlockPF(BulkVInt):

{noformat}
                Task    QPS base  StdDev base  QPS bulkVInt  StdDev bulkVInt     Pct diff
          AndHighLow      857.35        20.10        614.20            10.73  -31% - -25%
             Respell       62.99         2.35         60.53             1.34   -9% -   2%
          AndHighMed       65.64         2.24         63.61             0.93   -7% -   1%
              Fuzzy2       62.83         1.75         61.72             1.31   -6% -   3%
            PKLookup      195.97         1.87        194.73             5.00   -4% -   2%
              IntNRQ       12.50         0.10         12.43             1.49  -13% -  12%
              Fuzzy1       72.68         1.12         73.84             0.88   -1% -   4%
          HighPhrase        1.75         0.05          1.78             0.08   -5% -   8%
         LowSpanNear        9.01         0.12          9.27             0.13    0% -   5%
           LowPhrase       19.73         0.43         20.64             0.15    1% -   7%
         MedSpanNear        4.52         0.06          4.74             0.01    3% -   6%
           MedPhrase       11.74         0.31         12.40             0.09    2% -   9%
             LowTerm      435.96        13.41        467.22             9.10    1% -  12%
             Prefix3       75.47         0.51         81.52             4.38    1% -  14%
            Wildcard       48.66         0.44         52.79             2.79    1% -  15%
          OrHighHigh       10.11         0.63         11.06             0.32    0% -  20%
           OrHighMed       20.85         1.31         22.99             0.63    0% -  20%
        HighSpanNear        1.50         0.02          1.67             0.01    8% -  13%
           OrHighLow       23.55         1.46         26.51             0.76    2% -  23%
     LowSloppyPhrase        6.45         0.14          7.37             0.18    9% -  19%
             MedTerm      163.46        10.30        188.55             5.22    5% -  26%
     MedSloppyPhrase        5.74         0.12          6.65             0.15   10% -  20%
    HighSloppyPhrase        1.69         0.04          1.98             0.11    8% -  26%
         AndHighHigh       19.00         0.53         22.91             0.24   16% -  25%
            HighTerm       28.28         1.95         34.48             0.99   10% -  34%
{noformat}


Base=BlockPF(BulkVInt), comp=BlockPF(FOR):

{noformat}
                Task    QPS base  StdDev base       QPS for       StdDev for     Pct diff
              IntNRQ       12.10         1.70         11.61             0.02  -16% -  11%
    HighSloppyPhrase        2.00         0.11          1.95             0.03   -8% -   4%
          HighPhrase        1.85         0.05          1.81             0.07   -8% -   4%
            Wildcard       52.32         3.09         52.49             0.24   -5% -   7%
     LowSloppyPhrase        7.41         0.24          7.43             0.19   -5% -   6%
     MedSloppyPhrase        6.69         0.18          6.72             0.21   -5% -   6%
           OrHighMed       22.99         0.55         23.23             0.85   -4% -   7%
             Respell       61.99         2.01         62.70             1.57   -4% -   7%
           OrHighLow       26.52         0.69         26.83             1.00   -5% -   7%
              Fuzzy1       74.72         1.34         75.59             1.43   -2% -   4%
            PKLookup      189.68         7.14        192.09             3.82   -4% -   7%
          OrHighHigh       11.05         0.27         11.21             0.42   -4% -   7%
              Fuzzy2       62.78         1.86         63.70             1.87   -4% -   7%
        HighSpanNear        1.65         0.03          1.69             0.02    0% -   5%
             Prefix3       80.25         5.44         82.57             1.03   -4% -  11%
         AndHighHigh       22.79         0.11         23.53             0.13    2% -   4%
         LowSpanNear        9.16         0.26          9.48             0.21   -1% -   8%
         MedSpanNear        4.67         0.09          4.84             0.07    0% -   7%
           MedPhrase       12.59         0.26         13.07             0.24    0% -   7%
           LowPhrase       20.86         0.33         22.06             0.30    2% -   8%
          AndHighLow      618.27        13.15        655.52             3.30    3% -   8%
            HighTerm       33.95         1.11         36.02             0.08    2% -   9%
             MedTerm      186.09         5.51        198.46             0.09    3% -   9%
          AndHighMed       63.71         1.15         69.15             0.45    5% -  11%
             LowTerm      469.17         7.25        514.55             2.83    7% -  12%
{noformat}

So ... most of the gains come from the BlockPF cutover, not from the
switch to FOR.  This is somewhat surprising/disappointing: it means our
bottlenecks are the abstraction layers, not the actual decode cost.
Still, it's good to make progress on removing the abstractions.
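For contrast with the vInt loop, a FOR (frame of reference) block stores every value in the block with the same bit width, chosen from the block's largest value, so decoding is a fixed-width unpack with no per-byte continuation checks. The sketch below is a hypothetical, simplified Java version; the class name, the long[] layout and the low-bits-first packing order are assumptions, not the format used by the BlockPF patches.

```java
import java.util.Arrays;

/**
 * Hypothetical sketch of FOR bit packing: all values in a block share one bit
 * width. Assumes non-negative values (e.g. doc-ID deltas).
 */
public final class ForBlockSketch {

  /** Smallest bit width that can represent every value in the block (at least 1). */
  public static int bitsRequired(int[] values) {
    int or = 0;
    for (int v : values) {
      or |= v; // the OR has the highest set bit of the largest value
    }
    return Math.max(1, 32 - Integer.numberOfLeadingZeros(or));
  }

  /** Packs values into a long[] using numBits bits per value, low bits first. */
  public static long[] pack(int[] values, int numBits) {
    long[] packed = new long[(values.length * numBits + 63) / 64];
    for (int i = 0; i < values.length; i++) {
      long bitPos = (long) i * numBits;
      int word = (int) (bitPos >>> 6);
      int offset = (int) (bitPos & 63);
      long v = values[i] & 0xFFFFFFFFL;
      packed[word] |= v << offset;
      if (offset + numBits > 64) {
        // Value straddles two longs: spill its high bits into the next word.
        packed[word + 1] |= v >>> (64 - offset);
      }
    }
    return packed;
  }

  /** Unpacks count values of numBits bits each with one fixed-width loop. */
  public static int[] unpack(long[] packed, int count, int numBits) {
    long mask = (1L << numBits) - 1;
    int[] values = new int[count];
    for (int i = 0; i < count; i++) {
      long bitPos = (long) i * numBits;
      int word = (int) (bitPos >>> 6);
      int offset = (int) (bitPos & 63);
      long v = packed[word] >>> offset;
      if (offset + numBits > 64) {
        v |= packed[word + 1] << (64 - offset);
      }
      values[i] = (int) (v & mask);
    }
    return values;
  }

  public static void main(String[] args) {
    int[] block = {3, 7, 1, 0, 5, 6, 2, 4};
    int numBits = bitsRequired(block);
    long[] packed = pack(block, numBits);
    System.out.println(numBits + " bits/value, " + packed.length + " long(s): "
        + Arrays.toString(unpack(packed, block.length, numBits)));
  }
}
```

The benchmark numbers suggest that replacing the vInt loop with a loop like unpack() matters less than the abstraction layers wrapped around either one.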

Also, it looks like the only query that is clearly slower than Lucene40
is AndHighLow (-31% to -25%) ... however, it's also an extremely fast
query to begin with (~850 QPS), so I think it's a fine tradeoff that it
gets slower while the hard/slow queries get faster.
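As a quick sanity check on those numbers, the point estimate of the QPS change is just comp/base - 1. The small Java helper below (hypothetical, not part of the benchmark scripts) computes it from the tables above. It ignores the StdDev columns, so it is not how the "Pct diff" ranges are produced; it only confirms that each mean change falls inside its reported range.

```java
/** Hypothetical helper: point estimate of the relative QPS change, in percent. */
public final class QpsChange {

  /** Returns (compQps / baseQps - 1) * 100; ignores run-to-run variance. */
  public static double pctChange(double baseQps, double compQps) {
    return (compQps / baseQps - 1.0) * 100.0;
  }

  public static void main(String[] args) {
    // Lucene40 -> BlockPF(BulkVInt)
    System.out.printf("AndHighLow (Lucene40 -> BulkVInt): %+.1f%%%n", pctChange(857.35, 614.20));
    System.out.printf("HighTerm   (Lucene40 -> BulkVInt): %+.1f%%%n", pctChange(28.28, 34.48));
    // BlockPF(BulkVInt) -> BlockPF(FOR)
    System.out.printf("AndHighLow (BulkVInt -> FOR):      %+.1f%%%n", pctChange(618.27, 655.52));
  }
}
```

This gives roughly -28% for AndHighLow and +22% for HighTerm in the first run, and about +6% for AndHighLow going from BulkVInt to FOR, so FOR recovers only a small part of the AndHighLow loss.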

                
> Add a useful intblock postings format (eg, FOR, PFOR, PFORDelta, Simple9/16/64, etc.)
> -------------------------------------------------------------------------------------
>
>                 Key: LUCENE-3892
>                 URL: https://issues.apache.org/jira/browse/LUCENE-3892
>             Project: Lucene - Core
>          Issue Type: Improvement
>            Reporter: Michael McCandless
>              Labels: gsoc2012, lucene-gsoc-12
>             Fix For: 4.1
>
>         Attachments: LUCENE-3892-BlockTermScorer.patch, 
> LUCENE-3892-blockFor&hardcode(base).patch, 
> LUCENE-3892-blockFor&packedecoder(comp).patch, 
> LUCENE-3892-blockFor-with-packedints-decoder.patch, 
> LUCENE-3892-blockFor-with-packedints-decoder.patch, 
> LUCENE-3892-blockFor-with-packedints.patch, LUCENE-3892-bulkVInt.patch, 
> LUCENE-3892-direct-IntBuffer.patch, LUCENE-3892-for&pfor-with-javadoc.patch, 
> LUCENE-3892-handle_open_files.patch, 
> LUCENE-3892-pfor-compress-iterate-numbits.patch, 
> LUCENE-3892-pfor-compress-slow-estimate.patch, LUCENE-3892_for_byte[].patch, 
> LUCENE-3892_for_int[].patch, LUCENE-3892_for_unfold_method.patch, 
> LUCENE-3892_pfor_unfold_method.patch, LUCENE-3892_pulsing_support.patch, 
> LUCENE-3892_settings.patch, LUCENE-3892_settings.patch
>
>
> On the flex branch we explored a number of possible intblock
> encodings, but for whatever reason never brought them to completion.
> There are still a number of open issues with patches in various
> states.
> Initial results (based on a prototype) were excellent; see
> http://blog.mikemccandless.com/2010/08/lucene-performance-with-pfordelta-codec.html
> I think this would make a good GSoC project.

--
This message is automatically generated by JIRA.
