
Michael McCandless (JIRA) Fri, 22 Jun 2012 11:38:44 -0700

     [ https://issues.apache.org/jira/browse/LUCENE-3892?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]


Michael McCandless updated LUCENE-3892:
---------------------------------------

    Attachment: LUCENE-3892-BlockTermScorer.patch

I was curious how much the "layers" (SepPostingsReader,
FixedIntBlock.IntIndexInput, ForFactory) between the FOR block decode
and the query scoring were hurting performance, so I wrote a
specialized scorer (BlockTermScorer) for just TermQuery.
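For context on what the "FOR block decode" step does, here is a minimal frame-of-reference sketch. It assumes non-negative values (such as doc deltas) and a single bit width per block: subtract the block minimum, then bit-pack every offset with the same number of bits. It only illustrates the general technique; the class and record names are hypothetical and the layout is not the actual format written by ForPF or read through FixedIntBlock.IntIndexInput.

```java
import java.util.Arrays;

/** Hypothetical frame-of-reference (FOR) block codec, for illustration only. */
public final class ForBlockSketch {

    /** One encoded block: the base (block minimum), the shared bit width and the packed words. */
    record Encoded(int base, int bitsPerValue, int count, int[] words) {}

    /** Encodes non-negative values: subtract the block minimum, then pack each offset in the same number of bits. */
    static Encoded encode(int[] values) {
        int min = Arrays.stream(values).min().orElse(0);
        int maxOffset = 0;
        for (int v : values) {
            maxOffset = Math.max(maxOffset, v - min);
        }
        // Bits needed for the largest offset from the base (at least 1).
        int bits = Math.max(1, 32 - Integer.numberOfLeadingZeros(maxOffset));
        int[] words = new int[(int) (((long) values.length * bits + 31) / 32)];
        long bitPos = 0;
        for (int v : values) {
            long offset = v - min;
            int word = (int) (bitPos >>> 5);   // which 32-bit word the value starts in
            int shift = (int) (bitPos & 31);   // bit position inside that word
            words[word] |= (int) (offset << shift);
            if (shift + bits > 32) {
                // The value straddles a word boundary: spill the high bits into the next word.
                words[word + 1] |= (int) (offset >>> (32 - shift));
            }
            bitPos += bits;
        }
        return new Encoded(min, bits, values.length, words);
    }

    /** Decodes a block back into a plain int[] (the kind of buffer a scorer would consume). */
    static int[] decode(Encoded e) {
        int[] out = new int[e.count()];
        int bits = e.bitsPerValue();
        long mask = (1L << bits) - 1;
        long bitPos = 0;
        for (int i = 0; i < out.length; i++) {
            int word = (int) (bitPos >>> 5);
            int shift = (int) (bitPos & 31);
            long raw = (e.words()[word] & 0xFFFFFFFFL) >>> shift;
            if (shift + bits > 32) {
                // Pull the remaining high bits from the next word.
                raw |= (e.words()[word + 1] & 0xFFFFFFFFL) << (32 - shift);
            }
            out[i] = e.base() + (int) (raw & mask);
            bitPos += bits;
        }
        return out;
    }

    public static void main(String[] args) {
        int[] deltas = {3, 1, 4, 1, 5, 9, 2, 6};
        Encoded e = encode(deltas);
        System.out.println(e.bitsPerValue());            // 4 (offsets from base 1 are at most 8)
        System.out.println(Arrays.toString(decode(e)));  // [3, 1, 4, 1, 5, 9, 2, 6]
    }
}
```

PFOR-style variants differ mainly in choosing a smaller bit width and storing the few values that do not fit as exceptions, instead of sizing the whole block for its largest value.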

The scorer is only used if the postings format is ForPF, and if no
skipping will be done (I didn't implement advance...).

The scorer reaches down and holds on to the decoded int[] buffer, and
then does its own adding up of the doc deltas, reading the next block,
etc.
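A rough sketch of that idea, assuming a hypothetical BlockReader interface that decodes one block of doc deltas straight into a caller-owned int[]: the iterator keeps that buffer, sums the deltas itself and refills only on block boundaries, and (as in the patch) advance is not supported. Frequencies and scoring are left out. This is not the code from LUCENE-3892-BlockTermScorer.patch; the class and method names are made up, and NO_MORE_DOCS just mirrors the Integer.MAX_VALUE sentinel that Lucene's DocIdSetIterator uses.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical doc iterator that works directly on a decoded int[] block of doc deltas. */
public final class BlockDocIteratorSketch {

    static final int NO_MORE_DOCS = Integer.MAX_VALUE;

    /** Hypothetical block source: fills buffer with decoded doc deltas, returns the count (0 at end). */
    interface BlockReader {
        int readBlock(int[] buffer);
    }

    private final BlockReader reader;
    private final int[] deltas;  // decoded block, held and read directly by the iterator
    private int upto;            // next position within the current block
    private int limit;           // number of valid deltas in the current block
    private int accum;           // running sum of deltas = current doc id
    private int doc = -1;

    BlockDocIteratorSketch(BlockReader reader, int blockSize) {
        this.reader = reader;
        this.deltas = new int[blockSize];
    }

    int docID() {
        return doc;
    }

    int nextDoc() {
        if (upto == limit) {
            // Current block exhausted: decode the next one straight into our buffer.
            limit = reader.readBlock(deltas);
            upto = 0;
            if (limit == 0) {
                return doc = NO_MORE_DOCS;
            }
        }
        accum += deltas[upto++];
        return doc = accum;
    }

    int advance(int target) {
        // Skipping is deliberately not implemented in this sketch.
        throw new UnsupportedOperationException("advance not implemented");
    }

    /** Array-backed BlockReader for trying the sketch: serves pre-decoded blocks in order. */
    static BlockReader fromBlocks(int[][] blocks) {
        return new BlockReader() {
            private int next = 0;

            @Override
            public int readBlock(int[] buffer) {
                if (next == blocks.length) {
                    return 0;
                }
                int[] block = blocks[next++];
                System.arraycopy(block, 0, buffer, 0, block.length);
                return block.length;
            }
        };
    }

    /** Drains the iterator into a list of doc ids. */
    static List<Integer> collect(BlockDocIteratorSketch it) {
        List<Integer> docs = new ArrayList<>();
        for (int d = it.nextDoc(); d != NO_MORE_DOCS; d = it.nextDoc()) {
            docs.add(d);
        }
        return docs;
    }

    public static void main(String[] args) {
        // Two blocks of deltas (block size 4).
        int[][] blocks = {{2, 1, 4, 3}, {1, 4}};
        BlockDocIteratorSketch it = new BlockDocIteratorSketch(fromBlocks(blocks), 4);
        System.out.println(collect(it));  // [2, 3, 7, 10, 11, 15]
    }
}
```

The per-doc work in nextDoc is one bounds check, one array read and one add; the interface call only happens once per block.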

The baseline is the current branch (not trunk!):

{noformat}
                Task    QPS base StdDev base   QPS patch StdDev patch     Pct diff
            Wildcard       10.31        0.40       10.10        0.17   -7% -    3%
         AndHighHigh        4.90        0.10        4.82        0.15   -6% -    3%
             Prefix3       28.50        1.06       28.11        0.50   -6% -    4%
              IntNRQ        9.72        0.46        9.60        0.57  -11% -    9%
        SloppyPhrase        0.92        0.03        0.92        0.02   -6% -    5%
            PKLookup      106.21        2.54      105.66        2.07   -4% -    3%
              Phrase        1.56        0.00        1.56        0.01   -1% -    0%
              Fuzzy1       90.33        3.48       90.19        2.25   -6% -    6%
              Fuzzy2       29.66        0.61       29.64        0.85   -4% -    4%
          AndHighMed       14.87        0.29       15.02        0.81   -6% -    8%
             Respell       78.83        2.46       79.62        1.54   -3% -    6%
            SpanNear        1.18        0.02        1.19        0.04   -4% -    6%
         TermGroup1M        2.78        0.06        3.28        0.14   10% -   25%
          OrHighHigh        4.19        0.24        5.04        0.20    9% -   32%
           OrHighMed        8.21        0.45        9.87        0.23   11% -   30%
      TermBGroup1M1P        5.11        0.20        6.21        0.26   12% -   31%
        TermBGroup1M        4.49        0.11        5.49        0.27   13% -   31%
                Term        8.89        0.58       11.90        1.52    9% -   61%
{noformat}

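Seems like we get a good boost removing the abstractions: Term improves by 9% to 61%, and the OrHigh* and Term*Group* tasks by roughly 10% to 30%, while every other task's range straddles zero (i.e. is within noise).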
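The Pct diff column looks like a worst-case/best-case range built from the ± stddev intervals. The sketch below assumes worst = (patch - stddev) compared to (base + stddev) and best = (patch + stddev) compared to (base - stddev), truncated toward zero; this formula is inferred from the table, not taken from the benchmark script, and the class and method names are hypothetical. From the rounded values shown it reproduces, for example, the Term (9% - 61%), Wildcard (-7% - 3%) and TermGroup1M (10% - 25%) rows; rows with tiny standard deviations such as Phrase can be off by a point because the printed inputs are rounded.

```java
/** Hypothetical reconstruction of the "Pct diff" range from QPS means and standard deviations. */
public final class PctDiffSketch {

    /** Returns {worst, best} percent change, each truncated toward zero like an int cast. */
    static int[] pctDiffRange(double qpsBase, double sdBase, double qpsPatch, double sdPatch) {
        // Worst case: slowest plausible patch vs. fastest plausible baseline.
        double worst = 100.0 * ((qpsPatch - sdPatch) - (qpsBase + sdBase)) / (qpsBase + sdBase);
        // Best case: fastest plausible patch vs. slowest plausible baseline.
        double best = 100.0 * ((qpsPatch + sdPatch) - (qpsBase - sdBase)) / (qpsBase - sdBase);
        return new int[] {(int) worst, (int) best};
    }

    public static void main(String[] args) {
        int[] term = pctDiffRange(8.89, 0.58, 11.90, 1.52);
        System.out.printf("Term: %d%% - %d%%%n", term[0], term[1]);  // Term: 9% - 61%
    }
}
```

A range that straddles zero means the difference is within the run-to-run noise under this reading.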

                
> Add a useful intblock postings format (eg, FOR, PFOR, PFORDelta, Simple9/16/64, etc.)
> -------------------------------------------------------------------------------------
>
>                 Key: LUCENE-3892
>                 URL: https://issues.apache.org/jira/browse/LUCENE-3892
>             Project: Lucene - Java
>          Issue Type: Improvement
>            Reporter: Michael McCandless
>              Labels: gsoc2012, lucene-gsoc-12
>             Fix For: 4.1
>
>         Attachments: LUCENE-3892-BlockTermScorer.patch, 
> LUCENE-3892-direct-IntBuffer.patch, LUCENE-3892_for.patch, 
> LUCENE-3892_for_byte[].patch, LUCENE-3892_for_int[].patch, 
> LUCENE-3892_for_unfold_method.patch, LUCENE-3892_pfor.patch, 
> LUCENE-3892_pfor.patch, LUCENE-3892_pfor.patch, 
> LUCENE-3892_pfor_unfold_method.patch, LUCENE-3892_settings.patch, 
> LUCENE-3892_settings.patch
>
>
> On the flex branch we explored a number of possible intblock
> encodings, but for whatever reason never brought them to completion.
> There are still a number of issues opened with patches in different
> states.
> Initial results (based on prototype) were excellent (see
> http://blog.mikemccandless.com/2010/08/lucene-performance-with-pfordelta-codec.html ).
> I think this would make a good GSoC project.


[jira] [Updated] (LUCENE-3892) Add a useful intblock postings format (eg, FOR, PFOR, PFORDelta, Simple9/16/64, etc.)
