[jira] Updated: (LUCENE-2723) Speed up Lucene's low level bulk postings read API

Robert Muir (JIRA) Tue, 14 Dec 2010 14:38:25 -0800

     [ https://issues.apache.org/jira/browse/LUCENE-2723?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]


Robert Muir updated LUCENE-2723:
--------------------------------

    Attachment: LUCENE-2723_termscorer.patch

Here's a patch to make TermScorer more readable: advance() is still scary
but the rest starts to look reasonable.

I pulled out the omitTF case into a MatchOnlyTermScorer.
Here's the benchmark with luceneutil.

{noformat}
                            Query  QPS branch   QPS patch  Pct diff
spanNear([unit, state], 10, true)        2.91        2.87     -1.3%
                             uni*       11.36       11.31     -0.4%
                            unit*       20.89       20.81     -0.4%
                     "unit state"        6.14        6.13     -0.2%
                              u*d       17.30       17.28     -0.1%
                       unit state        7.47        7.46     -0.1%
                             un*d       55.42       55.69      0.5%
               spanFirst(unit, 5)       12.27       12.34      0.6%
                       united~2.0       13.51       13.61      0.7%
                       united~1.0       49.88       50.30      0.8%
                         unit~1.0       13.00       13.27      2.0%
                            state       27.67       28.32      2.4%
                         unit~2.0       12.46       12.79      2.6%
                 +nebraska +state       75.91       79.97      5.3%
                     +unit +state        8.63        9.25      7.1%
{noformat}

> Speed up Lucene's low level bulk postings read API
> --------------------------------------------------
>
>                 Key: LUCENE-2723
>                 URL: https://issues.apache.org/jira/browse/LUCENE-2723
>             Project: Lucene - Java
>          Issue Type: Improvement
>          Components: Index
>            Reporter: Michael McCandless
>            Assignee: Michael McCandless
>             Fix For: 4.0
>
>         Attachments: LUCENE-2723.patch, LUCENE-2723.patch, LUCENE-2723.patch, 
> LUCENE-2723.patch, LUCENE-2723.patch, LUCENE-2723_termscorer.patch
>
>
> Spinoff from LUCENE-1410.
> The flex DocsEnum has a simple bulk-read API that reads the next chunk
> of docs/freqs.  But it's a poor fit for intblock codecs like FOR/PFOR
> (from LUCENE-1410).  This is not unlike sucking coffee through those
> tiny plastic coffee stirrers they hand out on airplanes that,
> surprisingly, also happen to function as a straw.
> As a result we see no perf gain from using FOR/PFOR.
> I had hacked up a fix for this, described in my blog post at
> http://chbits.blogspot.com/2010/08/lucene-performance-with-pfordelta-codec.html
> I'm opening this issue to get that work to a committable point.
> So... I've worked out a new bulk-read API to address this performance
> bottleneck.  It has some big changes over the current bulk-read API:
>   * You can now also bulk-read positions (but not payloads), but I
>     have yet to cut over positional queries.
>   * The buffer contains doc deltas, not absolute values, for docIDs
>     and positions (freqs are absolute).
>   * Deleted docs are not filtered out.
>   * The doc & freq buffers need not be "aligned".  For fixed intblock
>     codecs (FOR/PFOR) they will be, but for varint codecs (Simple9/16,
>     Group varint, etc.) they won't be.
> It's still a work in progress...
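
To make the consumer's side of the bullet points above concrete, here is a
minimal sketch of what a caller has to do with such a buffer: sum the doc
deltas to recover absolute docIDs, and filter deleted docs itself, since the
API no longer does either. This is not the code from the patch. The class
name, the method name, and the use of java.util.BitSet for deletions are all
assumptions made for illustration, and the sketch assumes the first delta is
relative to docID 0.

```java
import java.util.ArrayList;
import java.util.BitSet;
import java.util.List;

// Hypothetical illustration only: none of these names come from Lucene.
final class BulkPostingsSketch {

    /**
     * Turns one bulk-read chunk into live (docID, freq) pairs.
     *
     * docDeltas holds gaps between consecutive docIDs (assumption: the
     * first gap is relative to docID 0); freqs holds absolute values.
     * deleted marks deleted docIDs and may be null when nothing is deleted.
     */
    static List<int[]> decodeLiveDocs(int[] docDeltas, int[] freqs,
                                      int count, BitSet deleted) {
        List<int[]> live = new ArrayList<>();
        int docID = 0;
        for (int i = 0; i < count; i++) {
            // The buffer stores deltas, so the caller accumulates them.
            docID += docDeltas[i];
            // Deleted docs are not filtered by the reader, so skip them here.
            if (deleted != null && deleted.get(docID)) {
                continue;
            }
            // Freqs are absolute and can be used as-is.
            live.add(new int[] {docID, freqs[i]});
        }
        return live;
    }
}
```

For example, with deltas {3, 4, 1, 10}, freqs {2, 1, 5, 1}, and doc 7 marked
deleted, the absolute docIDs are 3, 7, 8, 18 and the method returns the pairs
(3, 2), (8, 5), (18, 1). The sketch reads doc and freq entries at the same
index, which only models the aligned case; unaligned buffers would need
separate read offsets for docs and freqs.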
