[jira] Updated: (LUCENE-2723) Speed up Lucene's low level bulk postings read API

Simon Willnauer (JIRA) Tue, 25 Jan 2011 09:21:07 -0800

     [ 
https://issues.apache.org/jira/browse/LUCENE-2723?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]


Simon Willnauer updated LUCENE-2723:
------------------------------------

    Attachment: LUCENE-2723-BulkEnumWrapper.patch

This patch adds a BulkPostingsEnumWrapper that implements DocsEnumAndPositions 
by using the bulk postings API. I first added this class just to ease testing 
for PositionDeltaBulks, but it seems that it could be useful for more than 
just testing. Codecs that don't want to implement the DocsEnumAndPositions API 
can just use this wrapper to provide the functionality.
I also added a test case for MemoryIndex that uses this wrapper.
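
To illustrate the wrapper idea, here is a minimal sketch of a doc-at-a-time
view layered over a bulk reader. It is not the code from the patch:
BulkChunkReader, BulkEnumWrapperSketch, and the convention that doc deltas
accumulate from 0 are all assumptions made for this example, and positions
are left out for brevity.

```java
/** Hypothetical bulk reader: each fill() call loads the next chunk of postings. */
interface BulkChunkReader {
    /** Fills both arrays from index 0 and returns the entry count; 0 means exhausted. */
    int fill(int[] docDeltas, int[] freqs);
}

/** Doc-at-a-time view over a BulkChunkReader (illustrative, not the patch's class). */
final class BulkEnumWrapperSketch {
    static final int NO_MORE_DOCS = Integer.MAX_VALUE;

    private final BulkChunkReader reader;
    private final int[] docDeltas;
    private final int[] freqs;
    private int count; // valid entries in the current chunk
    private int upto;  // next entry to consume
    private int doc;   // current absolute docID; deltas accumulate from 0 (assumption)
    private int freq;  // freq of the current doc

    BulkEnumWrapperSketch(BulkChunkReader reader, int chunkSize) {
        this.reader = reader;
        this.docDeltas = new int[chunkSize];
        this.freqs = new int[chunkSize];
    }

    /** Advances to the next doc, refilling the buffers when the chunk is used up. */
    int nextDoc() {
        if (upto == count) {
            count = reader.fill(docDeltas, freqs);
            upto = 0;
            if (count == 0) {
                doc = NO_MORE_DOCS;
                return doc;
            }
        }
        doc += docDeltas[upto]; // the buffer holds deltas, so sum them up
        freq = freqs[upto];     // freqs are stored as absolute values
        upto++;
        return doc;
    }

    int freq() {
        return freq;
    }
}
```

A codec would only need to supply the bulk reader; callers that expect a
per-document enum keep calling nextDoc() and freq() as before.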

> Speed up Lucene's low level bulk postings read API
> --------------------------------------------------
>
>                 Key: LUCENE-2723
>                 URL: https://issues.apache.org/jira/browse/LUCENE-2723
>             Project: Lucene - Java
>          Issue Type: Improvement
>          Components: Index
>            Reporter: Michael McCandless
>            Assignee: Michael McCandless
>             Fix For: 4.0
>
>         Attachments: LUCENE-2723-BulkEnumWrapper.patch, 
> LUCENE-2723-termscorer.patch, LUCENE-2723-termscorer.patch, 
> LUCENE-2723-termscorer.patch, LUCENE-2723.patch, LUCENE-2723.patch, 
> LUCENE-2723.patch, LUCENE-2723.patch, LUCENE-2723.patch, LUCENE-2723.patch, 
> LUCENE-2723_bulkvint.patch, LUCENE-2723_facetPerSeg.patch, 
> LUCENE-2723_facetPerSeg.patch, LUCENE-2723_openEnum.patch, 
> LUCENE-2723_termscorer.patch, LUCENE-2723_wastedint.patch
>
>
> Spinoff from LUCENE-1410.
> The flex DocsEnum has a simple bulk-read API that reads the next chunk
> of docs/freqs.  But it's a poor fit for intblock codecs like FOR/PFOR
> (from LUCENE-1410).  This is not unlike sucking coffee through those
> tiny plastic coffee stirrers they hand out on airplanes that,
> surprisingly, also happen to function as a straw.
> As a result we see no perf gain from using FOR/PFOR.
> I had hacked up a fix for this, described in my blog post at
> http://chbits.blogspot.com/2010/08/lucene-performance-with-pfordelta-codec.html
> I'm opening this issue to get that work to a committable point.
> So... I've worked out a new bulk-read API to address this performance
> bottleneck.  It has some big changes over the current bulk-read API:
>   * You can now also bulk-read positions (but not payloads), though I
>     have yet to cut over positional queries.
>   * The buffer contains doc deltas, not absolute values, for docIDs
>     and positions (freqs are absolute).
>   * Deleted docs are not filtered out.
>   * The doc & freq buffers need not be "aligned".  For fixed intblock
>     codecs (FOR/PFOR) they will be, but for varint codecs (Simple9/16,
>     Group varint, etc.) they won't be.
> It's still a work in progress...
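
The list above shifts some work onto the consumer of the buffers: it has to
sum the doc deltas itself and skip deleted docs itself, while freqs can be
used as read. A minimal sketch of such a consumer follows. The class name,
the method signature, and the use of java.util.BitSet to mark deleted docs
are assumptions for illustration, not the API from this issue.

```java
import java.util.BitSet;

/** Illustrative consumer of one bulk-read chunk (hypothetical, not Lucene code). */
final class BulkChunkConsumerSketch {
    /**
     * Sums the freqs of all live (non-deleted) docs in one chunk.
     *
     * @param docDeltas gaps between consecutive docIDs
     * @param freqs     absolute term frequencies, parallel to docDeltas
     * @param count     number of valid entries in both arrays
     * @param lastDoc   absolute docID that precedes this chunk (0 for the first chunk, by assumption)
     * @param deleted   bit set with one bit per deleted docID
     */
    static long liveTotalFreq(int[] docDeltas, int[] freqs, int count,
                              int lastDoc, BitSet deleted) {
        long total = 0;
        int doc = lastDoc;
        for (int i = 0; i < count; i++) {
            doc += docDeltas[i];     // rebuild the absolute docID from the delta
            if (!deleted.get(doc)) { // the buffer is unfiltered, so check here
                total += freqs[i];   // freqs need no decoding
            }
        }
        return total;
    }
}
```

Because the running docID is carried in from the previous chunk, the same
loop works across chunk boundaries as long as the caller passes the last
docID it saw.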

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.

