[ https://issues.apache.org/jira/browse/LUCENE-3892?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Han Jiang updated LUCENE-3892:
------------------------------
    Attachment: LUCENE-3892_pfor.patch

Here is an initial implementation of PForPostingsFormat. It is registered in oal.codecs.mockrandom.MockRandomPostingsFormat, and all tests pass (maybe I should modify some other mock files as well?).

This version was originally inspired by the pfor and pfor2 impls in bulk_branch, mostly by the idea of pfor. Currently, the compressed data consists of three parts: header, normal area, and exception area. The normal area encodes each small value as b bits, as well as exception values. The exception area stores each large value directly, possibly as 8, 16, or 32 bits. All numFrameBits values from 1 to 32 are supported.

I haven't tested the performance yet, but there are some known bottlenecks. For example, with data = {0, 0xffffffff, 0, 1, 0, 1, 0} and numFrameBits=1, the '1's that follow the large value are forced to become exceptions, which dramatically increases the compressed size.

> Add a useful intblock postings format (eg, FOR, PFOR, PFORDelta, Simple9/16/64, etc.)
> -------------------------------------------------------------------------------------
>
>                 Key: LUCENE-3892
>                 URL: https://issues.apache.org/jira/browse/LUCENE-3892
>             Project: Lucene - Java
>          Issue Type: Improvement
>            Reporter: Michael McCandless
>              Labels: gsoc2012, lucene-gsoc-12
>             Fix For: 4.1
>
>         Attachments: LUCENE-3892_pfor.patch, LUCENE-3892_settings.patch, LUCENE-3892_settings.patch
>
>
> On the flex branch we explored a number of possible intblock
> encodings, but for whatever reason never brought them to completion.
> There are still a number of issues opened with patches in different
> states.
> Initial results (based on prototype) were excellent (see
> http://blog.mikemccandless.com/2010/08/lucene-performance-with-pfordelta-codec.html
> ).
> I think this would make a good GSoC project.

--
This message is automatically generated by JIRA.
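To make the three-part layout from the comment above (header, normal area, exception area) more concrete, here is a minimal Java sketch of a patched frame-of-reference block. It is not the code in LUCENE-3892_pfor.patch. The class name PForSketch, the three-int header, the explicit exception positions, the fixed 32-bit exception values, and the zero left in an exception's normal-area slot are all assumptions made for illustration.

```java
/** Hypothetical sketch of a PFOR-style block; not the format used by the patch. */
final class PForSketch {
    // Assumed header: numFrameBits, value count, exception count.
    static final int HEADER_INTS = 3;

    /** True if v, read as an unsigned 32-bit value, fits in b bits (1 <= b <= 32). */
    static boolean fits(int v, int b) {
        return b == 32 || (v >>> b) == 0; // Java masks shift counts, so b == 32 is special-cased
    }

    /** Number of ints needed to hold count slots of b bits each. */
    static int normalInts(int count, int b) {
        return (int) (((long) count * b + 31) >>> 5);
    }

    /** Layout: [header | normal area: one b-bit slot per value | exception area: (position, value) pairs]. */
    static int[] encode(int[] values, int b) {
        int excCount = 0;
        for (int v : values) {
            if (!fits(v, b)) excCount++;
        }
        int excStart = HEADER_INTS + normalInts(values.length, b);
        int[] out = new int[excStart + 2 * excCount];
        out[0] = b;
        out[1] = values.length;
        out[2] = excCount;
        int exc = excStart;
        for (int i = 0; i < values.length; i++) {
            if (fits(values[i], b)) {
                writeSlot(out, i, b, values[i]);
            } else {
                // Assumption: the slot stays zero; position and full value go to the exception area.
                out[exc++] = i;
                out[exc++] = values[i];
            }
        }
        return out;
    }

    static int[] decode(int[] in) {
        int b = in[0], count = in[1], excCount = in[2];
        int[] values = new int[count];
        for (int i = 0; i < count; i++) {
            values[i] = readSlot(in, i, b);
        }
        // Patch step: overwrite the exception positions with their stored values.
        int exc = HEADER_INTS + normalInts(count, b);
        for (int e = 0; e < excCount; e++) {
            int pos = in[exc++];
            values[pos] = in[exc++];
        }
        return values;
    }

    /** Writes the low b bits of v into slot number index of the normal area. */
    static void writeSlot(int[] buf, int index, int b, int v) {
        long bit = (long) index * b;
        int word = HEADER_INTS + (int) (bit >>> 5);
        int shift = (int) (bit & 31);
        buf[word] |= v << shift;
        if (shift + b > 32) buf[word + 1] |= v >>> (32 - shift); // slot spills into the next int
    }

    /** Reads slot number index (b bits wide) from the normal area. */
    static int readSlot(int[] buf, int index, int b) {
        long bit = (long) index * b;
        int word = HEADER_INTS + (int) (bit >>> 5);
        int shift = (int) (bit & 31);
        int v = buf[word] >>> shift;
        if (shift + b > 32) v |= buf[word + 1] << (32 - shift);
        return b == 32 ? v : v & ((1 << b) - 1);
    }
}
```

Calling PForSketch.encode(new int[] {0, 0xffffffff, 0, 1, 0, 1, 0}, 1) with this layout records a single exception, because the positions are stored explicitly. That is a deliberate simplification: it does not reproduce the forced-exception behaviour described in the comment, which depends on how the patch encodes exceptions in the normal area.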