[
https://issues.apache.org/jira/browse/LUCENE-1410?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12708590#action_12708590
]
Paul Elschot commented on LUCENE-1410:
--------------------------------------
A very recent paper with some improvements to PFOR:
Yan, Ding, Suel,
Inverted Index Compression and Query Processing with Optimized Document Ordering,
WWW 2009, April 20-24, 2009, Madrid, Spain
Roughly quoting section 4.2, "Optimizing PForDelta compression":
For an exception, we store its lower b bits instead of the offset to the next
exception in its corresponding slot, while we store the higher overflow bits
and the offset in two separate arrays. These two arrays are compressed using
the Simple16 method.
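To make the quoted layout concrete, here is a minimal sketch under some assumptions of my own: the class name PForDeltaSketch is hypothetical, slots are kept in a plain int[] instead of being bit-packed, the two exception arrays are not Simple16-compressed, and each offset is taken as the distance from the previous exception. It is only an illustration of the idea, not the paper's code or the code in the attached patches.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

/**
 * Hypothetical sketch of the exception layout quoted above. Slots are kept in
 * a plain int[] rather than bit-packed, and the two exception arrays are left
 * uncompressed (the paper compresses them with Simple16).
 */
public class PForDeltaSketch {
    final int b;           // bits per slot
    final int[] slots;     // lower b bits of every value, exceptions included
    final int[] highBits;  // overflow bits (value >>> b) of each exception
    final int[] offsets;   // assumed: distance from the previous exception (first measured from -1)

    private PForDeltaSketch(int b, int[] slots, int[] highBits, int[] offsets) {
        this.b = b;
        this.slots = slots;
        this.highBits = highBits;
        this.offsets = offsets;
    }

    static PForDeltaSketch encode(int[] values, int b) {
        if (b < 0 || b > 31) {
            throw new IllegalArgumentException("b must be in [0, 31]");
        }
        int mask = (1 << b) - 1;
        int[] slots = new int[values.length];
        List<Integer> high = new ArrayList<>();
        List<Integer> offs = new ArrayList<>();
        int prev = -1;
        for (int i = 0; i < values.length; i++) {
            slots[i] = values[i] & mask;      // every slot gets the lower b bits
            int overflow = values[i] >>> b;
            if (overflow != 0) {              // does not fit in b bits: an exception
                high.add(overflow);
                offs.add(i - prev);
                prev = i;
            }
        }
        return new PForDeltaSketch(b, slots,
                high.stream().mapToInt(Integer::intValue).toArray(),
                offs.stream().mapToInt(Integer::intValue).toArray());
    }

    int[] decode() {
        int[] out = slots.clone();            // non-exceptions are already complete
        int pos = -1;
        for (int k = 0; k < highBits.length; k++) {
            pos += offsets[k];                // walk the offsets to the k-th exception
            out[pos] |= highBits[k] << b;     // put its overflow bits back on top
        }
        return out;
    }

    public static void main(String[] args) {
        int[] values = {3, 1, 200, 5, 0, 7, 1000, 2};
        PForDeltaSketch enc = encode(values, 3);
        System.out.println(Arrays.toString(enc.slots));     // [3, 1, 0, 5, 0, 7, 0, 2]
        System.out.println(Arrays.toString(enc.highBits));  // [25, 125]
        System.out.println(Arrays.toString(enc.offsets));   // [3, 4]
        System.out.println(Arrays.equals(values, enc.decode()));  // true
    }
}
```

Because an exception's slot holds real data (its lower b bits) rather than a link, decoding can unpack all slots in one pass and then patch only the exceptions from the two side arrays.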
Also, b is chosen to optimize decompression speed. This makes the dependence of
b on the data quite simple (in the PFOR implementation of this issue this
dependence is more complex), and this improves compression speed.
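The paper's exact speed-based rule for b is not quoted here, so the sketch below only shows what a simple dependence of b on the data can look like. The rule itself is an assumption: take the smallest b for which at most a given number of values in a block need more than b bits. The class ChooseB and the maxExceptions parameter are hypothetical.

```java
import java.util.Arrays;

/**
 * Hypothetical sketch: pick the smallest b for which at most maxExceptions
 * values of a block need more than b bits. This cap is an assumed stand-in
 * for a simple data-dependent rule, not the paper's speed-tuned choice.
 */
public class ChooseB {
    /** Bits needed to represent v as unsigned; 0 needs 0 bits. */
    static int bitsNeeded(int v) {
        return 32 - Integer.numberOfLeadingZeros(v);
    }

    static int chooseB(int[] block, int maxExceptions) {
        int[] widths = new int[block.length];
        for (int i = 0; i < block.length; i++) {
            widths[i] = bitsNeeded(block[i]);
        }
        Arrays.sort(widths);
        // Only the maxExceptions widest values may exceed b, so b is the
        // width just below them in sorted order.
        int idx = block.length - 1 - maxExceptions;
        return idx < 0 ? 0 : widths[idx];
    }

    public static void main(String[] args) {
        int[] block = {3, 1, 200, 5, 0, 7, 1000, 2};
        System.out.println(chooseB(block, 2));  // 3: only 200 and 1000 become exceptions
        System.out.println(chooseB(block, 0));  // 10: 1000 needs 10 bits
    }
}
```

One sort over the block's bit widths is all the work needed, which is the kind of cheap choice that keeps compression fast.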
By the way, the document ordering there is by URL. For many terms this gives
more short deltas between doc ids, which allows faster decompression of the
doc ids.
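As a small illustration of why short deltas help, the sketch below turns sorted doc ids into gaps and reports how many bits per value the gaps need. The class DocIdGaps and both id lists are made up for the example and do not come from the paper.

```java
import java.util.Arrays;

/**
 * Hypothetical sketch: turn a sorted doc id list into gaps (deltas) and
 * report how many bits the widest gap needs. The doc ids are made up.
 */
public class DocIdGaps {
    /** First entry is the first doc id; each later entry is the difference to the previous id. */
    static int[] toGaps(int[] sortedDocIds) {
        int[] gaps = new int[sortedDocIds.length];
        int prev = 0;
        for (int i = 0; i < sortedDocIds.length; i++) {
            gaps[i] = sortedDocIds[i] - prev;
            prev = sortedDocIds[i];
        }
        return gaps;
    }

    /** Inverse of toGaps: a running sum restores the doc ids. */
    static int[] fromGaps(int[] gaps) {
        int[] ids = new int[gaps.length];
        int sum = 0;
        for (int i = 0; i < gaps.length; i++) {
            sum += gaps[i];
            ids[i] = sum;
        }
        return ids;
    }

    /** Bits per value needed to store every entry without exceptions. */
    static int maxBits(int[] values) {
        int or = 0;
        for (int v : values) {
            or |= v;
        }
        return 32 - Integer.numberOfLeadingZeros(or);
    }

    public static void main(String[] args) {
        int[] clustered = {3, 4, 6, 7, 13};            // made-up ids that sit close together
        int[] scattered = {3, 900, 5000, 40000, 90000};
        System.out.println(Arrays.toString(toGaps(clustered)));  // [3, 1, 2, 1, 6]
        System.out.println(maxBits(toGaps(clustered)));          // 3
        System.out.println(maxBits(toGaps(scattered)));          // 16
    }
}
```

Fewer bits per gap means fewer bits to unpack per doc id, which is where the faster decompression comes from.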
> PFOR implementation
> -------------------
>
> Key: LUCENE-1410
> URL: https://issues.apache.org/jira/browse/LUCENE-1410
> Project: Lucene - Java
> Issue Type: New Feature
> Components: Other
> Reporter: Paul Elschot
> Priority: Minor
> Attachments: autogen.tgz, LUCENE-1410b.patch, LUCENE-1410c.patch,
> LUCENE-1410d.patch, LUCENE-1410e.patch, TermQueryTests.tgz, TestPFor2.java,
> TestPFor2.java, TestPFor2.java
>
> Original Estimate: 21840h
> Remaining Estimate: 21840h
>
> Implementation of Patched Frame of Reference.