Not sure if I'll be able to participate, but I'd highly recommend that
anyone reading the Zukowski paper (which is a fine introduction)
immediately continue on to Daniel Lemire's 2012 "Decoding billions of
integers per second through vectorization".

http://arxiv.org/pdf/1209.2137v3.pdf

Lemire has written a survey of the current PFOR family of algorithms,
showing a couple generations of improvements that have happened in the
6 years since Zukowski, and proposes another round of improvements on
top of these.  It has good summaries of the other algorithms, and
CLEAR WORKING CODE FOR EACH OF THEM!

https://github.com/lemire/FastPFor

The software includes benchmarking so that the different
algorithms can easily be tested on different data sets.  It would also
be also to add algorithms to be compared.  His is fully-vectorized
almost branch-free implementation with improved
exception handling provides approximately 4x faster encoding and 2x
faster decoding than the Zukowski's original.   And this implementation is
already under the Apache license.

--nate




On Fri, Jan 11, 2013 at 1:56 PM, Dan Markham <[email protected]> wrote:
> The questions are on the Wiki!
>
>
> The Lucy Book Club is taking a break from our book-in-progress this week to 
> read a paper on integer compression techniques. One of the algorithms 
> described in the paper is PFOR-DELTA (Patched Frame-Of-Reference with delta 
> encoding), which is particularly suitable for inverted lists.
>
>
> http://wiki.apache.org/lucy/LucyBookClub
>
> Enjoy,
>
> -Dan

Reply via email to