Not sure if I'll be able to participate, but I'd highly recommend that anyone reading the Zukowski paper (which is a fine introduction) immediately continue on to Daniel Lemire's 2012 "Decoding billions of integers per second through vectorization".
http://arxiv.org/pdf/1209.2137v3.pdf Lemire has written a survey of the current PFOR family of algorithms, showing a couple generations of improvements that have happened in the 6 years since Zukowski, and proposes another round of improvements on top of these. It has good summaries of the other algorithms, and CLEAR WORKING CODE FOR EACH OF THEM! https://github.com/lemire/FastPFor The software includes benchmarking so that the different algorithms can easily be tested on different data sets. It would also be also to add algorithms to be compared. His is fully-vectorized almost branch-free implementation with improved exception handling provides approximately 4x faster encoding and 2x faster decoding than the Zukowski's original. And this implementation is already under the Apache license. --nate On Fri, Jan 11, 2013 at 1:56 PM, Dan Markham <[email protected]> wrote: > The questions are on the Wiki! > > > The Lucy Book Club is taking a break from our book-in-progress this week to > read a paper on integer compression techniques. One of the algorithms > described in the paper is PFOR-DELTA (Patched Frame-Of-Reference with delta > encoding), which is particularly suitable for inverted lists. > > > http://wiki.apache.org/lucy/LucyBookClub > > Enjoy, > > -Dan
