Nate, this is great. I have been looking for more up-to-date work on this. I'm currently working on (Zhang 2008) and (Yan 2009). This paper and its code look like a goldmine. You just made my day.
Nate, you're always welcome to join the Hangout; this and other papers may be more interesting for you. I would be down for a few follow-up papers working up to (Lemire 2012). Even more challenging than understanding the papers is figuring out how best to incorporate these techniques into Lucy as efficiently as we can.

Thanks,
-Dan

On Sat, Jan 12, 2013 at 10:36 AM, Nathan Kurz <[email protected]> wrote:
> Not sure if I'll be able to participate, but I'd highly recommend that
> anyone reading the Zukowski paper (which is a fine introduction)
> immediately continue on to Daniel Lemire's 2012 "Decoding billions of
> integers per second through vectorization".
>
> http://arxiv.org/pdf/1209.2137v3.pdf
>
> Lemire has written a survey of the current PFOR family of algorithms,
> showing a couple of generations of improvements that have happened in
> the 6 years since Zukowski, and proposes another round of improvements
> on top of these. It has good summaries of the other algorithms, and
> CLEAR WORKING CODE FOR EACH OF THEM!
>
> https://github.com/lemire/FastPFor
>
> The software includes benchmarking so that the different
> algorithms can easily be tested on different data sets. It would also
> be easy to add algorithms to be compared. His fully vectorized,
> almost branch-free implementation with improved exception handling
> provides approximately 4x faster encoding and 2x faster decoding than
> Zukowski's original. And this implementation is already under the
> Apache license.
>
> --nate
>
> On Fri, Jan 11, 2013 at 1:56 PM, Dan Markham <[email protected]> wrote:
> > The questions are on the Wiki!
> >
> > The Lucy Book Club is taking a break from our book-in-progress this week
> > to read a paper on integer compression techniques. One of the algorithms
> > described in the paper is PFOR-DELTA (Patched Frame-Of-Reference with delta
> > encoding), which is particularly suitable for inverted lists.
> >
> > http://wiki.apache.org/lucy/LucyBookClub
> >
> > Enjoy,
> >
> > -Dan
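For anyone new to PFOR-DELTA as named in the book-club announcement, here is a minimal Python sketch of the idea, assuming a sorted list of non-negative doc IDs as the inverted list. It delta-encodes the IDs, subtracts the smallest gap as the frame of reference, picks a bit width that covers most values (the 90% threshold is a hypothetical parameter), and "patches" the outliers back in from a side list. All function names here are hypothetical illustrations, not the FastPFor API or Lucy code.

```python
# Hypothetical PFOR-DELTA sketch; names are illustrative, not the FastPFor API.


def choose_bit_width(values, coverage_pct=90):
    """Smallest bit width b such that at least coverage_pct% of values fit in b bits."""
    if not values:
        return 0
    widths = sorted(v.bit_length() for v in values)
    k = max(1, -(-len(widths) * coverage_pct // 100))  # ceil(n * pct / 100)
    return widths[k - 1]


def pfor_delta_encode(doc_ids, coverage_pct=90):
    """Encode sorted, non-negative doc IDs. Returns (base, width, n, packed, exceptions)."""
    if not doc_ids:
        return 0, 0, 0, b"", []
    # 1. Delta: replace each doc ID with its gap from the previous one.
    gaps = [doc_ids[0]] + [b - a for a, b in zip(doc_ids, doc_ids[1:])]
    if min(gaps) < 0:
        raise ValueError("doc_ids must be sorted ascending and non-negative")
    # 2. Frame of reference: store each gap relative to the smallest gap.
    base = min(gaps)
    offsets = [g - base for g in gaps]
    # 3. Choose a width covering most offsets; the rest become exceptions.
    width = choose_bit_width(offsets, coverage_pct)
    limit = 1 << width
    packed = 0
    exceptions = []  # (position, value) pairs patched back in on decode
    for i, v in enumerate(offsets):
        if v >= limit:
            exceptions.append((i, v))
            v = 0  # placeholder in the fixed-width slot
        packed |= v << (i * width)
    nbytes = (len(offsets) * width + 7) // 8
    return base, width, len(offsets), packed.to_bytes(nbytes, "little"), exceptions


def pfor_delta_decode(base, width, n, data, exceptions):
    """Invert pfor_delta_encode."""
    packed = int.from_bytes(data, "little")
    mask = (1 << width) - 1
    offsets = [(packed >> (i * width)) & mask for i in range(n)]
    # Patch: overwrite placeholder slots with the exception values.
    for i, v in exceptions:
        offsets[i] = v
    # Undo the frame of reference, then the delta step (prefix sum).
    doc_ids, running = [], 0
    for off in offsets:
        running += off + base
        doc_ids.append(running)
    return doc_ids


ids = [3, 7, 8, 12, 13, 1000, 1004, 1005, 1009, 1010]
base, width, n, data, exc = pfor_delta_encode(ids)
print(base, width, exc, len(data))  # 1 2 [(5, 986)] 3
assert pfor_delta_decode(base, width, n, data, exc) == ids
```

Here ten IDs pack into 2-bit slots (3 bytes) plus one exception for the large jump to 1000. Zukowski's PFOR instead keeps exceptions at the end of the block and chains them through the unused slots so decoding stays branch-free; the separate side list above is a simplification for readability, and this pure-Python loop is nowhere near the vectorized speeds discussed in Lemire's paper.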
