Do you have any numbers on compression speed, too? I continue to be surprised by the relative compression ratios between LZ4, LZO, and SNAPPY. I had expected SNAPPY and LZO to be roughly equivalent and LZ4 to be far better than LZO.
-- Lars

________________________________
From: Nick Dimiduk <[email protected]>
To: hbase-dev <[email protected]>
Sent: Wednesday, September 18, 2013 5:19 PM
Subject: Re: Documenting Guidance on compression and codecs

For completeness, here's an entry for LZ4:

+--------------------+--------------+
| compression:LZ4    | 391017061    |
+--------------------+--------------+

On Wed, Sep 11, 2013 at 12:10 PM, Nick Dimiduk <[email protected]> wrote:

> Do we have a consolidated resource with information and recommendations
> about use of the above? For instance, I ran a simple test using
> PerformanceEvaluation, examining just the size of data on disk for 1G of
> input data. The matrix below has some surprising results:
>
> +--------------------+--------------+
> | MODIFIER           | SIZE (bytes) |
> +--------------------+--------------+
> | none               | 1108553612   |
> +--------------------+--------------+
> | compression:SNAPPY | 427335534    |
> +--------------------+--------------+
> | compression:LZO    | 270422088    |
> +--------------------+--------------+
> | compression:GZ     | 152899297    |
> +--------------------+--------------+
> | codec:PREFIX       | 1993910969   |
> +--------------------+--------------+
> | codec:DIFF         | 1960970083   |
> +--------------------+--------------+
> | codec:FAST_DIFF    | 1061374722   |
> +--------------------+--------------+
> | codec:PREFIX_TREE  | 1066586604   |
> +--------------------+--------------+
>
> Where does a wayward soul look for guidance on which combination of the
> above to choose for their application?
>
> Thanks,
> Nick
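
For anyone who wants the sizes above as ratios rather than raw byte counts, here is a minimal sketch. It only divides each on-disk size by the uncompressed ("none") size. The dictionary is copied by hand from the numbers quoted in this thread; the variable names are my own, and nothing here calls into HBase or says anything about compression speed.

```python
# On-disk sizes in bytes, copied from the tables quoted in this thread
# (PerformanceEvaluation run over 1G of input data).
sizes = {
    "none": 1108553612,
    "compression:SNAPPY": 427335534,
    "compression:LZ4": 391017061,
    "compression:LZO": 270422088,
    "compression:GZ": 152899297,
    "codec:PREFIX": 1993910969,
    "codec:DIFF": 1960970083,
    "codec:FAST_DIFF": 1061374722,
    "codec:PREFIX_TREE": 1066586604,
}

# Ratio of each on-disk size to the uncompressed baseline.
baseline = sizes["none"]
ratios = {name: size / baseline for name, size in sizes.items()}

# Print smallest (best) ratio first.
for name, ratio in sorted(ratios.items(), key=lambda item: item[1]):
    print(f"{name:<20} {ratio:6.3f}")
```

A ratio below 1.0 means the data shrank on disk; a ratio above 1.0, as for codec:PREFIX and codec:DIFF in this run, means it grew.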
