Here is another set of data points, generated with LoadTestTool. The tool inherently has some randomness, but these should be good for ballpark figures.
hbase org.apache.hadoop.hbase.util.LoadTestTool -write 1:10:100 -num_keys 1000000 \
  -read 100:30 -num_tables 1 -data_block_encoding NONE -tn load_test_tool_NONE

279723839   /apps/hbase/data/data/default/load_test_tool_NONE
103100244   /apps/hbase/data/data/default/load_test_tool_DIFF
103432465   /apps/hbase/data/data/default/load_test_tool_FAST_DIFF
134790042   /apps/hbase/data/data/default/load_test_tool_PREFIX
97963420    /apps/hbase/data/data/default/load_test_tool_PREFIX_TREE
78579277    /apps/hbase/data/data/default/load_test_tool_GZ
105321959   /apps/hbase/data/data/default/load_test_tool_SNAPPY
108040063   /apps/hbase/data/data/default/load_test_tool_LZO
110784379   /apps/hbase/data/data/default/load_test_tool_LZ4
78059199    /apps/hbase/data/data/default/load_test_tool_SNAPPY_FAST_DIFF
77214771    /apps/hbase/data/data/default/load_test_tool_LZO_FAST_DIFF

Enis

On Tue, Sep 24, 2013 at 1:11 PM, Ted Yu <[email protected]> wrote:

> According to
> http://pokecraft.first-world.info/wiki/Quick_Benchmark:_Gzip_vs_Bzip2_vs_LZMA_vs_XZ_vs_LZ4_vs_LZO
> ,
> LZ4 is faster than LZOP but consumes much more memory.
>
> Cheers
>
> On Wed, Sep 18, 2013 at 8:34 PM, lars hofhansl <[email protected]> wrote:
>
> > Do you have any numbers on compression speed, too?
> > I continue to be surprised by the relative compression ratios between
> > LZ4, LZO, and SNAPPY.
> > I had expected SNAPPY and LZO to be roughly equivalent and LZ4 to be far
> > better than LZO.
> >
> > -- Lars
> >
> > ________________________________
> > From: Nick Dimiduk <[email protected]>
> > To: hbase-dev <[email protected]>
> > Sent: Wednesday, September 18, 2013 5:19 PM
> > Subject: Re: Documenting Guidance on compression and codecs
> >
> > For completeness, here's an entry for LZ4:
> >
> > +--------------------+--------------+
> > | compression:LZ4    |    391017061 |
> > +--------------------+--------------+
> >
> > On Wed, Sep 11, 2013 at 12:10 PM, Nick Dimiduk <[email protected]> wrote:
> >
> > > Do we have a consolidated resource with information and recommendations
> > > about use of the above? For instance, I ran a simple test using
> > > PerformanceEvaluation, examining just the size of data on disk for 1G
> > > of input data. The matrix below has some surprising results:
> > >
> > > +--------------------+--------------+
> > > | MODIFIER           | SIZE (bytes) |
> > > +--------------------+--------------+
> > > | none               |   1108553612 |
> > > +--------------------+--------------+
> > > | compression:SNAPPY |    427335534 |
> > > +--------------------+--------------+
> > > | compression:LZO    |    270422088 |
> > > +--------------------+--------------+
> > > | compression:GZ     |    152899297 |
> > > +--------------------+--------------+
> > > | codec:PREFIX       |   1993910969 |
> > > +--------------------+--------------+
> > > | codec:DIFF         |   1960970083 |
> > > +--------------------+--------------+
> > > | codec:FAST_DIFF    |   1061374722 |
> > > +--------------------+--------------+
> > > | codec:PREFIX_TREE  |   1066586604 |
> > > +--------------------+--------------+
> > >
> > > Where does a wayward soul look for guidance on which combination of the
> > > above to choose for their application?
> > >
> > > Thanks,
> > > Nick
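
For anyone wanting to reproduce figures like the ones at the top of this thread, here is a minimal sketch. The SNAPPY + FAST_DIFF combination and the flush/compact step are assumptions inferred from the table names above; adjust flags and paths for your cluster.

# LoadTestTool creates the target table itself, applying the requested
# compression codec and data block encoding to its column family:
hbase org.apache.hadoop.hbase.util.LoadTestTool \
  -write 1:10:100 -num_keys 1000000 -read 100:30 -num_tables 1 \
  -compression SNAPPY -data_block_encoding FAST_DIFF \
  -tn load_test_tool_SNAPPY_FAST_DIFF

# Optionally flush and major-compact from the HBase shell, so the on-disk
# size reflects the chosen codec rather than memstore contents:
#   flush 'load_test_tool_SNAPPY_FAST_DIFF'
#   major_compact 'load_test_tool_SNAPPY_FAST_DIFF'

# Sum the table's footprint on HDFS:
hadoop fs -du -s /apps/hbase/data/data/default/load_test_tool_SNAPPY_FAST_DIFF

And to apply a chosen combination to one of your own tables, the usual HBase shell form is shown below (the table and family names here are made up for illustration):

create 'usertable', {NAME => 'f', COMPRESSION => 'SNAPPY', DATA_BLOCK_ENCODING => 'FAST_DIFF'}
alter 'usertable', {NAME => 'f', COMPRESSION => 'GZ'}
major_compact 'usertable'   # rewrites existing HFiles with the new settings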
