Is this stuff going to make it into the refguide? It's good stuff.
St.Ack

On Wed, Sep 11, 2013 at 1:30 PM, lars hofhansl <[email protected]> wrote:

> PE has short and unique keys, so any prefix encoding won't buy much (or
> make it worse).
>
> What's interesting to me is the difference between snappy and lzo, I
> expected them to be mostly equivalent in terms of compression.
>
> So as a general guideline I'd say:
> o If you have long keys (compared to the values) or many columns, use a
>   prefix encoder. Only use FAST_DIFF.
> o If the values are large (and not precompressed as in images), use a
>   block compressor (SNAPPY, LZO, GZIP, etc)
> o Use GZIP for cold data
> o Use SNAPPY or LZO for hot data.
> o In most cases you do want to enable SNAPPY or LZO by default (low perf
>   overhead + space savings).
>
> -- Lars
>
> ________________________________
> From: Nick Dimiduk <[email protected]>
> To: hbase-dev <[email protected]>
> Sent: Wednesday, September 11, 2013 12:10 PM
> Subject: Documenting Guidance on compression and codecs
>
> Do we have a consolidated resource with information and recommendations
> about use of the above? For instance, I ran a simple test using
> PerformanceEvaluation, examining just the size of data on disk for 1G of
> input data.
> The matrix below has some surprising results:
>
> +--------------------+--------------+
> | MODIFIER           | SIZE (bytes) |
> +--------------------+--------------+
> | none               |   1108553612 |
> +--------------------+--------------+
> | compression:SNAPPY |    427335534 |
> +--------------------+--------------+
> | compression:LZO    |    270422088 |
> +--------------------+--------------+
> | compression:GZ     |    152899297 |
> +--------------------+--------------+
> | codec:PREFIX       |   1993910969 |
> +--------------------+--------------+
> | codec:DIFF         |   1960970083 |
> +--------------------+--------------+
> | codec:FAST_DIFF    |   1061374722 |
> +--------------------+--------------+
> | codec:PREFIX_TREE  |   1066586604 |
> +--------------------+--------------+
>
> Where does a wayward soul look for guidance on which combination of the
> above to choose for their application?
>
> Thanks,
> Nick
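
The sizes in Nick's table are easier to compare as a fraction of the "none" baseline. Below is a minimal sketch in plain Java that does only that arithmetic. The class and method names (CompressionRatios, sizes, ratio) are made up for illustration and are not part of HBase or PerformanceEvaluation; the byte counts are copied from the table above.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical helper, not an HBase class: it only redoes the table arithmetic.
class CompressionRatios {
    // On-disk size in bytes of the run with no modifier (row "none").
    static final long BASELINE = 1108553612L;

    // Remaining rows of the table, kept in their original order.
    static Map<String, Long> sizes() {
        Map<String, Long> m = new LinkedHashMap<>();
        m.put("compression:SNAPPY", 427335534L);
        m.put("compression:LZO", 270422088L);
        m.put("compression:GZ", 152899297L);
        m.put("codec:PREFIX", 1993910969L);
        m.put("codec:DIFF", 1960970083L);
        m.put("codec:FAST_DIFF", 1061374722L);
        m.put("codec:PREFIX_TREE", 1066586604L);
        return m;
    }

    // Size relative to the baseline; below 1.0 means space was saved.
    static double ratio(long sizeBytes) {
        return (double) sizeBytes / BASELINE;
    }

    public static void main(String[] args) {
        for (Map.Entry<String, Long> e : sizes().entrySet()) {
            System.out.printf("%-20s %6.1f%% of baseline%n",
                    e.getKey(), 100.0 * ratio(e.getValue()));
        }
    }
}
```

On these numbers GZ comes out at roughly 14% of the baseline, LZO at roughly 24% and SNAPPY at roughly 39%, while PREFIX and DIFF land well above 100%, i.e. larger on disk than with no modifier at all. That matches Lars's point that PE's short, unique keys give prefix encoders little to work with.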
