[
https://issues.apache.org/jira/browse/HBASE-11400?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14057299#comment-14057299
]
Jonathan Hsieh commented on HBASE-11400:
----------------------------------------
This is a good improvement. I think more can be done -- here are suggestions:
- explain that there are tradeoffs for compression and encoding in the first
section (where you are talking about how they are set on a column family).
Maybe say something about how compression codecs treat data as big opaque
byte arrays, while encodings take advantage of the structure that HBase knows
about its own data formats.
- Data Block encoding types section: Consider asking [~mbertozzi] if you can
use the images from here.
http://blog.cloudera.com/blog/2012/06/hbase-io-hfile-input-output/
- which compression or codec to use
-- Would be good to explain why gzip should be used for cold data and snappy
and lzo for hot data: lzo and snappy favor low CPU usage at the cost of a
poorer compression ratio, while gzip spends more CPU for a higher compression
ratio.
-- The codec part is pretty weak here. Maybe use examples from Matteo's blog
post or drop it since there is only one line. Also consider mentioning why
you'd want to use an encoder in this section and just describe what the
different types mean in the previous section.
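For the hot/cold point, something like the sketch below might help readers
see the CPU-versus-ratio tradeoff for themselves. It is only an illustration:
the JDK ships neither snappy nor lzo, so it assumes DEFLATE (the algorithm
gzip uses) at its fastest and most thorough levels as a stand-in for "cheap
codec" versus "expensive codec". The class name, method names, and sample
payload are all made up for this example and have nothing to do with HBase's
own codec plumbing.

```java
import java.nio.charset.StandardCharsets;
import java.util.zip.Deflater;

public class CodecTradeoffSketch {

    /** Compresses the input at the given Deflater level and returns the compressed size in bytes. */
    public static int compressedSize(byte[] input, int level) {
        Deflater deflater = new Deflater(level);
        deflater.setInput(input);
        deflater.finish();
        byte[] buf = new byte[4096];
        int total = 0;
        // Drain the compressor; only the byte count matters here, so the output is discarded.
        while (!deflater.finished()) {
            total += deflater.deflate(buf);
        }
        deflater.end();
        return total;
    }

    /** Builds a repetitive, made-up payload that loosely resembles row keys with small values. */
    public static byte[] sampleData(int rows) {
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < rows; i++) {
            sb.append("user").append(i % 1000).append(":event:").append(i).append('\n');
        }
        return sb.toString().getBytes(StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        byte[] data = sampleData(200_000);
        // Assumption: BEST_SPEED stands in for a fast codec, BEST_COMPRESSION for a thorough one.
        for (int level : new int[] {Deflater.BEST_SPEED, Deflater.BEST_COMPRESSION}) {
            long start = System.nanoTime();
            int size = compressedSize(data, level);
            long micros = (System.nanoTime() - start) / 1_000;
            System.out.printf("level=%d raw=%d compressed=%d time=%dus%n",
                    level, data.length, size, micros);
        }
    }
}
```

Timings from a single run like this are noisy, so treat the numbers as a rough
indication rather than a benchmark.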
It is probably worth noting here that when settings are changed on an existing
column family, the new encoding and compression are applied on compaction.
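On the first suggestion (codecs see opaque bytes, encoders exploit structure),
a toy example could make the distinction concrete. The sketch below assumes
the simplest possible scheme: because keys are stored sorted, each key is
written as the number of leading bytes it shares with the previous key plus
the remaining suffix. This is not HBase's actual encoder code; the class,
the Entry type, and the sample keys are hypothetical and only show the idea.

```java
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class PrefixEncodingSketch {

    /** One encoded key: bytes shared with the previous key, plus the bytes that differ. */
    public static final class Entry {
        public final int sharedPrefixLength;
        public final byte[] suffix;

        Entry(int sharedPrefixLength, byte[] suffix) {
            this.sharedPrefixLength = sharedPrefixLength;
            this.suffix = suffix;
        }
    }

    /** Encodes keys (assumed already sorted) relative to their predecessor. */
    public static List<Entry> encode(List<byte[]> sortedKeys) {
        List<Entry> out = new ArrayList<>();
        byte[] prev = new byte[0];
        for (byte[] key : sortedKeys) {
            int shared = 0;
            int max = Math.min(prev.length, key.length);
            while (shared < max && prev[shared] == key[shared]) {
                shared++;
            }
            out.add(new Entry(shared, Arrays.copyOfRange(key, shared, key.length)));
            prev = key;
        }
        return out;
    }

    /** Rebuilds the original keys by replaying each entry against the previous decoded key. */
    public static List<byte[]> decode(List<Entry> entries) {
        List<byte[]> out = new ArrayList<>();
        byte[] prev = new byte[0];
        for (Entry e : entries) {
            byte[] key = new byte[e.sharedPrefixLength + e.suffix.length];
            System.arraycopy(prev, 0, key, 0, e.sharedPrefixLength);
            System.arraycopy(e.suffix, 0, key, e.sharedPrefixLength, e.suffix.length);
            out.add(key);
            prev = key;
        }
        return out;
    }

    public static void main(String[] args) {
        List<byte[]> keys = new ArrayList<>();
        // Made-up keys in a row/family:qualifier shape, already in sorted order.
        for (String s : new String[] {"row-0001/cf:a", "row-0001/cf:b", "row-0002/cf:a"}) {
            keys.add(s.getBytes(StandardCharsets.UTF_8));
        }
        for (Entry e : encode(keys)) {
            System.out.println(e.sharedPrefixLength + " shared, suffix \""
                    + new String(e.suffix, StandardCharsets.UTF_8) + "\"");
        }
    }
}
```

A general-purpose codec could still shrink these keys, but it has to discover
the repetition on its own; the encoder gets it for free from the sort order.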
> Edit, consolidate, and update Compression and data encoding docs
> ----------------------------------------------------------------
>
> Key: HBASE-11400
> URL: https://issues.apache.org/jira/browse/HBASE-11400
> Project: HBase
> Issue Type: Improvement
> Components: documentation
> Reporter: Misty Stanley-Jones
> Assignee: Misty Stanley-Jones
> Priority: Minor
> Attachments: HBASE-11400-1.patch, HBASE-11400.patch
>
>
> Current docs are here: http://hbase.apache.org/book.html#compression.test
> It could use some editing and expansion.
--
This message was sent by Atlassian JIRA
(v6.2#6252)