[
https://issues.apache.org/jira/browse/HBASE-3927?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13040404#comment-13040404
]
Matt Corgan commented on HBASE-3927:
------------------------------------
Ted - I think the problem I'm most often seeing on the user list is that people
want the default 64K block size, but after they enable compression they don't
raise the block size to compensate for the compression. In many cases it's
easy to obtain compression of 10x or better, so the blocks on disk are ~6K,
which is smaller than anyone wants.
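
As a rough sketch of that arithmetic (the function names and the 10x ratio
are illustrative only, not part of any HBase API):

```python
def on_disk_block_size(block_size_bytes: int, compression_ratio: float) -> float:
    """Approximate compressed size of one block on disk.

    block_size_bytes is the configured (uncompressed) block size;
    compression_ratio is uncompressed bytes / compressed bytes.
    """
    if compression_ratio <= 0:
        raise ValueError("compression_ratio must be positive")
    return block_size_bytes / compression_ratio


def block_size_for_target(target_on_disk_bytes: int, compression_ratio: float) -> int:
    """Configured block size needed to land near a target on-disk block size."""
    if compression_ratio <= 0:
        raise ValueError("compression_ratio must be positive")
    return int(round(target_on_disk_bytes * compression_ratio))


default_block = 64 * 1024  # 64K default block size, measured uncompressed

# With 10x compression, a 64K block shrinks to roughly 6.4K on disk.
print(on_disk_block_size(default_block, 10.0))

# Block size that would bring on-disk blocks back to about 64K. As the issue
# description notes, you may not actually want to go all the way back.
print(block_size_for_target(default_block, 10.0))
```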
It's also true that data with large keys and small values (like an inverted
index) tends to compress well. Those big keys also necessitate relatively
large block cache entries. Because the block index has an entry for every
block, it can get overly large when a user has large keys and small compressed
blocks.
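
A back-of-the-envelope estimate of the index size might look like the sketch
below. It assumes one index entry per block, each holding a key plus a small
fixed overhead; the 12-byte overhead and the 100-byte key are assumed numbers
for illustration, not values taken from the HFile format.

```python
def estimate_block_index_bytes(
    uncompressed_data_bytes: int,
    block_size_bytes: int,
    avg_key_bytes: int,
    per_entry_overhead_bytes: int = 12,  # assumed: 8-byte offset + 4-byte size
) -> int:
    """Rough block index size: one entry per block, each entry = key + overhead."""
    if block_size_bytes <= 0:
        raise ValueError("block_size_bytes must be positive")
    # Ceiling division: a partially filled last block still needs an entry.
    num_blocks = -(-uncompressed_data_bytes // block_size_bytes)
    return num_blocks * (avg_key_bytes + per_entry_overhead_bytes)


one_tib = 2 ** 40  # uncompressed data size

# Default 64K blocks with 100-byte keys.
small_blocks = estimate_block_index_bytes(one_tib, 64 * 1024, 100)

# Same data with the block size raised 10x to offset 10x compression.
large_blocks = estimate_block_index_bytes(one_tib, 640 * 1024, 100)

print(small_blocks / 2 ** 30, "GiB of index with 64K blocks")
print(large_blocks / 2 ** 30, "GiB of index with 640K blocks")
```

Under these assumptions the index shrinks roughly in proportion to the
increase in block size.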
Exposing this metric is just a way to remind unsuspecting users that block size
is calculated based on uncompressed size, rather than on the compressed disk
size, which drives region splits. It should also make it easier to figure out
how effective different compression algorithms are, how big your compressed
block size is, what percent of your data you can fit in block cache, etc.
> display total uncompressed byte size of a region in web UI
> ----------------------------------------------------------
>
> Key: HBASE-3927
> URL: https://issues.apache.org/jira/browse/HBASE-3927
> Project: HBase
> Issue Type: Improvement
> Components: metrics
> Reporter: Matt Corgan
> Priority: Minor
>
> The decision to split data blocks when flushing and compacting is made based
> on the uncompressed data size, which can often lead to compressed disk blocks
> that are a fraction of the intended 64 KB (default). This in turn produces a
> larger number of blocks and index entries than expected and can cause block
> indexes to take up gigabytes of memory.
> There is already a "long totalUncompressedBytes" written to the HFileTrailer.
> It would be nice to expose this in the web UI to make it easier to calculate
> the compression ratio and then raise the block size appropriately (not
> necessarily to get it back to 64K).
> This should probably be added wherever the other HFile metrics are:
> RegionLoad.createRegions(..), and HServerLoad. HServerLoad is a Writable, so
> it may break serialization.