[ https://issues.apache.org/jira/browse/HBASE-68?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12729499#action_12729499 ]
ryan rawson commented on HBASE-68:
----------------------------------
I give this issue a big -1 (or -2 or whatever).
Right now we are 'needlessly' storing the column family, but in 0.21 I hope to
introduce locality groups, which will require us to keep the column family.
Another benefit is that the regionserver does not have to expand or patch up
the reply while processing a scan/get, which helps quite a bit. Even a code ->
string translation would cost us. A code-based solution would also make things
more brittle, since we could not change or reorder these codes without
invalidating an entire table.
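To make the brittleness point concrete, here is a minimal sketch. It assumes a hypothetical scheme where each family is stored as its position in the table's family list; none of these names are HBase APIs.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical illustration only; this is not how HBase stores keys.
public class FamilyCodeSketch {
    // Encode a family as its position in the table's family list.
    public static int encode(List<String> families, String family) {
        int code = families.indexOf(family);
        if (code < 0) {
            throw new IllegalArgumentException("unknown family: " + family);
        }
        return code;
    }

    // Decode a stored code back to a family name using the current list.
    public static String decode(List<String> families, int code) {
        return families.get(code);
    }

    public static void main(String[] args) {
        List<String> original = Arrays.asList("info", "anchor");
        int stored = encode(original, "anchor"); // the code written to disk

        // Later the family list is reordered, but files on disk are not rewritten.
        List<String> reordered = Arrays.asList("anchor", "info");

        System.out.println(decode(original, stored));  // prints "anchor"
        System.out.println(decode(reordered, stored)); // prints "info", the wrong family
    }
}
```

Every entry written under the old ordering now decodes to the wrong family, so the whole table would have to be rewritten before the codes could change.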
With block compression and LZO we get amazing compression: I have seen between
2x and 4x on production data. This helps mitigate the on-disk storage cost of
duplicating the column family.
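A rough way to check this on synthetic data is sketched below. It uses java.util.zip.Deflater as a stand-in for LZO, and the key layout (row, family, qualifier) and the family name "anchor:" are made up for illustration, so treat the numbers it prints as indicative only.

```java
import java.nio.charset.StandardCharsets;
import java.util.zip.Deflater;

// Illustration only: DEFLATE stands in for LZO, and the keys are synthetic.
public class FamilyCompressionSketch {
    // Build n synthetic keys; include the family name only when asked.
    public static byte[] buildKeys(int n, String family, boolean withFamily) {
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < n; i++) {
            sb.append(String.format("row%06d", i));
            if (withFamily) {
                sb.append(family);
            }
            sb.append("qual").append(i % 7).append('\n');
        }
        return sb.toString().getBytes(StandardCharsets.UTF_8);
    }

    // Return the DEFLATE-compressed size of the input in bytes.
    public static int compressedSize(byte[] input) {
        Deflater deflater = new Deflater();
        deflater.setInput(input);
        deflater.finish();
        byte[] buf = new byte[4096];
        int total = 0;
        while (!deflater.finished()) {
            total += deflater.deflate(buf);
        }
        deflater.end();
        return total;
    }

    public static void main(String[] args) {
        byte[] with = buildKeys(1000, "anchor:", true);
        byte[] without = buildKeys(1000, "anchor:", false);

        // Cost of repeating the family name, before and after compression.
        System.out.println("raw overhead:        " + (with.length - without.length));
        System.out.println("compressed overhead: "
                + (compressedSize(with) - compressedSize(without)));
    }
}
```

The repeated family bytes are exactly the kind of redundancy a block compressor removes, so the compressed overhead should come out far smaller than the raw overhead.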
> [hbase] HStoreFiles needlessly store the column family name in every entry
> --------------------------------------------------------------------------
>
> Key: HBASE-68
> URL: https://issues.apache.org/jira/browse/HBASE-68
> Project: Hadoop HBase
> Issue Type: Improvement
> Components: regionserver
> Reporter: Bryan Duxbury
> Priority: Minor
> Fix For: 0.21.0
>
>
> Today, HStoreFiles keep the entire serialized HStoreKey objects around for
> every cell in the HStore. Since HStores are 1-1 with column families, this is
> really unnecessary - you can always surmise the column family by looking at
> the HStore it belongs to. (This information would ostensibly come from the
> file name or a header section.) This means that we could remove the column
> family part of the HStoreKeys we put into the HStoreFile, reducing the size
> of data stored. This would be a space-saving benefit, removing redundant
> data, and could be a speed benefit, as you have to scan over less data in
> memory and transfer less data over the network.