[ 
https://issues.apache.org/jira/browse/HADOOP-2521?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12556302#action_12556302
 ] 

stack commented on HADOOP-2521:
-------------------------------

Jim: I've been profiling while you've been on holidays.  Looks like most of the 
low-hanging fruit has been picked, e.g. RPC serializations and gratuitous 
object creations in hbase.  Apart from updates to memcache -- SortedMaps are 
'expensive' -- the bulk of our time/resources now goes to appending to and 
nexting over MapFiles/SequenceFiles, whether updating, reading, compacting or 
flushing (the latter two take up the bulk of CPU, during writes at least).  
Anything we can do to improve our i/o story here will make for a win.

As to dropping the family name when we go to the fs, I like the idea, especially 
as it's making keys (slightly) smaller... but yeah, let's measure first to see 
if these seemingly small savings even show up on the size/speed register.
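
For a back-of-envelope feel before measuring for real, here is a minimal 
sketch in Java.  It is not the HStoreKey serialization: the class 
KeySizeSketch, its methods, and the row/column/timestamp layout written with 
DataOutputStream are all hypothetical stand-ins, and it assumes column names 
of the form family:qualifier.  It only illustrates that the per-entry saving 
is roughly the length of the family name plus the colon.

```java
import java.io.ByteArrayOutputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;

// Hypothetical sketch: NOT the real HStoreKey on-disk format.
final class KeySizeSketch {

    // Serialize a stand-in key (row, column, timestamp) and return its size in bytes.
    // writeUTF emits a 2-byte length followed by the (modified UTF-8) string bytes.
    public static int serializedSize(String row, String column, long timestamp) {
        ByteArrayOutputStream buf = new ByteArrayOutputStream();
        try (DataOutputStream out = new DataOutputStream(buf)) {
            out.writeUTF(row);
            out.writeUTF(column);
            out.writeLong(timestamp);
        } catch (IOException e) {
            // Not expected for an in-memory buffer with short strings.
            throw new UncheckedIOException(e);
        }
        return buf.size();
    }

    // Drop the "family:" prefix, assuming the first ':' separates family from qualifier.
    public static String stripFamily(String column) {
        int i = column.indexOf(':');
        return i < 0 ? column : column.substring(i + 1);
    }

    public static void main(String[] args) {
        String row = "row-00000001";          // made-up sample values
        String column = "anchor:example.com";
        long ts = System.currentTimeMillis();

        int full = serializedSize(row, column, ts);
        int stripped = serializedSize(row, stripFamily(column), ts);
        System.out.println("with family:    " + full + " bytes");
        System.out.println("without family: " + stripped + " bytes");
        System.out.println("saved per key:  " + (full - stripped) + " bytes");
    }
}
```

Whether those few bytes per entry show up in flush or compaction times is 
exactly what still needs measuring against real store files.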

> [hbase] HStoreFiles needlessly store the column family name in every entry
> --------------------------------------------------------------------------
>
>                 Key: HADOOP-2521
>                 URL: https://issues.apache.org/jira/browse/HADOOP-2521
>             Project: Hadoop
>          Issue Type: Improvement
>          Components: contrib/hbase
>            Reporter: Bryan Duxbury
>            Priority: Minor
>
> Today, HStoreFiles keep the entire serialized HStoreKey objects around for 
> every cell in the HStore. Since HStores are 1-1 with column families, this is 
> really unnecessary - you can always surmise the column family by looking at 
> the HStore it belongs to. (This information would ostensibly come from the 
> file name or a header section.) This means that we could remove the column 
> family part of the HStoreKeys we put into the HStoreFile, reducing the size 
> of data stored. This would be a space-saving benefit, removing redundant 
> data, and could be a speed benefit, as you have to scan over less data in 
> memory and transfer less data over the network.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
