[ https://issues.apache.org/jira/browse/HADOOP-2521?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12556302#action_12556302 ]
stack commented on HADOOP-2521:
-------------------------------

Jim: I've been profiling while you've been on holidays. It looks like most of the low-hanging fruit has been picked, e.g. RPC serializations and gratuitous object creations in hbase. Apart from updates to memcache (SortedMaps are 'expensive'), the bulk of our time/resources now goes to appending to and nexting over MapFiles/SequenceFiles, whether updating, reading, compacting or flushing (the latter two take up the bulk of CPU during writes at least). Anything we can do to improve our i/o story here will make for a win.

As to dropping the family name when we go to the fs, I like the idea, especially as it's making keys (slightly) smaller... but yeah, let's measure first to see if these seemingly small savings even show up on the size/speed register.

> [hbase] HStoreFiles needlessly store the column family name in every entry
> --------------------------------------------------------------------------
>
>                 Key: HADOOP-2521
>                 URL: https://issues.apache.org/jira/browse/HADOOP-2521
>             Project: Hadoop
>          Issue Type: Improvement
>          Components: contrib/hbase
>            Reporter: Bryan Duxbury
>            Priority: Minor
>
> Today, HStoreFiles keep the entire serialized HStoreKey objects around for
> every cell in the HStore. Since HStores are 1-1 with column families, this is
> really unnecessary - you can always surmise the column family by looking at
> the HStore it belongs to. (This information would ostensibly come from the
> file name or a header section.) This means that we could remove the column
> family part of the HStoreKeys we put into the HStoreFile, reducing the size
> of data stored. This would be a space-saving benefit, removing redundant
> data, and could be a speed benefit, as you have to scan over less data in
> memory and transfer less data over the network.

--
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
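
To put a rough number on the "measure first" point, here is a minimal Python sketch of how one might estimate the bytes saved per key by dropping the family prefix. The key layout (2-byte length prefixes, 8-byte timestamp) and the helper names `encode_key`, `strip_family` and `total_key_bytes` are assumptions made for illustration only; they are not the actual HStoreKey serialization, and the sample cells are made up.

```python
import struct


def encode_key(row: str, column: str, timestamp: int) -> bytes:
    """Encode a key using an ASSUMED layout (not the real HStoreKey format):
    2-byte length + row, 2-byte length + column, 8-byte timestamp."""
    row_b = row.encode("utf-8")
    col_b = column.encode("utf-8")
    return (
        struct.pack(">H", len(row_b)) + row_b
        + struct.pack(">H", len(col_b)) + col_b
        + struct.pack(">q", timestamp)
    )


def strip_family(column: str) -> str:
    """Drop the 'family:' prefix and keep only the qualifier.
    A column without ':' is returned unchanged."""
    _family, sep, qualifier = column.partition(":")
    return qualifier if sep else column


def total_key_bytes(cells):
    """Return (bytes with family in every key, bytes with family stripped)
    for an iterable of (row, column, timestamp) tuples."""
    full = 0
    stripped = 0
    for row, column, timestamp in cells:
        full += len(encode_key(row, column, timestamp))
        stripped += len(encode_key(row, strip_family(column), timestamp))
    return full, stripped


if __name__ == "__main__":
    # Hypothetical cells, all belonging to one store (family "info").
    sample = [
        ("row1", "info:name", 1),
        ("row1", "info:email", 2),
        ("row2", "info:name", 3),
    ]
    full, stripped = total_key_bytes(sample)
    print(f"full={full} stripped={stripped} saved={full - stripped}")
```

Under this layout the saving is simply the length of the "family:" prefix times the number of cells, so whether it matters depends on how long family names are relative to rows, qualifiers and values. It says nothing about the speed side, which would need profiling against real MapFiles.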