[ https://issues.apache.org/jira/browse/HADOOP-2521?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12556258#action_12556258 ]
Jim Kellerman commented on HADOOP-2521:
---------------------------------------

Point taken. However since we have done little to no performance analysis to date, I would say that this would be a premature optimization. Let's see where the hot spots are first and address them.

> [hbase] HStoreFiles needlessly store the column family name in every entry
> --------------------------------------------------------------------------
>
>                 Key: HADOOP-2521
>                 URL: https://issues.apache.org/jira/browse/HADOOP-2521
>             Project: Hadoop
>          Issue Type: Improvement
>          Components: contrib/hbase
>            Reporter: Bryan Duxbury
>            Priority: Minor
>
> Today, HStoreFiles keep the entire serialized HStoreKey objects around for
> every cell in the HStore. Since HStores are 1-1 with column families, this is
> really unnecessary - you can always surmise the column family by looking at
> the HStore it belongs to. (This information would ostensibly come from the
> file name or a header section.) This means that we could remove the column
> family part of the HStoreKeys we put into the HStoreFile, reducing the size
> of data stored. This would be a space-saving benefit, removing redundant
> data, and could be a speed benefit, as you have to scan over less data in
> memory and transfer less data over the network.

--
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.