[ 
https://issues.apache.org/jira/browse/HBASE-68?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12728857#action_12728857
 ] 

Jonathan Gray commented on HBASE-68:
------------------------------------

Though not fixed/solved, I think we should close this issue as invalid.

KeyValues (KVs) must always contain their families because they are 
self-contained.  Moving forward, if we ever do locality groups, we'll 
definitely need them.

By making them self-contained, we never have to rewrite/reallocate the data, 
i.e. our zero-copy reads pass along KV references to the actual HFile block we 
read in.  Our Result is nothing but a List<KV>, and we do not care whether the 
KVs all belong to one family or to several.  If our KV no longer stores the 
family, we will have to undo the new optimizations (not building a big Map as 
we build the Result) and go back to tracking everything per family as we build 
the Result.
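
To make the argument concrete, here is a minimal Java sketch of the idea. It 
is not the actual HBase code: the class names (SelfContainedKvSketch, KvRef), 
the helper methods, and the one-byte-length entry layout are all assumptions 
made up for illustration. It only shows why a reference into a shared block 
can answer "which family?" by itself when the family is stored in the entry, 
so the result can stay a flat list with no per-family bookkeeping.

```java
import java.io.ByteArrayOutputStream;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch, not HBase's KeyValue/Result implementation.
class SelfContainedKvSketch {

    /** A view (offset, length) into a shared block; no bytes are copied. */
    public static final class KvRef {
        public final byte[] block;
        public final int offset;   // points at the family-length byte
        public final int length;   // bytes in this entry, family-length byte included

        public KvRef(byte[] block, int offset, int length) {
            this.block = block;
            this.offset = offset;
            this.length = length;
        }

        /** The family is read from the entry itself; no outside context is needed. */
        public String family() {
            int famLen = block[offset] & 0xff;
            return new String(block, offset + 1, famLen, StandardCharsets.UTF_8);
        }

        public String qualifier() {
            int famLen = block[offset] & 0xff;
            return new String(block, offset + 1 + famLen, length - 1 - famLen,
                    StandardCharsets.UTF_8);
        }
    }

    /**
     * Assumed layout per entry: [entryLen:1][famLen:1][family][qualifier].
     * Each cell is a {family, qualifier} pair; entries must stay under 256 bytes.
     */
    public static byte[] buildBlock(String[][] cells) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        for (String[] cell : cells) {
            byte[] fam = cell[0].getBytes(StandardCharsets.UTF_8);
            byte[] qual = cell[1].getBytes(StandardCharsets.UTF_8);
            out.write(1 + fam.length + qual.length); // entryLen
            out.write(fam.length);                   // famLen
            out.write(fam, 0, fam.length);
            out.write(qual, 0, qual.length);
        }
        return out.toByteArray();
    }

    /** Builds the "Result": a flat list of references, with no per-family Map. */
    public static List<KvRef> scan(byte[] block) {
        List<KvRef> result = new ArrayList<>();
        int pos = 0;
        while (pos < block.length) {
            int entryLen = block[pos] & 0xff;
            result.add(new KvRef(block, pos + 1, entryLen));
            pos += 1 + entryLen;
        }
        return result;
    }

    public static void main(String[] args) {
        byte[] block = buildBlock(new String[][] {{"info", "name"}, {"meta", "ts"}});
        for (KvRef kv : scan(block)) {
            System.out.println(kv.family() + ":" + kv.qualifier());
        }
    }
}
```

If the family were dropped from the entry, family() could not be answered 
from the reference alone, and scan() would have to carry the family alongside 
each reference or group references per family while building the result.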

All other issues outlined above, like gratuitous object creations, are also 
gone.  Dropping the family from the KV would only reintroduce them.

+1 here for closing issue

> [hbase] HStoreFiles needlessly store the column family name in every entry
> --------------------------------------------------------------------------
>
>                 Key: HBASE-68
>                 URL: https://issues.apache.org/jira/browse/HBASE-68
>             Project: Hadoop HBase
>          Issue Type: Improvement
>          Components: regionserver
>            Reporter: Bryan Duxbury
>            Priority: Minor
>             Fix For: 0.20.0
>
>
> Today, HStoreFiles keep the entire serialized HStoreKey objects around for 
> every cell in the HStore. Since HStores are 1-1 with column families, this is 
> really unnecessary - you can always surmise the column family by looking at 
> the HStore it belongs to. (This information would ostensibly come from the 
> file name or a header section.) This means that we could remove the column 
> family part of the HStoreKeys we put into the HStoreFile, reducing the size 
> of data stored. This would be a space-saving benefit, removing redundant 
> data, and could be a speed benefit, as you have to scan over less data in 
> memory and transfer less data over the network.
