[jira] [Commented] (HBASE-8109) HBase can manage blocks instead of files in HDFS

Andrew Purtell (JIRA) Thu, 14 Mar 2013 12:52:15 -0700

    [ 
https://issues.apache.org/jira/browse/HBASE-8109?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13602632#comment-13602632
 ]


Andrew Purtell commented on HBASE-8109:
---------------------------------------

bq. Once we have isolated the fs calls, we can switch to a system table 
tracking blocks. (that is basically a sort of Name Node)

How would we bootstrap that block tracking system table out of a pool of 
blocks? Would there be an HBase-side analogue to the NN fsimage? Would we keep 
that state in ZK, thus making ZK persistence critical? As a strawman, I guess 
we could have one block containing pointers to all blocks holding the block 
tracking table (in effect, a superblock) and back it up to multiple locations, 
as filesystems do. Where ZK points to META locations today, it would point to 
this superblock instead. HBCK could be taught how to scan for superblocks.
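
To make the superblock strawman concrete, here is a minimal sketch of how 
redundant copies might be encoded and recovered. Everything in it is an 
assumption for illustration: the class name, the magic number, the on-disk 
layout, and the use of a generation counter plus CRC32 to pick the newest 
intact copy. None of it is an existing HBase or HDFS format.

```java
import java.nio.ByteBuffer;
import java.util.ArrayList;
import java.util.List;
import java.util.Optional;
import java.util.zip.CRC32;

// Hypothetical superblock: points at the blocks that hold the block tracking table.
final class SuperblockSketch {
    static final int MAGIC = 0x48425342;   // made-up marker a scanner could look for
    private static final int FIXED = 24;   // magic(4) + generation(8) + count(4) + crc(8)

    static final class Superblock {
        final long generation;
        final List<Long> tableBlockIds;
        Superblock(long generation, List<Long> tableBlockIds) {
            this.generation = generation;
            this.tableBlockIds = tableBlockIds;
        }
    }

    /** Layout: magic, generation, id count, ids, then CRC32 over all preceding bytes. */
    static byte[] encode(long generation, List<Long> tableBlockIds) {
        ByteBuffer buf = ByteBuffer.allocate(FIXED + 8 * tableBlockIds.size());
        buf.putInt(MAGIC).putLong(generation).putInt(tableBlockIds.size());
        for (long id : tableBlockIds) {
            buf.putLong(id);
        }
        CRC32 crc = new CRC32();
        crc.update(buf.array(), 0, buf.position());
        buf.putLong(crc.getValue());
        return buf.array();
    }

    /** Returns empty if the bytes are truncated, mis-tagged, or fail the checksum. */
    static Optional<Superblock> decode(byte[] raw) {
        if (raw == null || raw.length < FIXED) return Optional.empty();
        ByteBuffer buf = ByteBuffer.wrap(raw);
        if (buf.getInt() != MAGIC) return Optional.empty();
        long generation = buf.getLong();
        int count = buf.getInt();
        if (count < 0 || raw.length != FIXED + 8L * count) return Optional.empty();
        List<Long> ids = new ArrayList<Long>();
        for (int i = 0; i < count; i++) {
            ids.add(buf.getLong());
        }
        CRC32 crc = new CRC32();
        crc.update(raw, 0, buf.position());
        if (crc.getValue() != buf.getLong()) return Optional.empty();
        return Optional.of(new Superblock(generation, ids));
    }

    /** What an HBCK-style scan might do: keep the intact copy with the highest generation. */
    static Optional<Superblock> recover(List<byte[]> copies) {
        Superblock best = null;
        for (byte[] raw : copies) {
            Optional<Superblock> sb = decode(raw);
            if (sb.isPresent() && (best == null || sb.get().generation > best.generation)) {
                best = sb.get();
            }
        }
        return Optional.ofNullable(best);
    }
}
```

With this layout, a corrupted copy fails its checksum and is skipped, and a 
stale backup loses to any intact copy with a higher generation. How the 
copies are located in the first place (via ZK or a scan) is left open here.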

A very interesting notion of building "HBaseFS".

Another option for getting the espoused compaction benefits without going all 
the way to a pool of blocks: consider an HDFS API for stitching files together. 
We could get a list of blocks from the NN for a path, rewrite them 
independently as part of compaction (or split), and then hand an updated list 
of blocks/lengths back to the NN for it to store back into the fsimage. This 
would still give us the flexibility to do even (insane?) things like share 
blocks between files across splits.
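
The stitching flow above (fetch a block list, rewrite some blocks, hand the 
updated list back) could look roughly like the following. This is a sketch 
under stated assumptions, not an HDFS interface: `BlockStitchSketch`, 
`getBlocks`, `commitBlocks`, and `refCount` are hypothetical names, and an 
in-memory map stands in for the NN namespace. The compare-and-swap on the 
expected block list is one possible way to guard against concurrent rewrites.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical in-memory stand-in for the NameNode side of a block-stitching API.
final class BlockStitchSketch {
    /** One entry in a file's block list: a block id plus the bytes used in that block. */
    static final class BlockRef {
        final long blockId;
        final long length;
        BlockRef(long blockId, long length) {
            this.blockId = blockId;
            this.length = length;
        }
        @Override public boolean equals(Object o) {
            return o instanceof BlockRef
                && ((BlockRef) o).blockId == blockId
                && ((BlockRef) o).length == length;
        }
        @Override public int hashCode() {
            return Long.hashCode(blockId) * 31 + Long.hashCode(length);
        }
    }

    private final Map<String, List<BlockRef>> files = new HashMap<String, List<BlockRef>>();

    /** Returns a copy of the current block list for a path (empty if the path is unknown). */
    synchronized List<BlockRef> getBlocks(String path) {
        List<BlockRef> current = files.get(path);
        return current == null ? new ArrayList<BlockRef>() : new ArrayList<BlockRef>(current);
    }

    /** Swaps in the updated list only if the file still has the list the caller read. */
    synchronized boolean commitBlocks(String path, List<BlockRef> expected, List<BlockRef> updated) {
        if (!getBlocks(path).equals(expected)) {
            return false; // someone else changed the file; caller must re-read and retry
        }
        files.put(path, new ArrayList<BlockRef>(updated));
        return true;
    }

    /** Counts references to a block across all files, e.g. blocks shared after a split. */
    synchronized int refCount(long blockId) {
        int count = 0;
        for (List<BlockRef> blocks : files.values()) {
            for (BlockRef b : blocks) {
                if (b.blockId == blockId) count++;
            }
        }
        return count;
    }
}
```

A compaction would call `getBlocks`, rewrite only the blocks it needs to, and 
then `commitBlocks` with the new list. A split could commit daughter files 
whose lists reuse the parent's block ids, which is where a reference count 
becomes necessary before any block can be reclaimed.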
                
> HBase can manage blocks instead of files in HDFS
> ------------------------------------------------
>
>                 Key: HBASE-8109
>                 URL: https://issues.apache.org/jira/browse/HBASE-8109
>             Project: HBase
>          Issue Type: Brainstorming
>            Reporter: Sergey Shelukhin
>
> Prompted by previous non-Hadoop experience and some dev list discussions, and 
> after talking to some HDFS people about blocks.
> HBase could improve a lot by managing HDFS blocks instead of files, and 
> reusing the blocks among other things. Some areas that could improve are 
> splits, compactions, management of large blobs, locality enforcement.
> I was told that block APIs in Hadoop 2 are well-isolated, but not exposed 
> yet. They can easily be exposed, and as one of the first potential users we 
> could get to help shape them. Two areas that, from my limited understanding, 
> are currently fuzzy are namespaces for blocks and ref-counting.
> We should come up with a list of initial scenarios to figure out what we need 
> from block API (locality, detecting/enforcing block boundary/variable size 
> blocks, reusing one block, ...).

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira
