[ https://issues.apache.org/jira/browse/HDFS-7339?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14289983#comment-14289983 ]

Tsz Wo Nicholas Sze commented on HDFS-7339:
-------------------------------------------

> The main reason for creating a BlockGroup class and the hierarchical block ID 
> protocol is to minimize NN memory overhead. ...

This can be achieved by using consecutive (normal) block IDs for the blocks in 
a block group, without dividing the ID space, as follows.  (This is not easy to 
describe.  Please let me know if anything is unclear.)
- For the block groups stored in the namenode, only store the first block ID.  
The other block IDs can be deduced from it using the storage policy.
- Use the same generation stamp for all the blocks.
- How can lookups in BlocksMap be supported?  There are two possible ways:
-# Change the hash function so that consecutive IDs are mapped to the same 
hash value, and implement BlockGroup.equals(..) so that it returns true for any 
block ID in the group.  For example, we may use only the high 60 bits for 
computing the hash code.  Suppose the blocks in a block group have IDs from 
0x302 to 0x30A.  We will then be able to look up the block group using any of 
those block IDs.  What happens if the first ID is near the low 4-bit boundary, 
say 0x30D?  A 9-block group would then end at 0x315, so its IDs would no longer 
share the same high 60 bits.  We may simply skip to 0x310 when allocating the 
block IDs so that this never happens.
-# Alternatively, the datanodes could also store the first ID (or the offset to 
the first ID) for EC blocks.  This does not seem like a good solution.
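
A minimal Java sketch of the first option, assuming 16-ID buckets (the high 60 bits) and groups of at most 16 blocks.  All class and method names are hypothetical, not HDFS code.  Instead of changing the real BlocksMap hash function and equals(..), it uses a plain HashMap keyed by the high 60 bits plus a range check that plays the role of the proposed equals(..); the allocator implements the "skip to the next boundary" rule.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch: none of these classes exist in HDFS.
final class BlockGroupSketch {
  final long firstId;   // the only block ID the namenode keeps
  final int numBlocks;  // assumed to be deducible from the storage policy
  final long genStamp;  // one generation stamp shared by every block in the group

  BlockGroupSketch(long firstId, int numBlocks, long genStamp) {
    this.firstId = firstId;
    this.numBlocks = numBlocks;
    this.genStamp = genStamp;
  }

  // The i-th block ID is derived from the first ID rather than stored.
  long blockId(int i) {
    if (i < 0 || i >= numBlocks) {
      throw new IndexOutOfBoundsException("index " + i);
    }
    return firstId + i;
  }

  // Plays the role of the proposed equals(..): true for any ID in the group.
  boolean contains(long blockId) {
    return blockId >= firstId && blockId < firstId + numBlocks;
  }

  // "Hash" on the high 60 bits: IDs differing only in the low 4 bits collide.
  static long bucketOf(long blockId) {
    return blockId >>> 4;
  }
}

final class GroupAllocator {
  private long nextId;

  GroupAllocator(long startId) {
    this.nextId = startId;
  }

  // Hands out numBlocks consecutive IDs that never straddle a 16-ID bucket.
  long allocate(int numBlocks) {
    if (numBlocks < 1 || numBlocks > 16) {
      throw new IllegalArgumentException("group size must be 1..16");
    }
    long lastId = nextId + numBlocks - 1;
    if (BlockGroupSketch.bucketOf(nextId) != BlockGroupSketch.bucketOf(lastId)) {
      nextId = (nextId | 0xFL) + 1; // e.g. 0x30D skips ahead to 0x310
    }
    long firstId = nextId;
    nextId += numBlocks;
    return firstId;
  }
}

final class GroupIndex {
  private final Map<Long, List<BlockGroupSketch>> buckets = new HashMap<>();

  void add(BlockGroupSketch group) {
    buckets.computeIfAbsent(BlockGroupSketch.bucketOf(group.firstId), k -> new ArrayList<>())
        .add(group);
  }

  // Looks up a group by any of its block IDs; returns null if none matches.
  BlockGroupSketch lookup(long blockId) {
    List<BlockGroupSketch> candidates = buckets.get(BlockGroupSketch.bucketOf(blockId));
    if (candidates == null) {
      return null;
    }
    for (BlockGroupSketch group : candidates) {
      if (group.contains(blockId)) {
        return group;
      }
    }
    return null;
  }
}
```

The per-bucket list is kept because two small groups could share one 16-ID bucket; the range check keeps lookups exact in that case.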

If we enforce block ID allocation so that the lower 4 bits of the first ID must 
be zero, then it is very similar to the scheme proposed in the design doc, 
except that there is no notion of a block group in the block IDs.
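
Under that constraint the lookup needs no range check: masking off the low 4 bits of a member ID yields the first ID, and the low 4 bits give the block's index within the group.  A minimal Java sketch, assuming at most 16 blocks per group; the class and method names are hypothetical.  Since the ID itself does not say whether a block belongs to a group, the helpers are only meaningful for IDs the namenode already knows are EC blocks.

```java
// Hypothetical sketch of the aligned variant; names are illustrative only.
final class AlignedGroupIds {
  private static final long LOW_BITS = 0xFL; // 16 IDs per aligned slot
  private long nextId;

  AlignedGroupIds(long startId) {
    this.nextId = startId;
  }

  // Rounds up to the next multiple of 16 so the first ID's low 4 bits are zero.
  long allocateGroup(int numBlocks) {
    if (numBlocks < 1 || numBlocks > 16) {
      throw new IllegalArgumentException("group size must be 1..16");
    }
    long firstId = (nextId + LOW_BITS) & ~LOW_BITS;
    nextId = firstId + numBlocks;
    return firstId;
  }

  // For an ID known to belong to an EC group: the group's first ID...
  static long firstIdOf(long blockId) {
    return blockId & ~LOW_BITS;
  }

  // ...and the block's position within the group.
  static int indexInGroup(long blockId) {
    return (int) (blockId & LOW_BITS);
  }
}
```

In this sketch, rounding up skips up to 15 IDs before each group, whereas the skip-on-crossing rule skips IDs only when a group would otherwise straddle a boundary.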


> Allocating and persisting block groups in NameNode
> --------------------------------------------------
>
>                 Key: HDFS-7339
>                 URL: https://issues.apache.org/jira/browse/HDFS-7339
>             Project: Hadoop HDFS
>          Issue Type: Sub-task
>            Reporter: Zhe Zhang
>            Assignee: Zhe Zhang
>         Attachments: HDFS-7339-001.patch, HDFS-7339-002.patch, 
> HDFS-7339-003.patch, HDFS-7339-004.patch, HDFS-7339-005.patch, 
> HDFS-7339-006.patch, Meta-striping.jpg, NN-stripping.jpg
>
>
> All erasure codec operations center around the concept of _block group_; they 
> are formed in initial encoding and looked up in recoveries and conversions. A 
> lightweight class {{BlockGroup}} is created to record the original and parity 
> blocks in a coding group, as well as a pointer to the codec schema (pluggable 
> codec schemas will be supported in HDFS-7337). With the striping layout, the 
> HDFS client needs to operate on all blocks in a {{BlockGroup}} concurrently. 
> Therefore we propose to extend a file’s inode to switch between _contiguous_ 
> and _striping_ modes, with the current mode recorded in a binary flag. An 
> array of BlockGroups (or BlockGroup IDs) is added, which remains empty for 
> “traditional” HDFS files with contiguous block layout.
> The NameNode creates and maintains {{BlockGroup}} instances through the new 
> {{ECManager}} component; the attached figure has an illustration of the 
> architecture. As a simple example, when a {_Striping+EC_} file is created and 
> written to, {{ECManager}} will serve requests from the client to allocate new 
> {{BlockGroups}} and store them under the {{INodeFile}}. In the current phase, 
> {{BlockGroups}} are allocated both in initial online encoding and in the 
> conversion from replication to EC. {{ECManager}} also facilitates the lookup 
> of {{BlockGroup}} information for block recovery work.
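
The quoted description can be illustrated with plain Java classes.  This is only a sketch of the described data layout under assumed names: EcBlockGroup, FileInodeSketch and EcManagerSketch are hypothetical, not the classes in the attached patches; the codec schema is reduced to a string label such as "RS-6-3", and block IDs are handed out consecutively from a counter.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

// Hypothetical stand-in for the proposed BlockGroup: original (data) blocks,
// parity blocks, and a reference to the codec schema.
final class EcBlockGroup {
  private final long[] dataBlockIds;
  private final long[] parityBlockIds;
  private final String codecSchema; // placeholder for a pluggable schema object

  EcBlockGroup(long[] dataBlockIds, long[] parityBlockIds, String codecSchema) {
    this.dataBlockIds = dataBlockIds.clone();
    this.parityBlockIds = parityBlockIds.clone();
    this.codecSchema = codecSchema;
  }

  int numDataBlocks() { return dataBlockIds.length; }
  int numParityBlocks() { return parityBlockIds.length; }
  String codecSchema() { return codecSchema; }
}

// Hypothetical inode: a binary contiguous/striping flag plus a list of block
// groups that stays empty for contiguous files.
final class FileInodeSketch {
  private final boolean striped;
  private final List<EcBlockGroup> blockGroups = new ArrayList<>();

  FileInodeSketch(boolean striped) { this.striped = striped; }

  boolean isStriped() { return striped; }

  void addBlockGroup(EcBlockGroup group) {
    if (!striped) {
      throw new IllegalStateException("contiguous files do not hold block groups");
    }
    blockGroups.add(group);
  }

  List<EcBlockGroup> blockGroups() { return Collections.unmodifiableList(blockGroups); }
}

// Hypothetical manager that allocates a block group on a client request and
// stores it under the file's inode.
final class EcManagerSketch {
  private long nextBlockId;

  EcManagerSketch(long firstBlockId) { this.nextBlockId = firstBlockId; }

  EcBlockGroup allocateBlockGroup(FileInodeSketch file, int numData, int numParity,
      String codecSchema) {
    if (!file.isStriped()) {
      throw new IllegalStateException("file is not in striping mode");
    }
    long[] data = new long[numData];
    long[] parity = new long[numParity];
    for (int i = 0; i < numData; i++) {
      data[i] = nextBlockId++;
    }
    for (int i = 0; i < numParity; i++) {
      parity[i] = nextBlockId++;
    }
    EcBlockGroup group = new EcBlockGroup(data, parity, codecSchema);
    file.addBlockGroup(group);
    return group;
  }
}
```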



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)