[ 
https://issues.apache.org/jira/browse/HDFS-7339?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14288564#comment-14288564
 ] 

Jing Zhao commented on HDFS-7339:
---------------------------------

Thanks for working on this, Zhe!

# First, a quick comment about the current SequentialBlockGroupIdGenerator and 
SequentialBlockIdGenerator. The current patch tries to use a flag to 
distinguish contiguous and striped blocks. However, since there may still be 
conflicts coming from historical, randomly assigned block IDs, for blocks in 
block reports we still need to check two places to determine whether a block 
is contiguous or striped.
# My main concern is about BlockGroup, contiguous blocks, and striped blocks. I 
think the DataNode does not need to know the difference between contiguous 
blocks and striped blocks (when doing recovery the DataNode can learn this 
from the NameNode). The concept of BlockGroup should be known and used only 
internally in the NameNode (and maybe also logically known by the client while 
writing). What we can do is:
#* DataNodes and their block reports do not distinguish striped and contiguous 
blocks, and we do not need to distinguish them by block ID. They are 
treated equally when stored on and reported from the DN.
#* BlockGroup is only a new concept sitting between INodeFile and Block 
(striped) inside the NameNode. Fundamentally, BlockGroup is also a 
BlockCollection. We do not need to assign a generation stamp to a BlockGroup 
(and even its ID can be omitted). All we need is to maintain the mapping 
between block and blockgroup in the original blocksmap, record the list of 
blocks in the blockgroup, and record the blockgroups in INodeFile. This can 
probably be achieved by slightly extending BlockInfo and adapting the 
BlockCollection interface.
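
A minimal sketch of the NameNode-side structure described in the second point, under stated assumptions: every class name below ({{SketchBlockCollection}}, {{SketchBlockInfo}}, {{SketchBlockGroup}}, {{SketchINodeFile}}, {{SketchBlocksMap}}) is a hypothetical stand-in and not the real HDFS class or its API. It only illustrates that the striped/contiguous distinction could be derived from the blocksmap mapping rather than from the block ID, and that a blockgroup might carry neither an ID nor a generation stamp.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical stand-in for the NameNode's BlockCollection interface.
interface SketchBlockCollection {
    List<SketchBlockInfo> getBlocks();
}

// Hypothetical stand-in for BlockInfo, "slightly extended" with a pointer
// to the collection that owns the block (a file or a blockgroup).
final class SketchBlockInfo {
    final long blockId;
    final long generationStamp;
    private SketchBlockCollection owner;

    SketchBlockInfo(long blockId, long generationStamp) {
        this.blockId = blockId;
        this.generationStamp = generationStamp;
    }

    SketchBlockCollection getOwner() { return owner; }
    void setOwner(SketchBlockCollection owner) { this.owner = owner; }
}

// A blockgroup is just another block collection: it has no generation
// stamp and no ID of its own in this sketch.
final class SketchBlockGroup implements SketchBlockCollection {
    private final List<SketchBlockInfo> blocks = new ArrayList<>();

    void addBlock(SketchBlockInfo block) {
        block.setOwner(this);
        blocks.add(block);
    }

    @Override
    public List<SketchBlockInfo> getBlocks() {
        return Collections.unmodifiableList(blocks);
    }
}

// The file records its contiguous blocks and, separately, its blockgroups.
final class SketchINodeFile implements SketchBlockCollection {
    private final List<SketchBlockInfo> contiguousBlocks = new ArrayList<>();
    private final List<SketchBlockGroup> blockGroups = new ArrayList<>();

    void addContiguousBlock(SketchBlockInfo block) {
        block.setOwner(this);
        contiguousBlocks.add(block);
    }

    void addBlockGroup(SketchBlockGroup group) { blockGroups.add(group); }

    @Override
    public List<SketchBlockInfo> getBlocks() {
        return Collections.unmodifiableList(contiguousBlocks);
    }

    List<SketchBlockGroup> getBlockGroups() {
        return Collections.unmodifiableList(blockGroups);
    }
}

// One blocksmap for all blocks, keyed by block ID regardless of layout.
final class SketchBlocksMap {
    private final Map<Long, SketchBlockInfo> map = new HashMap<>();

    void add(SketchBlockInfo block) { map.put(block.blockId, block); }

    SketchBlockInfo get(long blockId) { return map.get(blockId); }

    // Layout is decided by the owner recorded in the map, not by ID bits.
    boolean isStriped(long blockId) {
        SketchBlockInfo block = map.get(blockId);
        return block != null && block.getOwner() instanceof SketchBlockGroup;
    }
}
```

With this shape, a block reported by a DN is looked up once in the blocksmap, and the lookup result says whether it belongs to a file directly or to a blockgroup, so a legacy random block ID that happens to look "striped" would not matter.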

I think in this way we can simplify the current design and reuse most of the 
current block management code. To expedite the development and review, maybe we 
can use this jira to just focus on the definition of the blockgroup, the 
mapping between blocks and blockgroups, and the association between blockgroups 
and files (i.e., blockgroup-related changes inside the NN). We can develop the 
{{addBlockgroup}} API and revisit how to handle under-construction blocks and 
blockgroups (and whether we need to assign complete/commit state to a 
blockgroup and define BlockGroupUC) later in separate jiras.

> Allocating and persisting block groups in NameNode
> --------------------------------------------------
>
>                 Key: HDFS-7339
>                 URL: https://issues.apache.org/jira/browse/HDFS-7339
>             Project: Hadoop HDFS
>          Issue Type: Sub-task
>            Reporter: Zhe Zhang
>            Assignee: Zhe Zhang
>         Attachments: HDFS-7339-001.patch, HDFS-7339-002.patch, 
> HDFS-7339-003.patch, HDFS-7339-004.patch, HDFS-7339-005.patch, 
> HDFS-7339-006.patch, Meta-striping.jpg, NN-stripping.jpg
>
>
> All erasure codec operations center around the concept of _block group_; they 
> are formed in initial encoding and looked up in recoveries and conversions. A 
> lightweight class {{BlockGroup}} is created to record the original and parity 
> blocks in a coding group, as well as a pointer to the codec schema (pluggable 
> codec schemas will be supported in HDFS-7337). With the striping layout, the 
> HDFS client needs to operate on all blocks in a {{BlockGroup}} concurrently. 
> Therefore we propose to extend a file’s inode to switch between _contiguous_ 
> and _striping_ modes, with the current mode recorded in a binary flag. An 
> array of BlockGroups (or BlockGroup IDs) is added, which remains empty for 
> “traditional” HDFS files with contiguous block layout.
> The NameNode creates and maintains {{BlockGroup}} instances through the new 
> {{ECManager}} component; the attached figure has an illustration of the 
> architecture. As a simple example, when a {_Striping+EC_} file is created and 
> written to, it will serve requests from the client to allocate new 
> {{BlockGroups}} and store them under the {{INodeFile}}. In the current phase, 
> {{BlockGroups}} are allocated both in initial online encoding and in the 
> conversion from replication to EC. {{ECManager}} also facilitates the lookup 
> of {{BlockGroup}} information for block recovery work.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
