[
https://issues.apache.org/jira/browse/HDFS-7339?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14288564#comment-14288564
]
Jing Zhao commented on HDFS-7339:
---------------------------------
Thanks for working on this, Zhe!
# First a quick comment about the current SequentialBlockGroupIdGenerator and
SequentialBlockIdGenerator. The current patch tries to use a flag to
distinguish contiguous and striped blocks. However, since there may still be
conflicts coming from historical randomly assigned block IDs, for blocks in
block reports, we still need to check two places to determine if this is a
contiguous block or a striped block.
# My main concern is about BlockGroup, contiguous blocks, and striped blocks. I
think DataNode does not need to know the difference between contiguous blocks
and striped blocks (when doing recovery the datanode can learn the information
from NameNode). The concept of BlockGroup should be known and used only
internally in NameNode (and maybe also logically known by the client while
writing). What we can do is:
#* Datanodes and their block reports do not distinguish striped and contiguous
blocks. And we do not need to distinguish them from the block ID. They are
treated equally while storing and reporting in/from the DN.
#* BlockGroup is only a new concept sitting between INodeFile and Block
(striped) inside of the NameNode. Fundamentally BlockGroup is also a
BlockCollection. We do not need to assign a generation stamp to BlockGroup (and
even its id can be omitted). All we need is to maintain the mapping
between block and blockgroup in the original blocksmap, record the list of
blocks in the blockgroup, and record the blockgroups in INodeFile. This can
be achieved by maybe slightly extending the BlockInfo and playing with the
BlockCollection interface.
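
To make the proposal above concrete, here is a rough sketch of the data structures I have in mind. All class and method names below ({{SketchBlock}}, {{SketchBlockGroup}}, {{SketchBlockInfo}}, {{SketchINodeFile}}, {{SketchBlocksMap}}) are hypothetical simplifications for illustration only, not the real HDFS classes or the code in the patch. The assumption is that a single blocksmap keyed by block ID is enough, and that a block is known to be striped only because its entry points back to a group.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical, simplified stand-ins; these are NOT the real HDFS classes.

/** A block as the DataNode sees it: an ID only, with no striping flag. */
final class SketchBlock {
    final long id;

    SketchBlock(long id) {
        this.id = id;
    }
}

/** NameNode-only grouping of blocks; no generation stamp and no ID of its own. */
final class SketchBlockGroup {
    private final List<SketchBlock> blocks = new ArrayList<>();

    void add(SketchBlock b) {
        blocks.add(b);
    }

    List<SketchBlock> blocks() {
        return Collections.unmodifiableList(blocks);
    }
}

/** Slightly extended BlockInfo: a block plus an optional back-pointer to its group. */
final class SketchBlockInfo {
    final SketchBlock block;
    final SketchBlockGroup group; // null for a contiguous block

    SketchBlockInfo(SketchBlock block, SketchBlockGroup group) {
        this.block = block;
        this.group = group;
    }

    boolean isStriped() {
        return group != null;
    }
}

/** Simplified file inode that records its block groups. */
final class SketchINodeFile {
    final List<SketchBlockGroup> groups = new ArrayList<>();
}

/** Simplified blocksmap: one lookup table keyed by block ID for all blocks. */
final class SketchBlocksMap {
    private final Map<Long, SketchBlockInfo> map = new HashMap<>();

    void addContiguous(SketchBlock b) {
        map.put(b.id, new SketchBlockInfo(b, null));
    }

    /** Creates a group, registers each member block, and attaches the group to the file. */
    SketchBlockGroup addGroup(SketchINodeFile file, long... blockIds) {
        SketchBlockGroup g = new SketchBlockGroup();
        for (long id : blockIds) {
            SketchBlock b = new SketchBlock(id);
            g.add(b);
            map.put(id, new SketchBlockInfo(b, g));
        }
        file.groups.add(g);
        return g;
    }

    /** Resolves a block ID from a block report; returns null if the ID is unknown. */
    SketchBlockInfo lookup(long reportedId) {
        return map.get(reportedId);
    }
}
```

With this layout a block report entry is resolved with one {{lookup}} call regardless of layout, and the striped/contiguous distinction comes from the NameNode's own state rather than from bits in the block ID.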
I think in this way we can simplify the current design and reuse most of the
current block management code. To expedite the development and review, maybe we
can use this jira to just focus on the definition of the blockgroup, the
mapping between blocks and blockgroups, and the association between blockgroups
and files (i.e., blockgroup related changes inside of NN). We can develop the
{{addBlockgroup}} API and revisit how to handle under-construction blocks and
blockgroups (and whether we need to assign a complete/commit state to a block
group and define BlockGroupUC) later in separate jiras.
> Allocating and persisting block groups in NameNode
> --------------------------------------------------
>
> Key: HDFS-7339
> URL: https://issues.apache.org/jira/browse/HDFS-7339
> Project: Hadoop HDFS
> Issue Type: Sub-task
> Reporter: Zhe Zhang
> Assignee: Zhe Zhang
> Attachments: HDFS-7339-001.patch, HDFS-7339-002.patch,
> HDFS-7339-003.patch, HDFS-7339-004.patch, HDFS-7339-005.patch,
> HDFS-7339-006.patch, Meta-striping.jpg, NN-stripping.jpg
>
>
> All erasure codec operations center around the concept of _block group_; they
> are formed in initial encoding and looked up in recoveries and conversions. A
> lightweight class {{BlockGroup}} is created to record the original and parity
> blocks in a coding group, as well as a pointer to the codec schema (pluggable
> codec schemas will be supported in HDFS-7337). With the striping layout, the
> HDFS client needs to operate on all blocks in a {{BlockGroup}} concurrently.
> Therefore we propose to extend a file’s inode to switch between _contiguous_
> and _striping_ modes, with the current mode recorded in a binary flag. An
> array of BlockGroups (or BlockGroup IDs) is added, which remains empty for
> “traditional” HDFS files with contiguous block layout.
> The NameNode creates and maintains {{BlockGroup}} instances through the new
> {{ECManager}} component; the attached figure has an illustration of the
> architecture. As a simple example, when a {_Striping+EC_} file is created and
> written to, it will serve requests from the client to allocate new
> {{BlockGroups}} and store them under the {{INodeFile}}. In the current phase,
> {{BlockGroups}} are allocated both in initial online encoding and in the
> conversion from replication to EC. {{ECManager}} also facilitates the lookup
> of {{BlockGroup}} information for block recovery work.