[ https://issues.apache.org/jira/browse/HDFS-7339?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14199444#comment-14199444 ]
Zhe Zhang commented on HDFS-7339:
---------------------------------
The unit test in the patch will be replaced by the one in HDFS-7347 when it's
committed. I've also based the patch off trunk rather than HDFS-EC to get a
Jenkins run. I'll rebase after the patch is reviewed.
> Create block groups for initial block encoding
> ----------------------------------------------
>
> Key: HDFS-7339
> URL: https://issues.apache.org/jira/browse/HDFS-7339
> Project: Hadoop HDFS
> Issue Type: Sub-task
> Reporter: Zhe Zhang
> Assignee: Zhe Zhang
> Attachments: Encoding-design-NN.jpg, HDFS-7339-001.patch
>
>
> All erasure coding operations center around the concept of _block groups_,
> which are formed during encoding and looked up during decoding. This JIRA
> creates a lightweight {{BlockGroup}} class to record the original and parity
> blocks in an encoding group, as well as a pointer to the codec schema.
> Pluggable codec schemas will be supported in HDFS-7337.
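The {{BlockGroup}} record described above can be sketched in Java as follows. This is a minimal illustration, not the class in HDFS-7339-001.patch: {{CodecSchema}} and {{BlockGroup}} here are hypothetical, block IDs are assumed to be plain {{long}} values, and a schema is assumed to be just a codec name plus raw/parity block counts. The group holds a reference to a shared schema object rather than a copy, matching the "pointer to the codec schema" wording.

```java
import java.util.Arrays;

/** Hypothetical codec schema: algorithm name plus raw/parity block counts. */
final class CodecSchema {
    final String codecName;
    final int numDataBlocks;
    final int numParityBlocks;

    CodecSchema(String codecName, int numDataBlocks, int numParityBlocks) {
        if (numDataBlocks <= 0 || numParityBlocks <= 0) {
            throw new IllegalArgumentException("block counts must be positive");
        }
        this.codecName = codecName;
        this.numDataBlocks = numDataBlocks;
        this.numParityBlocks = numParityBlocks;
    }

    /** Total blocks per group, e.g. 3 + 2 = 5 for the RS example. */
    int groupSize() {
        return numDataBlocks + numParityBlocks;
    }
}

/** Hypothetical lightweight block group: member block IDs plus a schema pointer. */
final class BlockGroup {
    private final long[] dataBlockIds;
    private final long[] parityBlockIds;
    private final CodecSchema schema; // shared reference, not a copy

    BlockGroup(long[] dataBlockIds, long[] parityBlockIds, CodecSchema schema) {
        if (dataBlockIds.length != schema.numDataBlocks
                || parityBlockIds.length != schema.numParityBlocks) {
            throw new IllegalArgumentException("block counts do not match schema");
        }
        // Defensive copies keep the group immutable after construction.
        this.dataBlockIds = dataBlockIds.clone();
        this.parityBlockIds = parityBlockIds.clone();
        this.schema = schema;
    }

    long[] getDataBlockIds() { return dataBlockIds.clone(); }

    long[] getParityBlockIds() { return parityBlockIds.clone(); }

    CodecSchema getSchema() { return schema; }

    /** True if blockId is one of this group's raw or parity blocks. */
    boolean contains(long blockId) {
        return Arrays.stream(dataBlockIds).anyMatch(id -> id == blockId)
            || Arrays.stream(parityBlockIds).anyMatch(id -> id == blockId);
    }
}
```

With the RS example from the description, {{new CodecSchema("RS", 3, 2)}} yields a group size of 5, and a group built with the wrong number of raw or parity blocks is rejected at construction time.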
> The NameNode creates and maintains {{BlockGroup}} instances through two new
> components; the attached figure (Encoding-design-NN.jpg) illustrates the
> architecture.
> {{ECManager}}: This module manages {{BlockGroups}} and their associated codec
> schemas. As a simple example, it stores the codec schema of the Reed-Solomon
> algorithm with 3 original and 2 parity blocks (5 blocks in each group). Each
> {{BlockGroup}} points to the schema it uses. To facilitate lookups during
> recovery requests, {{BlockGroups}} should be organized as a map keyed by
> {{Blocks}}.
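A possible shape for the {{ECManager}} lookup map, as a hedged sketch: every raw and parity block of a group is used as a key pointing to the same group object, so a recovery request for any one block finds its group and schema in a single hash lookup. {{ECManagerSketch}}, its nested {{Schema}} and {{Group}} stand-ins, and the use of {{long}} block IDs as keys (instead of {{Block}} objects) are assumptions for illustration.

```java
import java.util.HashMap;
import java.util.Map;

/**
 * Hypothetical sketch of ECManager bookkeeping: schemas by name, and groups
 * indexed by every member block ID for recovery-time lookups.
 */
final class ECManagerSketch {
    /** Minimal stand-in for a codec schema, e.g. RS with 3 raw + 2 parity. */
    static final class Schema {
        final String name;
        final int numData;
        final int numParity;

        Schema(String name, int numData, int numParity) {
            this.name = name;
            this.numData = numData;
            this.numParity = numParity;
        }
    }

    /** Minimal stand-in for BlockGroup: member block IDs plus a schema pointer. */
    static final class Group {
        final long[] data;
        final long[] parity;
        final Schema schema;

        Group(long[] data, long[] parity, Schema schema) {
            this.data = data.clone();
            this.parity = parity.clone();
            this.schema = schema;
        }
    }

    private final Map<String, Schema> schemas = new HashMap<>();
    private final Map<Long, Group> groupByBlock = new HashMap<>();

    void addSchema(Schema schema) {
        schemas.put(schema.name, schema);
    }

    Schema getSchema(String name) {
        return schemas.get(name);
    }

    /** Indexes a newly formed group under each of its raw and parity block IDs. */
    void addGroup(Group group) {
        // Validate first so a rejected group leaves the map untouched.
        for (long[] ids : new long[][] {group.data, group.parity}) {
            for (long id : ids) {
                if (groupByBlock.containsKey(id)) {
                    throw new IllegalStateException("block " + id + " is already in a group");
                }
            }
        }
        for (long[] ids : new long[][] {group.data, group.parity}) {
            for (long id : ids) {
                groupByBlock.put(id, group);
            }
        }
    }

    /** Recovery-time lookup: the group containing blockId, or null if none. */
    Group getGroup(long blockId) {
        return groupByBlock.get(blockId);
    }
}
```

Rejecting a block that already belongs to another group is a choice made in this sketch; the proposal does not say how overlapping groups would be handled.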
> {{ErasureCodingBlocks}}: Block encoding work is triggered by multiple events.
> This module analyzes the incoming events and dispatches tasks to
> {{UnderReplicatedBlocks}} to create parity blocks. A new queue
> ({{QUEUE_INITIAL_ENCODING}}) will be added to the 5 existing priority queues
> to maintain the relative order of encoding and replication tasks.
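The added queue level can be illustrated with a simplified multi-level queue. In this sketch the first five constants reuse the priority names of Hadoop's {{UnderReplicatedBlocks}}, while the value of {{QUEUE_INITIAL_ENCODING}} (placed after all existing levels, i.e. lowest priority) is an assumption, since the proposal does not say where the new queue sits. {{EncodingAwareQueues}} and {{chooseNext}} are hypothetical names, and this sketch simply dequeues whatever it returns.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.LinkedHashSet;
import java.util.List;

/**
 * Hypothetical multi-level queue in the spirit of UnderReplicatedBlocks,
 * extended with a sixth QUEUE_INITIAL_ENCODING level (placement assumed).
 */
final class EncodingAwareQueues {
    static final int QUEUE_HIGHEST_PRIORITY = 0;
    static final int QUEUE_VERY_UNDER_REPLICATED = 1;
    static final int QUEUE_UNDER_REPLICATED = 2;
    static final int QUEUE_REPLICAS_BADLY_DISTRIBUTED = 3;
    static final int QUEUE_WITH_CORRUPT_BLOCKS = 4;
    static final int QUEUE_INITIAL_ENCODING = 5; // assumption: lowest priority
    static final int LEVEL = 6;

    // One insertion-ordered set per level: FIFO order, no duplicates per level.
    private final List<LinkedHashSet<Long>> queues = new ArrayList<>();

    EncodingAwareQueues() {
        for (int i = 0; i < LEVEL; i++) {
            queues.add(new LinkedHashSet<>());
        }
    }

    /** Adds a block ID at the given level; returns false if already queued there. */
    boolean add(long blockId, int priority) {
        return queues.get(priority).add(blockId);
    }

    /**
     * Removes and returns up to maxBlocks IDs, draining lower-index
     * (higher-priority) levels first and FIFO within each level.
     */
    List<Long> chooseNext(int maxBlocks) {
        List<Long> chosen = new ArrayList<>();
        for (LinkedHashSet<Long> q : queues) {
            Iterator<Long> it = q.iterator();
            while (it.hasNext() && chosen.size() < maxBlocks) {
                chosen.add(it.next());
                it.remove();
            }
        }
        return chosen;
    }
}
```

Because levels are drained in index order, changing the value of {{QUEUE_INITIAL_ENCODING}} is all it takes to reorder encoding work relative to replication work in this model.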
> * Whenever a block is finalized and meets the EC criteria -- including 1) the
> block is full; 2) the file’s storage policy allows EC --
> {{ErasureCodingBlocks}} tries to form a {{BlockGroup}}. To do so, it needs to
> store a set of blocks waiting to be encoded. Different grouping algorithms can
> be applied -- e.g., always grouping blocks of the same file. Blocks in a group
> should also reside on different DataNodes, and ideally on different racks, to
> tolerate node and rack failures. If grouping succeeds, it records the formed
> group with {{ECManager}} and inserts the parity blocks into
> {{QUEUE_INITIAL_ENCODING}}.
> * When a parity block or a raw block in {{ENCODED}} state is found missing,
> {{ErasureCodingBlocks}} adds it to the existing priority queues in
> {{UnderReplicatedBlocks}}. E.g., if all parity blocks in a group are lost,
> they should be added to {{QUEUE_HIGHEST_PRIORITY}}. New priorities might be
> added for fine-grained differentiation (e.g., loss of a raw block versus a
> parity block).
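The two events handled by {{ErasureCodingBlocks}} can be sketched together in one hypothetical class, {{ErasureCodingBlocksSketch}}. Grouping follows the "same file" algorithm mentioned above: blocks of a file wait until enough of them sit on distinct DataNodes, with distinct racks preferred but not required. For simplicity, each block is assumed to have a single (DataNode, rack) location. For missing blocks, only the rule that losing all parity blocks maps to {{QUEUE_HIGHEST_PRIORITY}} comes from the proposal; the other thresholds, and routing an unrecoverable group to {{QUEUE_WITH_CORRUPT_BLOCKS}}, are illustrative assumptions based on the loss tolerance of an MDS code such as Reed-Solomon (a group with k raw and m parity blocks survives as long as at most m blocks are missing).

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

/** Hypothetical sketch of the two ErasureCodingBlocks event handlers. */
final class ErasureCodingBlocksSketch {
    static final int QUEUE_HIGHEST_PRIORITY = 0;
    static final int QUEUE_VERY_UNDER_REPLICATED = 1;
    static final int QUEUE_UNDER_REPLICATED = 2;
    static final int QUEUE_WITH_CORRUPT_BLOCKS = 4;

    /** Finalized block simplified to one (DataNode, rack) location. */
    static final class PendingBlock {
        final long id;
        final String dataNode;
        final String rack;

        PendingBlock(long id, String dataNode, String rack) {
            this.id = id;
            this.dataNode = dataNode;
            this.rack = rack;
        }
    }

    private final int numData;   // k raw blocks per group, e.g. 3
    private final int numParity; // m parity blocks per group, e.g. 2
    private final Map<String, List<PendingBlock>> pendingByFile = new HashMap<>();

    ErasureCodingBlocksSketch(int numData, int numParity) {
        this.numData = numData;
        this.numParity = numParity;
    }

    /**
     * Event 1: a finalized block that already meets the EC criteria. Groups only
     * blocks of the same file, on distinct DataNodes, preferring distinct racks.
     * Returns the raw block IDs of a newly formed group, or null to keep waiting.
     */
    List<Long> onBlockFinalized(String file, PendingBlock block) {
        List<PendingBlock> pending = pendingByFile.computeIfAbsent(file, f -> new ArrayList<>());
        pending.add(block);
        List<PendingBlock> chosen = new ArrayList<>();
        Set<String> nodes = new HashSet<>();
        Set<String> racks = new HashSet<>();
        // Pass 1 takes only blocks on new racks; pass 2 relaxes to new DataNodes.
        for (boolean requireNewRack : new boolean[] {true, false}) {
            for (PendingBlock b : pending) {
                if (chosen.size() == numData) break;
                if (chosen.contains(b) || nodes.contains(b.dataNode)) continue;
                if (requireNewRack && racks.contains(b.rack)) continue;
                chosen.add(b);
                nodes.add(b.dataNode);
                racks.add(b.rack);
            }
        }
        if (chosen.size() < numData) return null;
        pending.removeAll(chosen);
        List<Long> ids = new ArrayList<>();
        for (PendingBlock b : chosen) ids.add(b.id);
        return ids;
    }

    /**
     * Event 2: blocks of an ENCODED group are missing. Maps the remaining loss
     * tolerance to a queue; only the all-parity-lost case is from the proposal.
     */
    int priorityForMissing(int missingData, int missingParity) {
        int missing = missingData + missingParity;
        if (missing > numParity) return QUEUE_WITH_CORRUPT_BLOCKS; // unrecoverable
        int tolerance = numParity - missing; // further losses still repairable
        if (tolerance == 0) return QUEUE_HIGHEST_PRIORITY; // e.g. all parity lost
        if (tolerance == 1) return QUEUE_VERY_UNDER_REPLICATED;
        return QUEUE_UNDER_REPLICATED;
    }
}
```

On success, a caller would record the returned group with {{ECManager}} and queue its parity blocks in {{QUEUE_INITIAL_ENCODING}}; that step is left out here. Keeping {{missingData}} and {{missingParity}} as separate inputs leaves room for the finer raw-versus-parity priorities the proposal mentions.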
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)