[jira] [Commented] (HDFS-7337) Configurable and pluggable Erasure Codec and schema

Zhe Zhang (JIRA) Mon, 12 Jan 2015 17:21:07 -0800

    [ 
https://issues.apache.org/jira/browse/HDFS-7337?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14274549#comment-14274549
 ]


Zhe Zhang commented on HDFS-7337:
---------------------------------

Thanks Kai for the deeper discussion.

bq. So the question comes to what aspects would be covered and how they're 
covered when support a new code algorithm: 1) how to calculate with a group of 
bytes, units or chunks, which is covered by ErasureCoder and RawErasureCoder; 
2) how to layout/order the group of chunks, which is covered by BlockGrouper.

I think this is a good summary of what's included in the current patch.
Actually I think the patch will be more trackable if we separate these 2 
features. The arithmetic part is primarily used by client/DN (HDFS-7545 and 
HDFS-7344). The _grouper_ component will be used by NN (HDFS-7339). I suggest 
we keep this JIRA for configurable/pluggable arithmetic codec calculation, and 
create another JIRA for configurable/pluggable block layout. This way they can 
be reviewed and committed more quickly.

As Kai also echoed 
[above|https://issues.apache.org/jira/browse/HDFS-7337?focusedCommentId=14271930&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-14271930],
we should first create a working prototype with the simplest schema, then add
at least one other schema, and finally figure out how to abstract out the
common logic between different schemas.

In my understanding, the simplest working prototype would be a striping client 
(HDFS-7545) asking the NN (HDFS-7339) to allocate and persist block groups, 
using the arithmetic codec provided in this JIRA (HDFS-7337) to calculate 
Reed-Solomon parity data, and successfully writing an EC file. 
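
To illustrate the "arithmetic" half of that flow, here is a minimal sketch of computing Reed-Solomon parity over GF(2^8). It is not the codec from the attached patches: the class name {{RsParitySketch}}, the Cauchy-matrix construction and the 0x11d field polynomial are all my assumptions, chosen only because they keep the example short.

```java
/** Sketch only: systematic Reed-Solomon parity over GF(2^8) using a Cauchy matrix. */
final class RsParitySketch {
    // exp/log tables for GF(2^8), primitive polynomial 0x11d, generator 2 (assumed choice).
    private static final int[] EXP = new int[512];
    private static final int[] LOG = new int[256];

    static {
        int x = 1;
        for (int i = 0; i < 255; i++) {
            EXP[i] = x;
            LOG[x] = i;
            x <<= 1;
            if ((x & 0x100) != 0) {
                x ^= 0x11d; // reduce modulo the field polynomial
            }
        }
        // Duplicate the table so mul() can skip a modulo on the exponent sum.
        for (int i = 255; i < 512; i++) {
            EXP[i] = EXP[i - 255];
        }
    }

    /** Multiplication in GF(2^8). */
    static int mul(int a, int b) {
        if (a == 0 || b == 0) {
            return 0;
        }
        return EXP[LOG[a] + LOG[b]];
    }

    /** Multiplicative inverse in GF(2^8). */
    static int inv(int a) {
        if (a == 0) {
            throw new ArithmeticException("zero has no inverse");
        }
        return EXP[255 - LOG[a]];
    }

    /** Cauchy coefficient of data chunk j in parity chunk i: 1 / (i XOR (numParity + j)). */
    static int coefficient(int i, int j, int numParity) {
        return inv(i ^ (numParity + j));
    }

    /** Computes numParity parity chunks from equally sized data chunks. */
    static byte[][] encode(byte[][] data, int numParity) {
        int numData = data.length;
        if (numData == 0 || numParity <= 0 || numData + numParity > 256) {
            throw new IllegalArgumentException("unsupported (data, parity) combination");
        }
        int len = data[0].length;
        byte[][] parity = new byte[numParity][len];
        for (int i = 0; i < numParity; i++) {
            for (int j = 0; j < numData; j++) {
                if (data[j].length != len) {
                    throw new IllegalArgumentException("data chunks must have equal length");
                }
                int c = coefficient(i, j, numParity);
                for (int t = 0; t < len; t++) {
                    // Addition in GF(2^8) is XOR.
                    parity[i][t] ^= (byte) mul(c, data[j][t] & 0xff);
                }
            }
        }
        return parity;
    }
}
```

A striping client would call something like {{RsParitySketch.encode(dataChunks, 3)}} once per stripe and write the returned parity chunks to the parity blocks of the block group. Decoding (inverting the relevant sub-matrix) is left out on purpose.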

In this flow, all the NN needs from the _schema_ is the numbers of data and
parity blocks. I think these 2 numbers can be stored as XAttrs. We should also
assume a pair of default values to be used in the absence of configured XAttrs.
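
A rough sketch of that lookup, to make the idea concrete. The XAttr names, the decimal-string encoding and the 6+3 defaults are hypothetical placeholders of mine, not anything defined in the patch. The {{Map<String, byte[]>}} input just mirrors the shape that {{FileSystem#getXAttrs}} returns.

```java
import java.nio.charset.StandardCharsets;
import java.util.Map;

/** Sketch only: the two numbers the NN needs from a schema, with fallbacks. */
final class EcSchemaNumbers {
    // Hypothetical XAttr names and default values, for illustration only.
    static final String DATA_BLOCKS_XATTR = "user.ec.numDataBlocks";
    static final String PARITY_BLOCKS_XATTR = "user.ec.numParityBlocks";
    static final int DEFAULT_DATA_BLOCKS = 6;
    static final int DEFAULT_PARITY_BLOCKS = 3;

    final int dataBlocks;
    final int parityBlocks;

    EcSchemaNumbers(int dataBlocks, int parityBlocks) {
        if (dataBlocks <= 0 || parityBlocks <= 0) {
            throw new IllegalArgumentException("block counts must be positive");
        }
        this.dataBlocks = dataBlocks;
        this.parityBlocks = parityBlocks;
    }

    /** Reads both numbers from the XAttr map, using the defaults for missing entries. */
    static EcSchemaNumbers fromXAttrs(Map<String, byte[]> xattrs) {
        return new EcSchemaNumbers(
            readInt(xattrs, DATA_BLOCKS_XATTR, DEFAULT_DATA_BLOCKS),
            readInt(xattrs, PARITY_BLOCKS_XATTR, DEFAULT_PARITY_BLOCKS));
    }

    // Assumes values are stored as decimal strings in UTF-8.
    private static int readInt(Map<String, byte[]> xattrs, String name, int fallback) {
        byte[] raw = xattrs.get(name);
        if (raw == null) {
            return fallback;
        }
        return Integer.parseInt(new String(raw, StandardCharsets.UTF_8).trim());
    }

    /** Total number of blocks in one block group. */
    int groupSize() {
        return dataBlocks + parityBlocks;
    }
}
```

With this shape, block group allocation on the NN only needs {{groupSize()}} and the two counts, and never has to look at the codec itself.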

> Configurable and pluggable Erasure Codec and schema
> ---------------------------------------------------
>
>                 Key: HDFS-7337
>                 URL: https://issues.apache.org/jira/browse/HDFS-7337
>             Project: Hadoop HDFS
>          Issue Type: Sub-task
>            Reporter: Zhe Zhang
>            Assignee: Kai Zheng
>         Attachments: HDFS-7337-prototype-v1.patch, 
> HDFS-7337-prototype-v2.zip, HDFS-7337-prototype-v3.zip, 
> PluggableErasureCodec.pdf
>
>
> According to HDFS-7285 and the design, this JIRA proposes supporting multiple
> Erasure Codecs via a pluggable approach. It allows defining and configuring
> multiple codec schemas with different coding algorithms and parameters. The
> resulting codec schemas can be specified via a command tool for different
> file folders. While designing and implementing this pluggable framework,
> a concrete default codec (Reed Solomon) should also be implemented to prove
> the framework is useful and workable. A separate JIRA could be opened for the
> RS codec implementation.
> Note that HDFS-7353 will focus on the very low-level codec API and
> implementation to make concrete vendor libraries transparent to the upper
> layer. This JIRA focuses on the high-level parts that interact with
> configuration, schemas, etc.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
