[ 
https://issues.apache.org/jira/browse/HDFS-7337?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14266816#comment-14266816
 ] 

Andrew Wang commented on HDFS-7337:
-----------------------------------

Hey Kai, thanks for getting us started here. I gave this a quick look, had a 
few comments:

* Could you generate normal plaintext diffs rather than a zip? We might also 
want to reorganize things into existing packages: the rawcoder stuff could go 
somewhere in hadoop-common, for instance, and the block grouper classes could 
move into blockmanagement.
* I see mixed tabs and spaces; Hadoop uses spaces only.
* Since the LRC stuff is still up in the air, could we defer everything related 
to that to a later JIRA?
* In RSBlockGrouper, using ExtendedBlockId is overkill, since the bpid is the 
same for everything.

Configuration:
* The XML file approach seems potentially error-prone. IIUC, once a set of 
parameters is assigned to a schema name, the parameters should never change. 
We would also need to keep the XML file in sync between the NN, DN, and 
client; the client part is especially troublesome. Are we planning to put 
this into the editlog/image down the road, like how we do storage policies?
* Also, I think we want to separate out the type of erasure coding from the 
implementation. The schema definition from the PDF encodes both together, e.g. 
JerasureRS. While it's not possible to change the RS part, the user might want 
to swap out Jerasure for ISAL, which should be allowed. This is sort of like 
how we did things for encryption: we define a CipherSuite (e.g. AES-CTR), and 
then the user can choose among the multiple pluggable implementations of that 
cipher.
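The encryption analogy could look roughly like the sketch below. All names here (ErasureCodecType, RawErasureCoder, the provider registry) are illustrative, not actual Hadoop classes: the point is only that the schema pins down the coding *type*, while the *implementation* is chosen separately and remains swappable.

```java
// Hypothetical sketch: separate the coding type (fixed per schema, like a
// CipherSuite) from the pluggable implementation (chosen by site config).
import java.util.HashMap;
import java.util.Map;

public class CodecTypeSketch {

    /** The erasure coding type; immutable once a schema references it. */
    enum ErasureCodecType { REED_SOLOMON, LRC }

    /** One pluggable implementation of a coding type. */
    interface RawErasureCoder {
        ErasureCodecType getType();
        String getProviderName(); // e.g. "Jerasure" or "ISA-L"
    }

    static class JerasureRSCoder implements RawErasureCoder {
        public ErasureCodecType getType() { return ErasureCodecType.REED_SOLOMON; }
        public String getProviderName() { return "Jerasure"; }
    }

    static class IsalRSCoder implements RawErasureCoder {
        public ErasureCodecType getType() { return ErasureCodecType.REED_SOLOMON; }
        public String getProviderName() { return "ISA-L"; }
    }

    /** Registry: several implementations may serve one type. */
    static final Map<String, RawErasureCoder> PROVIDERS = new HashMap<>();
    static {
        PROVIDERS.put("Jerasure", new JerasureRSCoder());
        PROVIDERS.put("ISA-L", new IsalRSCoder());
    }

    /** The schema only names the type; the provider comes from config. */
    static RawErasureCoder coderFor(ErasureCodecType type, String provider) {
        RawErasureCoder c = PROVIDERS.get(provider);
        if (c != null && c.getType() == type) {
            return c;
        }
        throw new IllegalArgumentException("No provider " + provider + " for " + type);
    }

    public static void main(String[] args) {
        // Same RS type; swapping Jerasure for ISA-L never touches the schema.
        System.out.println(coderFor(ErasureCodecType.REED_SOLOMON, "ISA-L")
            .getProviderName()); // prints ISA-L
    }
}
```

With this split, a schema that once said "JerasureRS" would instead record only REED_SOLOMON plus its parameters, and the deployment would pick the provider.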

BlockGroup:
* Zhe told me this is a placeholder class, but a few comments nonetheless.
* Can we just set the two fields in the constructor? They should also be final.
* Since the schema encodes the layout, does SubBlockGroup need to encode both 
data and parity? Do we even need SubBlockGroup? Seems like a single array and a 
schema (a concrete object, which also encodes the RS or LRC parameters) tells 
you the layout, which is sufficient. This will save some memory.
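A minimal sketch of that suggestion, with illustrative names (ECSchema and the field layout are assumptions, not the actual prototype classes): final fields set in the constructor, and one flat block array whose data/parity split is derived from the schema instead of from nested SubBlockGroup objects.

```java
// Hypothetical sketch: a BlockGroup with final fields and a single block
// array; the schema alone determines which entries are data vs. parity.
import java.util.Arrays;

public class BlockGroupSketch {

    /** Minimal stand-in for a codec schema, e.g. RS(6,3). */
    static final class ECSchema {
        final int numDataBlocks;
        final int numParityBlocks;
        ECSchema(int numDataBlocks, int numParityBlocks) {
            this.numDataBlocks = numDataBlocks;
            this.numParityBlocks = numParityBlocks;
        }
    }

    static final class BlockGroup {
        private final ECSchema schema;
        private final long[] blockIds; // data blocks first, then parity

        BlockGroup(ECSchema schema, long[] blockIds) {
            if (blockIds.length != schema.numDataBlocks + schema.numParityBlocks) {
                throw new IllegalArgumentException("block count does not match schema");
            }
            this.schema = schema;
            this.blockIds = blockIds.clone();
        }

        long[] dataBlocks() {
            return Arrays.copyOfRange(blockIds, 0, schema.numDataBlocks);
        }

        long[] parityBlocks() {
            return Arrays.copyOfRange(blockIds, schema.numDataBlocks, blockIds.length);
        }
    }

    public static void main(String[] args) {
        ECSchema rs63 = new ECSchema(6, 3);
        BlockGroup group = new BlockGroup(rs63,
            new long[] {1, 2, 3, 4, 5, 6, 7, 8, 9});
        System.out.println(Arrays.toString(group.parityBlocks())); // prints [7, 8, 9]
    }
}
```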

> Configurable and pluggable Erasure Codec and schema
> ---------------------------------------------------
>
>                 Key: HDFS-7337
>                 URL: https://issues.apache.org/jira/browse/HDFS-7337
>             Project: Hadoop HDFS
>          Issue Type: Sub-task
>            Reporter: Zhe Zhang
>            Assignee: Kai Zheng
>         Attachments: HDFS-7337-prototype-v1.patch, 
> HDFS-7337-prototype-v2.zip, HDFS-7337-prototype-v3.zip, 
> PluggableErasureCodec.pdf
>
>
> Per HDFS-7285 and its design, this issue considers supporting multiple 
> erasure codecs via a pluggable approach. It allows defining and configuring 
> multiple codec schemas with different coding algorithms and parameters. The 
> resulting codec schemas can then be specified for different file folders 
> via a command-line tool. While designing and implementing this pluggable 
> framework, we will also implement a concrete default codec (Reed-Solomon) 
> to prove the framework is useful and workable. A separate JIRA could be 
> opened for the RS codec implementation.
> Note that HDFS-7353 will focus on the very low-level codec API and 
> implementation, making concrete vendor libraries transparent to the upper 
> layer. This JIRA focuses on the high-level parts that interact with 
> configuration, schemas, etc.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
