[ https://issues.apache.org/jira/browse/HDFS-7337?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14260441#comment-14260441 ]

Zhe Zhang commented on HDFS-7337:
---------------------------------

Great work, [~drankye]! I went over the design and have the following comments:
# I like the idea of creating an {{ec}} package under 
{{org.apache.hadoop.hdfs}}. It is a good place to host all codec classes.
# I think the {{ec}} package should focus on codec calculations on packet-sized 
units. Below is how I think the functions should be logically divided:
#* The {{ErasureCodec}} interface simply provides encode and decode functions 
that take a {{byte[][]}} and produce another {{byte[][]}}. It should be 
*unaware* of blocks. For example, I imagine our encode function should look 
similar to Jerasure's 
(https://github.com/tsuraan/Jerasure/blob/master/Manual.pdf): 
{code} void jerasure_matrix_encode(k, m, w, matrix, data_ptrs, coding_ptrs, 
size) {code}
#* {{BlockGroups}} should be formed by {{ECManager}}. In doing so, it calls the 
encode and decode functions from {{ErasureCodec}}.
# Logically, {{BlockGroup}} is applicable even without EC, because striping can 
be done without EC. So an alternative is to put it in the {{protocol}} package.
# I don't think we should reference the schema through a name (since it wastes 
space and is fragile). We should look at other configurable policies (e.g., 
block placement algorithm) and see how they are loaded. IIRC a factory class is 
used.
# It's great that we are considering LRC in advance. However, with LEGAL-211 
pending, I suggest we keep {{BlockGroup}} simpler for now. For example, it can 
contain only {{dataBlocks}} and {{parityBlocks}}. When we implement LRC we can 
subclass or extend it.
# I guess {{ECBlock}} is for testing purposes? An erasure-coded block should 
have all properties of a regular block. I think we can just add a couple of 
flags to the {{Block}} class.
# It's not quite clear to me why we need {{ErasureCoderCallback}}. Is it for 
async codec calculation? If codec calculations are done on small packets, I 
think sync operations are fine.
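
To make comment 2 concrete, below is a minimal sketch of what a block-unaware, packet-oriented codec interface could look like, with a toy single-parity XOR coder standing in for a real implementation. All names here ({{ErasureCodec}}, {{XorErasureCodec}}, the method signatures) are illustrative assumptions on my part, not the actual patch code:

```java
import java.util.Arrays;

// Hypothetical sketch: encode/decode operate purely on byte[][] packet units
// and know nothing about blocks or BlockGroups (those belong to ECManager).
// XorErasureCodec is a toy (k, 1) parity coder for illustration only.
public class CodecSketch {

  interface ErasureCodec {
    /** Encodes k data units into parity units. */
    byte[][] encode(byte[][] data);

    /** Recovers the erased data units from surviving data and parity units. */
    byte[][] decode(byte[][] data, byte[][] parity, int[] erasedDataIndexes);
  }

  /** Toy (k, 1) XOR codec: one parity unit, tolerates a single erasure. */
  static class XorErasureCodec implements ErasureCodec {
    @Override
    public byte[][] encode(byte[][] data) {
      byte[] parity = new byte[data[0].length];
      for (byte[] unit : data) {
        for (int i = 0; i < unit.length; i++) {
          parity[i] ^= unit[i];
        }
      }
      return new byte[][] { parity };
    }

    @Override
    public byte[][] decode(byte[][] data, byte[][] parity,
                           int[] erasedDataIndexes) {
      int erased = erasedDataIndexes[0]; // XOR can repair only one erasure
      byte[] recovered = parity[0].clone();
      for (int j = 0; j < data.length; j++) {
        if (j == erased) {
          continue;
        }
        for (int i = 0; i < data[j].length; i++) {
          recovered[i] ^= data[j][i];
        }
      }
      return new byte[][] { recovered };
    }
  }

  public static void main(String[] args) {
    byte[][] data = { { 1, 2, 3 }, { 4, 5, 6 }, { 7, 8, 9 } };
    ErasureCodec codec = new XorErasureCodec();
    byte[][] parity = codec.encode(data);
    // Simulate losing data unit 1 and recover it from the survivors.
    byte[][] recovered = codec.decode(data, parity, new int[] { 1 });
    System.out.println(Arrays.toString(recovered[0])); // prints [4, 5, 6]
  }
}
```

Because the interface only sees small in-memory {{byte[][]}} units, the calls stay synchronous and cheap, which is also why I question the need for {{ErasureCoderCallback}} above.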

Thanks!

> Configurable and pluggable Erasure Codec and schema
> ---------------------------------------------------
>
>                 Key: HDFS-7337
>                 URL: https://issues.apache.org/jira/browse/HDFS-7337
>             Project: Hadoop HDFS
>          Issue Type: Sub-task
>            Reporter: Zhe Zhang
>            Assignee: Kai Zheng
>         Attachments: HDFS-7337-prototype-v1.patch, 
> HDFS-7337-prototype-v2.zip, HDFS-7337-prototype-v3.zip, 
> PluggableErasureCodec.pdf
>
>
> According to HDFS-7285 and its design, this JIRA aims to support multiple 
> erasure codecs via a pluggable approach. It allows defining and configuring 
> multiple codec schemas with different coding algorithms and parameters. The 
> resulting codec schemas can be applied to different file folders via a 
> command-line tool. While designing and implementing this pluggable framework, 
> we will also implement a concrete default codec (Reed-Solomon) to prove the 
> framework is useful and workable. A separate JIRA could be opened for the RS 
> codec implementation.
> Note that HDFS-7353 will focus on the very low-level codec API and 
> implementation, making concrete vendor libraries transparent to the upper 
> layer. This JIRA focuses on the high-level parts that interact with 
> configuration, schemas, etc.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)