[
https://issues.apache.org/jira/browse/HDFS-7337?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14272010#comment-14272010
]
Kai Zheng commented on HDFS-7337:
---------------------------------
Continuing to address Zhe's comments above.
bq. I guess ECBlock is for testing purpose? An erasure coded block should have
all properties of a regular block. I think we can just add a couple of flags to
the Block class.
You're right in a sense: the ECBlock class isn't finalized, since the whole
bundle of code was attached just for review and discussion. I'm working on this
and will decouple ECBlock from the HDFS block. That's possible because the
codec framework already has a nice arrangement to delegate how chunks (ECChunk)
are pulled/extracted from an ECBlock; it's the caller's (ECWorker or ECClient)
responsibility to extract/collect byte chunks from an actual HDFS block. Once
decoupled, the ECBlock (or something similar) will be very lightweight and
won't need many fields at all. I will post new code for further discussion.
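To give a rough idea, a decoupled ECBlock might be as lightweight as the sketch
below; the class shape and field names are just my assumption for illustration
and will be settled in the updated patch:
{code:java}
// A minimal sketch, assuming the decoupled design above; the real class may
// differ in the new code.
public class ECBlock {
  private final boolean isParity; // parity block vs. data block in the block group
  private final boolean isErased; // marks a missing/corrupt block to be recovered

  public ECBlock(boolean isParity, boolean isErased) {
    this.isParity = isParity;
    this.isErased = isErased;
  }

  public boolean isParity() {
    return isParity;
  }

  public boolean isErased() {
    return isErased;
  }
}
{code}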
bq. It's not quite clear to me why we need ErasureCoderCallback. Is it for
async codec calculation? If codec calculations are done on small packets, I
think sync operations are fine.
ErasureCoderCallback could probably be better named to avoid such confusion.
It isn't about sync vs. async. It's basically how the codec caller (ECWorker or
ECClient) controls how chunks are obtained from blocks: the codec calls it to
pull chunks from blocks, so it can be regarded as a data source provider. In
the ECWorker transforming case, many chunks can be pulled from the blocks being
transformed, so the underlying byte-level encode() or decode() of the raw coder
can be called many times in a while loop. In the ECClient striping case it's
similar, until the application finishes writing/reading data for a BlockGroup.
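As a rough illustration of the intent only (the method names and signatures
here are my assumption, not necessarily what's in the attached prototype), the
callback is essentially a chunk source/sink driven by the coder:
{code:java}
// Sketch only: how a caller (ECWorker/ECClient) could feed chunks to the coder
// and receive coded results back, keeping block access on the caller's side.
public interface ErasureCoderCallback {
  /** Does the caller still have input chunks to feed for this block group? */
  boolean hasNextInputs();

  /** Pull the next batch of input chunks extracted from the underlying blocks. */
  ECChunk[] getNextInputChunks();

  /** Hand back the output chunks produced by a raw encode()/decode() call. */
  void withCoded(ECChunk[] inputChunks, ECChunk[] outputChunks);

  /** Notified once the whole block group has been processed. */
  void postCoding();
}
{code}
So the coder just loops: while the callback has more inputs, pull the chunks,
run the byte-level encode()/decode() of the raw coder, and hand the results
back through the callback.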
> Configurable and pluggable Erasure Codec and schema
> ---------------------------------------------------
>
> Key: HDFS-7337
> URL: https://issues.apache.org/jira/browse/HDFS-7337
> Project: Hadoop HDFS
> Issue Type: Sub-task
> Reporter: Zhe Zhang
> Assignee: Kai Zheng
> Attachments: HDFS-7337-prototype-v1.patch,
> HDFS-7337-prototype-v2.zip, HDFS-7337-prototype-v3.zip,
> PluggableErasureCodec.pdf
>
>
> According to HDFS-7285 and the design, this task considers supporting multiple
> erasure codecs via a pluggable approach. It allows defining and configuring
> multiple codec schemas with different coding algorithms and parameters. The
> resulting codec schemas can then be specified via a command tool for different
> file folders. While designing and implementing such a pluggable framework, a
> concrete default codec (Reed-Solomon) will also be implemented to prove the
> framework is useful and workable. A separate JIRA could be opened for the RS
> codec implementation.
> Note HDFS-7353 will focus on the very low-level codec API and implementation
> to make concrete vendor libraries transparent to the upper layer. This JIRA
> focuses on the high-level parts that interact with configuration, schemas, etc.