[jira] [Commented] (HDFS-7337) Configurable and pluggable Erasure Codec and schema

Kai Zheng (JIRA) Fri, 09 Jan 2015 18:59:15 -0800

    [ https://issues.apache.org/jira/browse/HDFS-7337?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14272294#comment-14272294 ]


Kai Zheng commented on HDFS-7337:
---------------------------------

Continued
bq.The XML file approach seems potentially error-prone. ... Are we planning to 
put into the editlog/image down the road, like how we do storage policies?
Yes, I agree and will follow the approach you suggested.
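Persisting schemas in the editlog/image rather than reading them from an XML file means serializing them into a compact record that can be written and read back in a fixed order. The following is a minimal sketch of that round trip, assuming a schema is just a name, a codec name, k, m and a chunk size. `ECSchema`, `persist` and `load` are hypothetical names, and the binary layout is an assumption for illustration, not the actual fsimage/editlog format (Java 16+ for records).

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.List;

public class SchemaEditLogSketch {
    // Hypothetical schema record; names and fields are illustrative, not an HDFS API.
    record ECSchema(String name, String codec, int numData, int numParity, int chunkSize) {}

    // Serialize schemas into a compact binary form, as an editlog/image record might be.
    static byte[] persist(List<ECSchema> schemas) {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (DataOutputStream out = new DataOutputStream(bytes)) {
            out.writeInt(schemas.size());
            for (ECSchema s : schemas) {
                out.writeUTF(s.name());
                out.writeUTF(s.codec());
                out.writeInt(s.numData());
                out.writeInt(s.numParity());
                out.writeInt(s.chunkSize());
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return bytes.toByteArray();
    }

    // Read the schemas back in the same field order they were written.
    static List<ECSchema> load(byte[] data) {
        try (DataInputStream in = new DataInputStream(new ByteArrayInputStream(data))) {
            int count = in.readInt();
            List<ECSchema> schemas = new ArrayList<>(count);
            for (int i = 0; i < count; i++) {
                schemas.add(new ECSchema(in.readUTF(), in.readUTF(),
                        in.readInt(), in.readInt(), in.readInt()));
            }
            return schemas;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        List<ECSchema> schemas = List.of(new ECSchema("RS-6-3", "rs", 6, 3, 16 * 1024 * 1024));
        System.out.println(load(persist(schemas)).equals(schemas)); // prints "true"
    }
}
```

Because the schema lives in the namespace metadata, it is versioned and replayed together with the rest of the editlog, which avoids the drift that a separately edited XML file could introduce.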
bq.I think we want to separate out the type of erasure coding from the 
implementation....
Good suggestion. It makes sense to be able to swap among implementations 
backed by different erasure coding libraries (Java, ISA and Jerasure) for a 
given codec. This is easy to support, because in the current code ErasureCoder 
and RawErasureCoder are already separated; all that's needed is to allow the 
RawErasureCoder used by an ErasureCodec to be changed via configuration.
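Choosing the RawErasureCoder by configuration can be pictured as a lookup from an implementation name to a factory. The sketch below assumes a hypothetical config key `ec.codec.<codec>.rawcoder` defaulting to `"java"`, and uses a toy single-parity XOR coder as a stand-in for a real Java, ISA or Jerasure implementation. The interface, key and class names are illustrative assumptions, not the Hadoop API.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.Supplier;

public class PluggableRawCoderSketch {
    // Hypothetical low-level interface: encodes data units into parity units.
    interface RawErasureCoder {
        String name();
        byte[][] encode(byte[][] dataUnits);
    }

    // Toy pure-Java coder producing one XOR parity unit (stand-in for a real RS coder).
    static final class JavaXorRawCoder implements RawErasureCoder {
        public String name() { return "java"; }
        public byte[][] encode(byte[][] dataUnits) {
            byte[] parity = new byte[dataUnits[0].length];
            for (byte[] unit : dataUnits) {
                for (int i = 0; i < parity.length; i++) {
                    parity[i] ^= unit[i];
                }
            }
            return new byte[][] {parity};
        }
    }

    // Registry of raw coder factories keyed by implementation name.
    private static final Map<String, Supplier<RawErasureCoder>> FACTORIES = new HashMap<>();
    static {
        register("java", JavaXorRawCoder::new);
    }

    static void register(String impl, Supplier<RawErasureCoder> factory) {
        FACTORIES.put(impl, factory);
    }

    // Resolve the raw coder for a codec from the hypothetical key "ec.codec.<codec>.rawcoder".
    static RawErasureCoder rawCoderFor(String codec, Map<String, String> conf) {
        String impl = conf.getOrDefault("ec.codec." + codec + ".rawcoder", "java");
        Supplier<RawErasureCoder> factory = FACTORIES.get(impl);
        if (factory == null) {
            throw new IllegalArgumentException("Unknown raw coder: " + impl);
        }
        return factory.get();
    }

    public static void main(String[] args) {
        RawErasureCoder coder = rawCoderFor("xor", Map.of());
        byte[][] parity = coder.encode(new byte[][] {{1, 2}, {3, 4}});
        System.out.println(coder.name() + " " + parity[0][0] + " " + parity[0][1]); // prints "java 2 6"
    }
}
```

Under this design, a native-library-backed coder would be registered under its own name, so switching an ErasureCodec to it is purely a configuration change and the ErasureCoder layer above is untouched.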
bq.BlockGroup
As discussed with Zhe and noted in my comments above, I need to hide internal 
details that only the codec is interested in, such as SubBlockGroup, rather 
than expose them. I need to provide two factory methods or constructors for 
the two cases of creating a BlockGroup: 1) in non-striping mode, given the 
block group id and an array of existing data blocks; 2) in striping mode, 
given only the block group id, since no existing data blocks are needed 
because they are all new and yet to be allocated.
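The two creation paths can be expressed as static factory methods over a private constructor, so callers never see codec-internal structures such as SubBlockGroup. This is a minimal sketch under that assumption; `BlockGroup`, `fromExistingBlocks` and `newStriped` are hypothetical names, and block ids are plain `long`s here rather than real HDFS block objects.

```java
import java.util.Arrays;

public final class BlockGroupSketch {
    // Hypothetical BlockGroup; internal layout (e.g. sub-groups) stays private.
    static final class BlockGroup {
        private final long groupId;
        private final long[] dataBlockIds; // empty in striping mode until blocks are allocated
        private final boolean striping;

        private BlockGroup(long groupId, long[] dataBlockIds, boolean striping) {
            this.groupId = groupId;
            this.dataBlockIds = dataBlockIds;
            this.striping = striping;
        }

        // Case 1: non-striping mode, built over existing data blocks.
        static BlockGroup fromExistingBlocks(long groupId, long[] existingDataBlockIds) {
            if (existingDataBlockIds == null || existingDataBlockIds.length == 0) {
                throw new IllegalArgumentException("non-striping mode needs existing data blocks");
            }
            return new BlockGroup(groupId, existingDataBlockIds.clone(), false);
        }

        // Case 2: striping mode; all blocks are new, so only the group id is needed.
        static BlockGroup newStriped(long groupId) {
            return new BlockGroup(groupId, new long[0], true);
        }

        long groupId() { return groupId; }
        boolean isStriping() { return striping; }
        long[] dataBlockIds() { return dataBlockIds.clone(); }
    }

    public static void main(String[] args) {
        BlockGroup contiguous = BlockGroup.fromExistingBlocks(100L, new long[] {1L, 2L, 3L});
        BlockGroup striped = BlockGroup.newStriped(200L);
        System.out.println(Arrays.toString(contiguous.dataBlockIds()) + " " + striped.isStriping()); // prints "[1, 2, 3] true"
    }
}
```

Named factories make the two modes explicit at the call site and let each path validate its own inputs, which two overloaded constructors would not.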
bq.Since the schema encodes the layout,...
As I understand the current design, a schema records configuration globally 
for all files in an EC zone. A BlockGroup object can be regarded as an instance 
of the schema for an inode or file: it records how the blocks in the group, 
both data blocks and parity blocks, are organized and ordered. In effect, what 
needs to be persisted in the fsimage/editlog is one copy of the codec-specific 
configuration in the schema (e.g. k=6, m=3, chunk_size=16mb) plus a large 
number of BlockGroup instances. Zhe seems to have put good thought into how to 
persist block groups efficiently: not every field that appears in BlockGroup 
needs to be persisted, since some can be restored or derived from minimal 
persisted information.
Sorry for the confusion if my previous comments gave you a different 
impression. I do need to update the design and code to clarify all of this, 
and will catch up on it next week.
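To illustrate how BlockGroup fields could be derived rather than stored, the sketch below keeps one schema (k=6, m=3, chunk_size=16mb) and persists only a group id per block group. It assumes, purely for illustration, that the k+m block ids of a group are consecutive starting at the group id and that indices below k are data blocks; that id convention and the names `Schema`, `blockId` and `isParity` are hypothetical, not Zhe's actual scheme.

```java
public final class BlockGroupLayoutSketch {
    // One copy per EC zone (hypothetical field names).
    record Schema(int numData, int numParity, long chunkSize) {
        int groupWidth() { return numData + numParity; }
        // Raw bytes stored per byte of user data: (k + m) / k.
        double storageOverhead() { return (double) groupWidth() / numData; }
    }

    // Derive the block id at position index from the persisted group id (assumed convention).
    static long blockId(long groupId, int index, Schema schema) {
        if (index < 0 || index >= schema.groupWidth()) {
            throw new IndexOutOfBoundsException("index " + index);
        }
        return groupId + index;
    }

    // Under the assumed ordering, data blocks come first, then parity blocks.
    static boolean isParity(int index, Schema schema) {
        return index >= schema.numData();
    }

    public static void main(String[] args) {
        Schema rs63 = new Schema(6, 3, 16L * 1024 * 1024);
        long groupId = 1000L;
        for (int i = 0; i < rs63.groupWidth(); i++) {
            System.out.println(blockId(groupId, i, rs63) + (isParity(i, rs63) ? " parity" : " data"));
        }
        System.out.println("overhead = " + rs63.storageOverhead()); // prints "overhead = 1.5"
    }
}
```

With such a convention, each persisted block group costs one id, and everything else about its layout follows from the shared schema.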

> Configurable and pluggable Erasure Codec and schema
> ---------------------------------------------------
>
>                 Key: HDFS-7337
>                 URL: https://issues.apache.org/jira/browse/HDFS-7337
>             Project: Hadoop HDFS
>          Issue Type: Sub-task
>            Reporter: Zhe Zhang
>            Assignee: Kai Zheng
>         Attachments: HDFS-7337-prototype-v1.patch, 
> HDFS-7337-prototype-v2.zip, HDFS-7337-prototype-v3.zip, 
> PluggableErasureCodec.pdf
>
>
> According to HDFS-7285 and the design, this aims to support multiple 
> erasure codecs via a pluggable approach. It allows defining and configuring 
> multiple codec schemas with different coding algorithms and parameters. The 
> resulting codec schemas can be specified via a command-line tool for 
> different directories. While designing and implementing such a pluggable 
> framework, a concrete default codec (Reed-Solomon) will also be implemented 
> to prove that the framework is useful and workable. A separate JIRA could be 
> opened for the RS codec implementation.
> Note HDFS-7353 will focus on the very low-level codec API and implementation, 
> making concrete vendor libraries transparent to the upper layer. This JIRA 
> focuses on high-level concerns that interact with configuration, schemas, etc.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
