[ 
https://issues.apache.org/jira/browse/HADOOP-13061?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15567257#comment-15567257
 ] 

Kai Zheng commented on HADOOP-13061:
------------------------------------

Hi [~andrew.wang],

Thanks for your thoughts! Yes, the legacy coder comes from the HDFS-RAID 
implementation, and we have had it since the very beginning of our HDFS-EC 
development. Whether we should keep and maintain the coder is a good question. 
I have discussed this with [~zhz] several times, and my preference is to keep 
the coder and make the codec work, provided it doesn't involve too much 
overhead, for these reasons:
1) Having the coder doesn't mean we can migrate HDFS-RAID file system data 
directly, but migration is possible with some quickly written tools that use 
the coder. The coder logic isn't coupled to HDFS specifics (either HDFS-RAID 
blocks or HDFS-EC stripes); all it does is encode/decode a group of input 
buffers (and thus a group of blocks, if called repeatedly).
2) It is useful for performance comparison. AFAIK, HDFS-RAID is often 
mentioned when HDFS erasure coding is discussed.
3) It's a good sample to illustrate that, even for the most often mentioned RS 
algorithm, it's good to have different implementations and codecs.
4) If we don't want to use it on the HDFS side, that's OK, because all the 
coder/codec logic lives on the Hadoop common side. I'm wondering whether it's 
worth considering that the Hadoop erasure coder/codec framework could develop 
independently and be used elsewhere.
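
To make the "group of input buffers" idea above concrete, here is a minimal 
sketch. It is not the Hadoop raw coder API: the class and method names are 
hypothetical, and plain XOR parity stands in for the RS math purely to keep 
the example short. It only shows that a coder working on buffers knows 
nothing about HDFS, and that whole blocks can be handled by calling it 
repeatedly, one chunk at a time.

```java
import java.util.Arrays;

// Hypothetical illustration only; not the Hadoop raw coder API.
// XOR parity is used as a stand-in for Reed-Solomon to keep it short.
public class XorCoderSketch {

    // Encode one group of equally sized input buffers into one parity buffer.
    public static byte[] encode(byte[][] inputs) {
        byte[] parity = new byte[inputs[0].length];
        for (byte[] input : inputs) {
            for (int i = 0; i < parity.length; i++) {
                parity[i] ^= input[i];
            }
        }
        return parity;
    }

    // Recover a single missing buffer from the surviving inputs plus parity.
    public static byte[] decode(byte[][] survivors, byte[] parity) {
        byte[] recovered = parity.clone();
        for (byte[] survivor : survivors) {
            for (int i = 0; i < recovered.length; i++) {
                recovered[i] ^= survivor[i];
            }
        }
        return recovered;
    }

    // Encode whole blocks by calling encode() repeatedly, chunk by chunk.
    // Assumes all blocks have the same length.
    public static byte[] encodeBlocks(byte[][] blocks, int chunkSize) {
        int blockLen = blocks[0].length;
        byte[] parityBlock = new byte[blockLen];
        for (int off = 0; off < blockLen; off += chunkSize) {
            int end = Math.min(off + chunkSize, blockLen);
            byte[][] chunks = new byte[blocks.length][];
            for (int b = 0; b < blocks.length; b++) {
                chunks[b] = Arrays.copyOfRange(blocks[b], off, end);
            }
            byte[] parity = encode(chunks);
            System.arraycopy(parity, 0, parityBlock, off, parity.length);
        }
        return parityBlock;
    }

    public static void main(String[] args) {
        byte[][] blocks = {
            {1, 2, 3, 4, 5}, {6, 7, 8, 9, 10}, {11, 12, 13, 14, 15}
        };
        byte[] parity = encodeBlocks(blocks, 2);
        // Pretend blocks[1] was lost and rebuild it from the rest.
        byte[] recovered = decode(new byte[][] {blocks[0], blocks[2]}, parity);
        System.out.println(Arrays.equals(recovered, blocks[1]));
    }
}
```

A migration tool of the kind mentioned above would presumably wrap the real 
coder in a similar loop over block data; the details would depend on the 
actual coder interface.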

When I said we should implement a new erasure codec for rs-legacy, I didn't 
mean a lot of work, since we already have the underlying raw coder 
implementations. It is just for consistency with what we did for the xor, 
rs-default and hhxor codecs. The codec doesn't have to be used by HDFS; we can 
ignore it entirely on the HDFS side.

Sounds good? Thanks.

> Refactor erasure coders
> -----------------------
>
>                 Key: HADOOP-13061
>                 URL: https://issues.apache.org/jira/browse/HADOOP-13061
>             Project: Hadoop Common
>          Issue Type: Sub-task
>            Reporter: Rui Li
>            Assignee: Kai Sasaki
>              Labels: hdfs-ec-3.0-must-do
>         Attachments: HADOOP-13061.01.patch, HADOOP-13061.02.patch, 
> HADOOP-13061.03.patch, HADOOP-13061.04.patch, HADOOP-13061.05.patch, 
> HADOOP-13061.06.patch, HADOOP-13061.07.patch, HADOOP-13061.08.patch, 
> HADOOP-13061.09.patch, HADOOP-13061.10.patch, HADOOP-13061.11.patch, 
> HADOOP-13061.12.patch, HADOOP-13061.13.patch, HADOOP-13061.14.patch, 
> HADOOP-13061.15.patch, HADOOP-13061.16.patch
>
>




--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

---------------------------------------------------------------------
To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: common-issues-h...@hadoop.apache.org
