[
https://issues.apache.org/jira/browse/HADOOP-13061?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15567257#comment-15567257
]
Kai Zheng commented on HADOOP-13061:
------------------------------------
Hi [~andrew.wang],
Thanks for your thoughts! Yes, the legacy coder comes from the HDFS-RAID
implementation, and we have had it since the very beginning of our HDFS-EC
development. It's a good question whether we should keep and maintain the
coder. I have discussed this with [~zhz] several times in the past, and my
preference is to keep the coder and make the codec work, provided it doesn't
involve too much overhead, for these reasons:
1) Having the coder doesn't mean we can migrate HDFS-RAID file system data
directly, but migration becomes possible with some quickly written tools that
use the coder. The coder logic isn't coupled to HDFS specifics (either
HDFS-RAID blocks or HDFS-EC stripes); all it does is encode/decode a group of
input buffers (and thus a group of blocks, if called repeatedly).
2) Performance comparison. AFAIK HDFS-RAID is often mentioned when HDFS
erasure coding is discussed.
3) It's a good sample to illustrate that even for the most commonly mentioned
RS algorithm, it's good to have different implementations and codecs.
4) If we don't want to use it on the HDFS side, that's OK, because all the
coder/codec logic lives on the Hadoop common side. I wonder whether we should
consider letting the Hadoop erasure coder/codec framework develop
independently so it can be used elsewhere.
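To make reason 1) concrete, here is a minimal sketch of what "encode/decode a
group of input buffers, and thus a group of blocks if repeatedly called" looks
like. It uses plain XOR with a single parity unit rather than the legacy RS
algorithm, and the class and method names (XorCoderSketch, encode, decode,
encodeBlocks) are hypothetical; this is not the actual Hadoop raw coder API.

```java
import java.util.Arrays;

/** Hypothetical illustration only; not Hadoop's raw erasure coder API. */
final class XorCoderSketch {

    /** Computes one parity buffer as the byte-wise XOR of equal-length data buffers. */
    static byte[] encode(byte[][] dataUnits) {
        byte[] parity = new byte[dataUnits[0].length];
        for (byte[] unit : dataUnits) {
            for (int i = 0; i < parity.length; i++) {
                parity[i] ^= unit[i];
            }
        }
        return parity;
    }

    /** Rebuilds the single erased unit (marked null) from the survivors and the parity. */
    static byte[] decode(byte[][] dataUnits, byte[] parity) {
        byte[] recovered = parity.clone();
        for (byte[] unit : dataUnits) {
            if (unit == null) {
                continue; // the erased unit contributes nothing
            }
            for (int i = 0; i < recovered.length; i++) {
                recovered[i] ^= unit[i];
            }
        }
        return recovered;
    }

    /** Encodes whole blocks by calling encode() repeatedly, one chunk group at a time. */
    static byte[] encodeBlocks(byte[][] blocks, int chunkSize) {
        int blockLen = blocks[0].length;
        byte[] parityBlock = new byte[blockLen];
        for (int off = 0; off < blockLen; off += chunkSize) {
            int end = Math.min(off + chunkSize, blockLen);
            byte[][] chunkGroup = new byte[blocks.length][];
            for (int b = 0; b < blocks.length; b++) {
                chunkGroup[b] = Arrays.copyOfRange(blocks[b], off, end);
            }
            byte[] parityChunk = encode(chunkGroup);
            System.arraycopy(parityChunk, 0, parityBlock, off, parityChunk.length);
        }
        return parityBlock;
    }

    public static void main(String[] args) {
        byte[][] blocks = { {1, 2, 3, 4, 5}, {6, 7, 8, 9, 10}, {11, 12, 13, 14, 15} };
        byte[] parity = encodeBlocks(blocks, 2);
        // Pretend the second block was lost and rebuild it from the rest.
        byte[][] damaged = { blocks[0], null, blocks[2] };
        byte[] rebuilt = decode(damaged, parity);
        System.out.println(Arrays.equals(rebuilt, blocks[1])); // prints "true"
    }
}
```

Nothing in the sketch knows about HDFS-RAID blocks or HDFS-EC stripes; the
caller decides how buffers map onto storage, which is why a quickly written
tool could reuse such a coder.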
When I said we would implement a new erasure codec for rs-legacy, I didn't mean
a lot of work, since we already have the underlying raw coder implementations.
It is for consistency with what we did for the xor, rs-default and hhxor
codecs. The codec doesn't have to be used by HDFS; we can ignore it on the HDFS
side entirely.
Sounds good? Thanks.
> Refactor erasure coders
> -----------------------
>
> Key: HADOOP-13061
> URL: https://issues.apache.org/jira/browse/HADOOP-13061
> Project: Hadoop Common
> Issue Type: Sub-task
> Reporter: Rui Li
> Assignee: Kai Sasaki
> Labels: hdfs-ec-3.0-must-do
> Attachments: HADOOP-13061.01.patch, HADOOP-13061.02.patch,
> HADOOP-13061.03.patch, HADOOP-13061.04.patch, HADOOP-13061.05.patch,
> HADOOP-13061.06.patch, HADOOP-13061.07.patch, HADOOP-13061.08.patch,
> HADOOP-13061.09.patch, HADOOP-13061.10.patch, HADOOP-13061.11.patch,
> HADOOP-13061.12.patch, HADOOP-13061.13.patch, HADOOP-13061.14.patch,
> HADOOP-13061.15.patch, HADOOP-13061.16.patch
>
>