[
https://issues.apache.org/jira/browse/HADOOP-12041?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15086713#comment-15086713
]
Kai Zheng commented on HADOOP-12041:
------------------------------------
Thanks Zhe for the good questions.
bq. The old and new coders should generate the same results, right?
Unfortunately not. That's why I would propose and work on another pure Java
coder here. Even though both are so-called {{Reed-Solomon}} coders, the
HDFS-RAID one and the ISA-L one use different coding forms internally. Both use
GF(256), as the targeted HDFS is a byte-based data system. The existing
GaloisField facility used by HDFS-RAID also supports symbol sizes other than
256, but as that isn't needed in practice, the new GF256 facility is much
simplified. This new Java coder is developed to be compatible with the ISA-L
coder, in case the native library isn't available in development or
experimental environments. The HDFS-RAID one isn't compatible, but can still be
used to port existing data from legacy systems if needed.
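To illustrate the common ground the two coders do share: both rely on standard
GF(256) arithmetic, typically implemented with log/exp tables; what differs is
the coding form built on top of it. Below is a minimal sketch of table-based
GF(256) multiplication, assuming the primitive polynomial 0x11D (the one ISA-L
uses); the class and method names here are hypothetical, not from the patch:

```java
public class GF256Demo {
    // Assumed primitive polynomial x^8 + x^4 + x^3 + x^2 + 1 (0x11D), as in ISA-L.
    private static final int PRIM_POLY = 0x11D;
    private static final byte[] LOG = new byte[256];
    private static final byte[] EXP = new byte[256];

    static {
        // Build exp/log tables from the generator 2.
        int x = 1;
        for (int i = 0; i < 255; i++) {
            EXP[i] = (byte) x;
            LOG[x] = (byte) i;
            x <<= 1;
            if ((x & 0x100) != 0) {
                x ^= PRIM_POLY; // reduce modulo the primitive polynomial
            }
        }
    }

    // Multiply two field elements: a * b = exp(log(a) + log(b)) in GF(256).
    public static byte gfMul(byte a, byte b) {
        if (a == 0 || b == 0) {
            return 0;
        }
        int logSum = (LOG[a & 0xFF] & 0xFF) + (LOG[b & 0xFF] & 0xFF);
        return EXP[logSum % 255];
    }

    public static void main(String[] args) {
        System.out.println(gfMul((byte) 2, (byte) 4) & 0xFF);    // 8: no reduction needed
        System.out.println(gfMul((byte) 0x80, (byte) 2) & 0xFF); // 29 (0x1D): reduced by 0x11D
    }
}
```

Two coders using the same tables can still disagree end-to-end, because the
generator/encoding matrices applied on top of this arithmetic differ.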
bq. we should rename the existing coder as {{RSRawEncoderLegacy}} and name the
new one as {{RSRawEncoder}}
Excellent idea! Thanks.
bq. Some unused methods: genReedSolomonMatrix, gfBase, gfLogBase
{{GF256}} serves as a complete basic GF facility class, so I would suggest we
keep them even though they're unused for now. {{genReedSolomonMatrix}} will be
needed because people may want to support that coding matrix generation in the
algorithm.
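For context, a coding-matrix generator of this kind is often a Vandermonde
construction over GF(256). The sketch below is hypothetical (the actual
{{genReedSolomonMatrix}} in the patch may use a different construction), with a
self-contained shift-and-xor multiply assuming the 0x11D polynomial:

```java
public class RsMatrixDemo {
    // GF(256) multiply via shift-and-xor, assuming polynomial 0x11D (as in ISA-L).
    static int gfMul(int a, int b) {
        int p = 0;
        for (int i = 0; i < 8; i++) {
            if ((b & 1) != 0) {
                p ^= a;
            }
            b >>= 1;
            a <<= 1;
            if ((a & 0x100) != 0) {
                a ^= 0x11D;
            }
        }
        return p;
    }

    // Classic Vandermonde matrix: entry (i, j) = i^j in GF(256).
    // rows = data + parity units, cols = data units. Note a plain Vandermonde
    // matrix is not systematic; real coders usually transform it so the top
    // cols x cols block becomes the identity.
    static int[][] genReedSolomonMatrix(int rows, int cols) {
        int[][] m = new int[rows][cols];
        for (int i = 0; i < rows; i++) {
            int v = 1;
            for (int j = 0; j < cols; j++) {
                m[i][j] = v;
                v = gfMul(v, i);
            }
        }
        return m;
    }

    public static void main(String[] args) {
        int[][] m = genReedSolomonMatrix(9, 6); // e.g. 6 data units + 3 parity units
        // Row i holds successive powers of i; row 2 is [1, 2, 4, 8, 16, 32].
        System.out.println(java.util.Arrays.toString(m[2]));
    }
}
```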
Look forward to your more review comments. :)
> Implement another Reed-Solomon coder in pure Java
> -------------------------------------------------
>
> Key: HADOOP-12041
> URL: https://issues.apache.org/jira/browse/HADOOP-12041
> Project: Hadoop Common
> Issue Type: Sub-task
> Reporter: Kai Zheng
> Assignee: Kai Zheng
> Attachments: HADOOP-12041-v1.patch, HADOOP-12041-v2.patch,
> HADOOP-12041-v3.patch, HADOOP-12041-v4.patch, HADOOP-12041-v5.patch
>
>
> Currently existing Java RS coders based on {{GaloisField}} implementation
> have some drawbacks or limitations:
> * The decoder unnecessarily computes units that are not actually erased
> (HADOOP-11871);
> * The decoder requires parity units + data units order for the inputs in the
> decode API (HADOOP-12040);
> * Need to support or align with native erasure coders regarding concrete
> coding algorithms and matrix, so Java coders and native coders can be easily
> swapped in/out and transparent to HDFS (HADOOP-12010);
> * It's unnecessarily flexible and incurs some overhead: since HDFS erasure
> coding is an entirely byte-based data system, we don't need to consider any
> symbol size other than 256.
> This calls for implementing another RS coder in pure Java, in addition to the
> existing {{GaloisField}}-based one from HDFS-RAID. The new Java RS coder will
> be favored and used by default to resolve the related issues. The old
> HDFS-RAID originated coder will remain for comparison, and for converting old
> data from HDFS-RAID systems.
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)