[
https://issues.apache.org/jira/browse/HADOOP-12041?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15086713#comment-15086713
]
Kai Zheng commented on HADOOP-12041:
------------------------------------
Thanks Zhe for the good questions.
bq. The old and new coders should generate the same results, right?
Unfortunately not. That's why I would propose and work on another pure Java
coder here. Even though both are so-called {{Reed-Solomon}} coders, the
HDFS-RAID one and the ISA-L one use different coding forms internally. Both use
GF(256), as the targeted HDFS is a byte-based data system. The existing
GaloisField facility used by HDFS-RAID also supports symbol sizes other than
256, but as that isn't needed in practice, the new GF256 facility is much
simplified. This new Java coder is developed to be compatible with the ISA-L
coder, in case the native library isn't available in development or
experimental environments. The HDFS-RAID one isn't compatible, but can still be
used to port existing data from legacy systems if needed.
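To illustrate the common ground the two coders do share: both rely on standard
GF(256) arithmetic, typically implemented with log/exp tables; what differs is
the coding form built on top of it. Below is a minimal sketch of table-based
GF(256) multiplication, assuming the primitive polynomial 0x11D (the one ISA-L
uses); the class and method names here are hypothetical, not from the patch:

```java
public class GF256Demo {
    // Assumed primitive polynomial x^8 + x^4 + x^3 + x^2 + 1 (0x11D), as in ISA-L.
    private static final int PRIM_POLY = 0x11D;
    private static final byte[] LOG = new byte[256];
    private static final byte[] EXP = new byte[256];

    static {
        // Build exp/log tables from the generator 2.
        int x = 1;
        for (int i = 0; i < 255; i++) {
            EXP[i] = (byte) x;
            LOG[x] = (byte) i;
            x <<= 1;
            if ((x & 0x100) != 0) {
                x ^= PRIM_POLY; // reduce modulo the primitive polynomial
            }
        }
    }

    // Multiply two field elements: a * b = exp(log(a) + log(b)) in GF(256).
    public static byte gfMul(byte a, byte b) {
        if (a == 0 || b == 0) {
            return 0;
        }
        int logSum = (LOG[a & 0xFF] & 0xFF) + (LOG[b & 0xFF] & 0xFF);
        return EXP[logSum % 255];
    }

    public static void main(String[] args) {
        System.out.println(gfMul((byte) 2, (byte) 4) & 0xFF);    // 8: no reduction needed
        System.out.println(gfMul((byte) 0x80, (byte) 2) & 0xFF); // 29 (0x1D): reduced by 0x11D
    }
}
```

Two coders using the same tables can still disagree end-to-end, because the
generator/encoding matrices applied on top of this arithmetic differ.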
bq. we should rename the existing coder as {{RSRawEncoderLegacy}} and name the
new one as {{RSRawEncoder}}
Excellent idea! Thanks.
bq. Some unused methods: genReedSolomonMatrix, gfBase, gfLogBase
{{GF256}} serves as a complete basic GF facility class, so I would suggest we
keep them even though they're unused for now. {{genReedSolomonMatrix}} will be
needed because people may want to support that coding matrix generation in the
algorithm.
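For context, a coding-matrix generator of this kind is often a Vandermonde
construction over GF(256). The sketch below is hypothetical (the actual
{{genReedSolomonMatrix}} in the patch may use a different construction), with a
self-contained shift-and-xor multiply assuming the 0x11D polynomial:

```java
public class RsMatrixDemo {
    // GF(256) multiply via shift-and-xor, assuming polynomial 0x11D (as in ISA-L).
    static int gfMul(int a, int b) {
        int p = 0;
        for (int i = 0; i < 8; i++) {
            if ((b & 1) != 0) {
                p ^= a;
            }
            b >>= 1;
            a <<= 1;
            if ((a & 0x100) != 0) {
                a ^= 0x11D;
            }
        }
        return p;
    }

    // Classic Vandermonde matrix: entry (i, j) = i^j in GF(256).
    // rows = data + parity units, cols = data units. Note a plain Vandermonde
    // matrix is not systematic; real coders usually transform it so the top
    // cols x cols block becomes the identity.
    static int[][] genReedSolomonMatrix(int rows, int cols) {
        int[][] m = new int[rows][cols];
        for (int i = 0; i < rows; i++) {
            int v = 1;
            for (int j = 0; j < cols; j++) {
                m[i][j] = v;
                v = gfMul(v, i);
            }
        }
        return m;
    }

    public static void main(String[] args) {
        int[][] m = genReedSolomonMatrix(9, 6); // e.g. 6 data units + 3 parity units
        // Row i holds successive powers of i; row 2 is [1, 2, 4, 8, 16, 32].
        System.out.println(java.util.Arrays.toString(m[2]));
    }
}
```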
Look forward to your more review comments. :)
> Implement another Reed-Solomon coder in pure Java
> -------------------------------------------------
>
> Key: HADOOP-12041
> URL: https://issues.apache.org/jira/browse/HADOOP-12041
> Project: Hadoop Common
> Issue Type: Sub-task
> Reporter: Kai Zheng
> Assignee: Kai Zheng
> Attachments: HADOOP-12041-v1.patch, HADOOP-12041-v2.patch,
> HADOOP-12041-v3.patch, HADOOP-12041-v4.patch, HADOOP-12041-v5.patch
>
>
> Currently existing Java RS coders based on {{GaloisField}} implementation
> have some drawbacks or limitations:
> * The decoder unnecessarily computes units that are not actually erased
> (HADOOP-11871);
> * The decoder requires parity units + data units order for the inputs in the
> decode API (HADOOP-12040);
> * Need to support or align with native erasure coders regarding concrete
> coding algorithms and matrix, so Java coders and native coders can be easily
> swapped in/out and transparent to HDFS (HADOOP-12010);
> * It's unnecessarily flexible and incurs some overhead: since HDFS erasure
> coding is an entirely byte-based data system, we don't need to consider any
> symbol size other than 256.
> This calls for implementing another RS coder in pure Java, in addition to the
> existing {{GaloisField}}-based one from HDFS-RAID. The new Java RS coder will
> be favored and used by default to resolve the related issues. The old
> HDFS-RAID originated coder will remain for comparison, and for converting old
> data from HDFS-RAID systems.
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)