[jira] [Commented] (HADOOP-12041) Implement another Reed-Solomon coder in pure Java

Kai Zheng (JIRA) Fri, 15 Jan 2016 00:29:57 -0800

    [ 
https://issues.apache.org/jira/browse/HADOOP-12041?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15101434#comment-15101434
 ]


Kai Zheng commented on HADOOP-12041:
------------------------------------

Thanks [~walter.k.su] for the nice comments!

bq. inited is initially false, and 2 threads may reach line 157 at the same 
time, then both goto line 161.
Ah, you're right, I see it now. Even though running the init twice is 
harmless, it's better to avoid it. How about making the init() call in a 
static block in {{GF256}}? When I wrote the code I just wanted to keep it 
simple. :)
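
To make the static-block idea concrete, here is a minimal sketch. It is not the code from the patch: the class name {{Gf256Sketch}}, the table layout and the primitive polynomial 0x11D are assumptions chosen only for illustration. The point is that the JVM runs a static initializer exactly once, under its class initialization lock, so no {{inited}} flag is needed and two threads cannot both run the init.

```java
// Hypothetical sketch, not the actual GF256 class from the patch.
final class Gf256Sketch {
    private static final int FIELD_SIZE = 256;
    // Assumed primitive polynomial x^8 + x^4 + x^3 + x^2 + 1.
    private static final int PRIMITIVE_POLY = 0x11D;

    // EXP is doubled in length so mul() can skip a modulo on the index.
    private static final int[] EXP = new int[2 * FIELD_SIZE];
    private static final int[] LOG = new int[FIELD_SIZE];

    // Runs once, when the class is first initialized. The JVM
    // serializes class initialization, so the tables are fully built
    // before any thread can call mul() or inv().
    static {
        int x = 1;
        for (int i = 0; i < FIELD_SIZE - 1; i++) {
            EXP[i] = x;
            LOG[x] = i;
            x <<= 1;                  // multiply by the generator 2
            if ((x & 0x100) != 0) {
                x ^= PRIMITIVE_POLY;  // reduce back into 8 bits
            }
        }
        for (int i = FIELD_SIZE - 1; i < EXP.length; i++) {
            EXP[i] = EXP[i - (FIELD_SIZE - 1)];
        }
    }

    private Gf256Sketch() {
    }

    /** Multiplies two field elements, each in the range 0..255. */
    static int mul(int a, int b) {
        if (a == 0 || b == 0) {
            return 0;
        }
        return EXP[LOG[a] + LOG[b]];
    }

    /** Returns the multiplicative inverse of a non-zero element. */
    static int inv(int a) {
        if (a == 0) {
            throw new ArithmeticException("zero has no inverse in GF(256)");
        }
        return EXP[(FIELD_SIZE - 1) - LOG[a]];
    }
}
```

Callers just use {{Gf256Sketch.mul(a, b)}} directly; the first access triggers the one-time table build, with no explicit init() call or synchronization on the caller's side.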

bq. HDFS-RAID is no longer in latest release ...
Yeah, the related code is actually from rather old history. But I do know 
some companies are still using the old coder (or newer ones derived from 
it). {{DistCp}} is a good option for them. I may be overthinking the 
situations where the coder might be used outside of HDFS; note that the 
coder/codec framework resides in Hadoop Common and can potentially be used 
in other contexts. Another reason we might still need the code is that some 
new codecs/coders are based on it, like the Hitchhiker one 
[~jack_liuquan] is implementing in HADOOP-11828. On the other hand, the old 
coder is hard to maintain and keep aligned with the new Java coder and the 
ISA-L coder, so I think we should eventually get rid of it, as you said, 
once we are sure it is safe. [~zhz] mentioned that marking it as _legacy_ 
is also an option.

> Implement another Reed-Solomon coder in pure Java
> -------------------------------------------------
>
>                 Key: HADOOP-12041
>                 URL: https://issues.apache.org/jira/browse/HADOOP-12041
>             Project: Hadoop Common
>          Issue Type: Sub-task
>            Reporter: Kai Zheng
>            Assignee: Kai Zheng
>         Attachments: HADOOP-12041-v1.patch, HADOOP-12041-v2.patch, 
> HADOOP-12041-v3.patch, HADOOP-12041-v4.patch, HADOOP-12041-v5.patch
>
>
> The existing Java RS coders based on the {{GaloisField}} implementation 
> have some drawbacks or limitations:
> * The decoder unnecessarily computes units that are not actually erased 
> (HADOOP-11871);
> * The decoder requires its inputs in parity units + data units order in 
> the decode API (HADOOP-12040);
> * Need to support or align with native erasure coders regarding concrete 
> coding algorithms and matrix, so Java coders and native coders can be easily 
> swapped in/out transparently to HDFS (HADOOP-12010);
> * It's unnecessarily flexible, which incurs some overhead; as HDFS erasure 
> coding is entirely byte based, we don't need to consider symbol sizes 
> other than 256.
> This calls for implementing another RS coder in pure Java, in addition to 
> the existing {{GaloisField}} based one from HDFS-RAID. The new Java RS 
> coder will be favored and used by default to resolve the related issues. 
> The old coder originating from HDFS-RAID will still be there for comparison 
> and for converting old data from HDFS-RAID systems.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
