[
https://issues.apache.org/jira/browse/HADOOP-12041?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15127239#comment-15127239
]
Zhe Zhang commented on HADOOP-12041:
------------------------------------
Thanks for the great work, Kai. The patch LGTM overall. A few minor issues:
# {{makeValidIndexes}} should be static and maybe moved to {{CodecUtil}}
# The reported {{checkstyle}} issue looks valid.
# When you say the new coder is compatible with the ISA-L coder, you mean I
can use the new Java coder to decode data encoded with ISA-L, right?
# The name {{gftbls}} is hard to understand; does it mean {{gfTables}}?
# Is there any reason to allocate {{encodeMatrix}}, {{decodeMatrix}}, and
{{invertMatrix}} as 1D arrays but use them as matrices? Could we use 2D
arrays instead? (See the first sketch after this list.)
# The snippet below is not easy to understand. Why is it unnecessary to
prepare {{decodeMatrix}} when the cached indexes haven't changed? (See the
second sketch after this list.)
{code}
if (Arrays.equals(this.cachedErasedIndexes, erasedIndexes) &&
    Arrays.equals(this.validIndexes, tmpValidIndexes)) {
  return; // Optimization. Nothing to do
}
{code}
# {{RSUtil2#genReedSolomonMatrix}} is unused
# The patch is already very large; I think we should add {{package-info}}
separately.
# So, about the incompatibility between the HDFS-RAID coder and the new Java
coder: is it because they use different GF matrices?
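For the 1D vs. 2D array question above, here is a minimal sketch of the two layouts I am comparing. The class and method names are made up purely for illustration, and I'm assuming the patch indexes the flat arrays row-major, the way the ISA-L style byte buffers are laid out:
{code}
// Illustration only; names are hypothetical, not taken from the patch.
public class MatrixLayoutSketch {
  // Flat row-major layout: element (i, j) of a numRows x numCols matrix
  // lives at index i * numCols + j, matching the ISA-L style byte buffers.
  static byte getFlat(byte[] matrix, int numCols, int i, int j) {
    return matrix[i * numCols + j];
  }

  // Equivalent 2D layout, which reads more naturally in Java.
  static byte get2d(byte[][] matrix, int i, int j) {
    return matrix[i][j];
  }
}
{code}
If the flat layout is kept for parity with the native coder, a short comment documenting the row-major indexing convention would already help readers a lot.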
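And for the caching snippet above, this is how I currently read the intent: {{decodeMatrix}} depends only on the erased/valid index sets, so it can be reused when the erasure pattern is unchanged across calls. The sketch below only illustrates my reading; the names are hypothetical and it is not the patch code:
{code}
import java.util.Arrays;

// Hypothetical sketch of how I read the caching logic, not the patch code.
public class DecodeMatrixCacheSketch {
  private int[] cachedErasedIndexes;
  private int[] cachedValidIndexes;
  private byte[] decodeMatrix;

  void prepareDecoding(int[] erasedIndexes, int[] validIndexes) {
    if (Arrays.equals(cachedErasedIndexes, erasedIndexes) &&
        Arrays.equals(cachedValidIndexes, validIndexes)) {
      // Same erasure pattern as the previous call, so the previously
      // computed decodeMatrix is still valid and nothing is recomputed.
      return;
    }
    cachedErasedIndexes = erasedIndexes.clone();
    cachedValidIndexes = validIndexes.clone();
    decodeMatrix = computeDecodeMatrix(erasedIndexes, validIndexes);
  }

  // Placeholder for the real work of building the decode matrix from the
  // surviving units.
  private byte[] computeDecodeMatrix(int[] erasedIndexes, int[] validIndexes) {
    return new byte[erasedIndexes.length * validIndexes.length];
  }
}
{code}
If that reading is correct, expanding the {{// Optimization}} comment to say so would make the early return obvious.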
> Implement another Reed-Solomon coder in pure Java
> -------------------------------------------------
>
> Key: HADOOP-12041
> URL: https://issues.apache.org/jira/browse/HADOOP-12041
> Project: Hadoop Common
> Issue Type: Sub-task
> Reporter: Kai Zheng
> Assignee: Kai Zheng
> Attachments: HADOOP-12041-v1.patch, HADOOP-12041-v2.patch,
> HADOOP-12041-v3.patch, HADOOP-12041-v4.patch, HADOOP-12041-v5.patch,
> HADOOP-12041-v6.patch
>
>
> The existing Java RS coders, based on the {{GaloisField}} implementation,
> have some drawbacks and limitations:
> * The decoder unnecessarily computes units that are not actually erased
> (HADOOP-11871);
> * The decode API requires inputs to be ordered as parity units followed by
> data units (HADOOP-12040);
> * The Java coders need to align with the native erasure coders in the
> concrete coding algorithm and matrix, so that Java and native coders can be
> swapped in and out transparently to HDFS (HADOOP-12010);
> * They are unnecessarily flexible, which incurs some overhead: HDFS erasure
> coding is entirely byte based, so no symbol size other than 256 needs to be
> supported.
> This calls for implementing another RS coder in pure Java, in addition to
> the existing {{GaloisField}}-based coder from HDFS-RAID. The new Java RS
> coder will be favored and used by default to resolve the related issues.
> The old HDFS-RAID-originated coder will remain for comparison and for
> converting old data from HDFS-RAID systems.