[
https://issues.apache.org/jira/browse/HADOOP-11828?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15069031#comment-15069031
]
Rashmi Vinayak commented on HADOOP-11828:
-----------------------------------------
Hi [~jack_liuquan],
Thanks for the great work! I went through the code very carefully for the
algorithm review. Everything looks fine in terms of correctness.
A few comments:
1. The name ‘doDecodeMulti’ for the method in HHXORErasureDecodingStep is
slightly confusing, since it handles both the case of multiple erasures and
the case of a single parity erasure. Perhaps something along the lines of
‘doDecodeMultiAndParity’ would reflect the actions of this method more
accurately?
2. It seems that there is no need to pass ‘erasedIndexes’ as input to the
methods in the HHXORErasureDecodingStep class, since it is already a class
variable? (You might have used these additional inputs for clarity; I just
thought of bringing this to your attention.)
3. On a minor note, I think it would be helpful for future readers to include a
reference to the paper in case they want to understand the algorithm. What do
you think? (We could have something along the lines of: “A "Hitchhiker's" Guide
to Fast and Efficient Data Reconstruction in Erasure-coded Data Centers”, in
ACM SIGCOMM 2014.) Also, just to make the context completely clear, could you
please change the description in the comments to “It has been shown to reduce
network traffic and disk I/O by 25%-45% during data reconstruction while
retaining the same storage capacity and failure tolerance capability as RS
codes.” (The last phrase is added to the existing comment.)
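To make the three suggestions concrete, here is a rough sketch of how they
might look together. It is only an illustration, not the code in the patch:
the class name HHXORDecodingStepSketch, the constructor, and the placeholder
method body are all hypothetical, and the real method signature in
HHXORErasureDecodingStep may well differ.

```java
import java.util.Arrays;

/**
 * Hypothetical stand-in for HHXORErasureDecodingStep; NOT the patch code.
 *
 * Hitchhiker has been shown to reduce network traffic and disk I/O by
 * 25%-45% during data reconstruction while retaining the same storage
 * capacity and failure tolerance capability as RS codes.
 *
 * Reference: "A 'Hitchhiker's' Guide to Fast and Efficient Data
 * Reconstruction in Erasure-coded Data Centers", in ACM SIGCOMM 2014.
 */
class HHXORDecodingStepSketch {
  // Kept as a field, so the decode methods need not take it as a parameter.
  private final int[] erasedIndexes;

  HHXORDecodingStepSketch(int[] erasedIndexes) {
    // Defensive copy so later changes by the caller do not affect this step.
    this.erasedIndexes = Arrays.copyOf(erasedIndexes, erasedIndexes.length);
  }

  /**
   * Named to reflect that it covers multiple erasures as well as a single
   * parity erasure. Placeholder body (assumption): it only reports how many
   * erased units it would process, read from the field.
   */
  int doDecodeMultiAndParity() {
    return erasedIndexes.length;
  }

  public static void main(String[] args) {
    HHXORDecodingStepSketch step =
        new HHXORDecodingStepSketch(new int[] {1, 4});
    System.out.println(step.doDecodeMultiAndParity());
  }
}
```

The point of the sketch is only the shape: the method name covers both cases,
the erased indexes come from the field rather than from an argument, and the
class comment carries the paper reference and the full description.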
Thanks,
Rashmi
> Implement the Hitchhiker erasure coding algorithm
> -------------------------------------------------
>
> Key: HADOOP-11828
> URL: https://issues.apache.org/jira/browse/HADOOP-11828
> Project: Hadoop Common
> Issue Type: Sub-task
> Reporter: Zhe Zhang
> Assignee: jack liuquan
> Attachments: 7715-hitchhikerXOR-v2-testcode.patch,
> 7715-hitchhikerXOR-v2.patch, HADOOP-11828-hitchhikerXOR-V3.patch,
> HADOOP-11828-hitchhikerXOR-V4.patch, HADOOP-11828-hitchhikerXOR-V5.patch,
> HDFS-7715-hhxor-decoder.patch, HDFS-7715-hhxor-encoder.patch
>
>
> [Hitchhiker |
> http://www.eecs.berkeley.edu/~nihar/publications/Hitchhiker_SIGCOMM14.pdf] is
> a new erasure coding algorithm developed as a research project at UC
> Berkeley. It has been shown to reduce network traffic and disk I/O by 25%-45%
> during data reconstruction while retaining the same storage capacity and
> failure tolerance capability as RS codes. This JIRA aims to introduce
> Hitchhiker to the HDFS-EC framework, as one of the pluggable codec algorithms.
> The existing implementation is based on HDFS-RAID.