[ https://issues.apache.org/jira/browse/HADOOP-11828?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15069031#comment-15069031 ]

Rashmi Vinayak commented on HADOOP-11828:
-----------------------------------------

Hi [~jack_liuquan],
Thanks for the great work! I went through the code very carefully as part of the 
algorithm review, and everything looks fine in terms of correctness. 
A few comments:
1. The name ‘doDecodeMulti’ for the method in HHXORErasureDecodingStep is 
slightly confusing, since it handles both the case of multiple erasures and the 
case of a single parity erasure. Perhaps something along the lines of 
‘doDecodeMultiAndParity’ would reflect what this method does more 
accurately? 
2. It seems that there is no need to pass ‘erasedIndexes’ as input to the 
methods in the HHXORErasureDecodingStep class, since it is already a class 
variable? (You might have used these additional inputs for clarity; I just 
wanted to bring this to your attention.) 
3. On a minor note, I think it would be helpful for future readers to include a 
reference to the paper in case they want to understand the algorithm. What do 
you think? (We could use something along the lines of: “A "Hitchhiker's" Guide 
to Fast and Efficient Data Reconstruction in Erasure-coded Data Centers”, in ACM 
SIGCOMM 2014.) Also, to make the context completely clear, could you please 
change the description in the comments to “It has been shown to reduce network 
traffic and disk I/O by 25%-45% during data reconstruction while retaining the 
same storage capacity and failure tolerance capability as RS codes.” (The last 
phrase is added to the existing comment.)
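To make comments 1-3 concrete, here is a minimal, hypothetical Java sketch; it is not the actual HHXORErasureDecodingStep code. The class name DecodingStepSketch, the string return values, and the convention that data units occupy indexes 0..numDataUnits-1 (parity units after them) are all assumptions for illustration. It shows the renamed helper, helpers reading the erasedIndexes field instead of taking it as a parameter, and a class comment citing the paper.

```java
import java.util.Arrays;

/**
 * Hypothetical sketch of a Hitchhiker-XOR decoding step's dispatch logic.
 * Reference: "A 'Hitchhiker's' Guide to Fast and Efficient Data Reconstruction
 * in Erasure-coded Data Centers", ACM SIGCOMM 2014.
 */
public class DecodingStepSketch {
    private final int numDataUnits;
    // Stored once in the constructor; the helpers below read this field
    // directly instead of receiving it as a parameter (comment 2).
    private final int[] erasedIndexes;

    public DecodingStepSketch(int numDataUnits, int[] erasedIndexes) {
        this.numDataUnits = numDataUnits;
        this.erasedIndexes = erasedIndexes.clone();
    }

    /** Returns a label naming the helper that would handle this erasure pattern. */
    public String performCoding() {
        // Assumed convention: indexes below numDataUnits are data units.
        if (erasedIndexes.length == 1 && erasedIndexes[0] < numDataUnits) {
            return doDecodeSingle();
        }
        return doDecodeMultiAndParity();
    }

    // Exactly one data unit erased: the reconstruction case Hitchhiker targets.
    private String doDecodeSingle() {
        return "doDecodeSingle(" + erasedIndexes[0] + ")";
    }

    // Multiple erasures, or a single parity erasure (comment 1's suggested name).
    private String doDecodeMultiAndParity() {
        return "doDecodeMultiAndParity(" + Arrays.toString(erasedIndexes) + ")";
    }

    public static void main(String[] args) {
        int k = 6; // hypothetical: data units 0..5, parity units from 6 upward
        System.out.println(new DecodingStepSketch(k, new int[]{2}).performCoding());
        System.out.println(new DecodingStepSketch(k, new int[]{7}).performCoding());
        System.out.println(new DecodingStepSketch(k, new int[]{1, 4}).performCoding());
    }
}
```

Running main prints doDecodeSingle(2), doDecodeMultiAndParity([7]) and doDecodeMultiAndParity([1, 4]): only a single data-unit erasure takes the dedicated path, while a lone parity erasure falls through to the helper whose name now says so.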

Thanks,
Rashmi

> Implement the Hitchhiker erasure coding algorithm
> -------------------------------------------------
>
>                 Key: HADOOP-11828
>                 URL: https://issues.apache.org/jira/browse/HADOOP-11828
>             Project: Hadoop Common
>          Issue Type: Sub-task
>            Reporter: Zhe Zhang
>            Assignee: jack liuquan
>         Attachments: 7715-hitchhikerXOR-v2-testcode.patch, 
> 7715-hitchhikerXOR-v2.patch, HADOOP-11828-hitchhikerXOR-V3.patch, 
> HADOOP-11828-hitchhikerXOR-V4.patch, HADOOP-11828-hitchhikerXOR-V5.patch, 
> HDFS-7715-hhxor-decoder.patch, HDFS-7715-hhxor-encoder.patch
>
>
> [Hitchhiker | 
> http://www.eecs.berkeley.edu/~nihar/publications/Hitchhiker_SIGCOMM14.pdf] is 
> a new erasure coding algorithm developed as a research project at UC 
> Berkeley. It has been shown to reduce network traffic and disk I/O by 25%-45% 
> during data reconstruction while retaining the same storage capacity and 
> failure tolerance capability as RS codes. This JIRA aims to introduce 
> Hitchhiker to the HDFS-EC framework, as one of the pluggable codec algorithms.
> The existing implementation is based on HDFS-RAID. 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)