[ https://issues.apache.org/jira/browse/HADOOP-11828?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15069031#comment-15069031 ]
Rashmi Vinayak commented on HADOOP-11828:
-----------------------------------------

Hi [~jack_liuquan],

Thanks for the great work! I went through the code very carefully for the algorithm review. Everything looks fine in terms of correctness. A few comments:

1. The name ‘doDecodeMulti’ for the method in HHXORErasureDecodingStep is slightly confusing, since it handles both the case of multiple erasures and the case of a single parity erasure. Perhaps something along the lines of ‘doDecodeMultiAndParity’ would reflect the actions of this method more accurately?

2. It seems that there is no need to pass ‘erasedIndexes’ as input to the methods in the HHXORErasureDecodingStep class, since it is a class variable? (You might have used these additional inputs for clarity; I just thought of bringing this to your attention.)

3. On a minor note, I think it would be helpful for future readers to include a reference to the paper in case they want to understand the algorithm. What do you think? (We can have something along the lines of: “A "Hitchhiker's" Guide to Fast and Efficient Data Reconstruction in Erasure-coded Data Centers”, in ACM SIGCOMM 2014.)

Also, just to make the context completely clear, could you please change the description in the comments to “It has been shown to reduce network traffic and disk I/O by 25%-45% during data reconstruction while retaining the same storage capacity and failure tolerance capability as RS codes.” (The last phrase is added to the existing comment.)
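To make points 1 to 3 concrete, here is a minimal sketch of what the suggestions could look like. It is not the code from the patch: the class `DecodingStepSketch`, the method `choosePath`, the name `doDecodeSingle`, and the convention that indexes below `numDataUnits` denote data units are all assumptions made for illustration only.

```java
/**
 * Hypothetical stand-in for a decoding step. This is NOT the actual
 * HHXORErasureDecodingStep from the patch.
 *
 * Reference suggested in the review (point 3):
 * "A 'Hitchhiker's' Guide to Fast and Efficient Data Reconstruction in
 * Erasure-coded Data Centers", in ACM SIGCOMM 2014.
 */
final class DecodingStepSketch {
  // Stored once in the constructor, so methods read the field directly
  // instead of receiving erasedIndexes again as a parameter (point 2).
  private final int[] erasedIndexes;

  // Assumption: indexes [0, numDataUnits) are data units, the rest parity.
  private final int numDataUnits;

  DecodingStepSketch(int numDataUnits, int[] erasedIndexes) {
    this.numDataUnits = numDataUnits;
    this.erasedIndexes = erasedIndexes.clone();
  }

  /** True when exactly one unit is erased and that unit is a data unit. */
  boolean isSingleDataErasure() {
    return erasedIndexes.length == 1 && erasedIndexes[0] < numDataUnits;
  }

  /**
   * Returns the name of the path that would handle the erasures. The second
   * name follows the rename proposed in point 1, because that path covers
   * both multiple erasures and a single parity erasure. "doDecodeSingle" is
   * a hypothetical name for the remaining case.
   */
  String choosePath() {
    return isSingleDataErasure() ? "doDecodeSingle" : "doDecodeMultiAndParity";
  }
}
```

With 10 data units, for example, an erasure at index 3 alone would take the single-erasure path, while an erasure at index 11 alone (a parity unit under the assumed layout) or erasures at indexes 2 and 5 together would take the renamed path.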

Thanks,
Rashmi

> Implement the Hitchhiker erasure coding algorithm
> -------------------------------------------------
>
> Key: HADOOP-11828
> URL: https://issues.apache.org/jira/browse/HADOOP-11828
> Project: Hadoop Common
> Issue Type: Sub-task
> Reporter: Zhe Zhang
> Assignee: jack liuquan
> Attachments: 7715-hitchhikerXOR-v2-testcode.patch, 7715-hitchhikerXOR-v2.patch, HADOOP-11828-hitchhikerXOR-V3.patch, HADOOP-11828-hitchhikerXOR-V4.patch, HADOOP-11828-hitchhikerXOR-V5.patch, HDFS-7715-hhxor-decoder.patch, HDFS-7715-hhxor-encoder.patch
>
> [Hitchhiker | http://www.eecs.berkeley.edu/~nihar/publications/Hitchhiker_SIGCOMM14.pdf] is a new erasure coding algorithm developed as a research project at UC Berkeley. It has been shown to reduce network traffic and disk I/O by 25%-45% during data reconstruction while retaining the same storage capacity and failure tolerance capability as RS codes. This JIRA aims to introduce Hitchhiker to the HDFS-EC framework, as one of the pluggable codec algorithms.
> The existing implementation is based on HDFS-RAID.