[
https://issues.apache.org/jira/browse/HADOOP-11828?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14534164#comment-14534164
]
jack liuquan commented on HADOOP-11828:
---------------------------------------
bq. As both the xor raw coder and the rs raw coder are common to the RS and HH erasure coders, please extract the duplicated code for creating the xor and rs raw coders into an abstract class.
bq. We may need abstract classes like HHErasureDecodingStep and HHErasureEncodingStep for the three derivations of the HH algorithm. Classes like HHXORErasureDecodingStep can inherit from them.
bq. Please try to reuse code between the two versions of coding: the byte[] version and the ByteBuffer version. You may look at the patch in HADOOP-11847 for some ideas.
Shall we note these tips and deal with them in the next development iteration?
I think these optimization tips will not be the last ones; if we address each one as soon as it comes up, we may end up doing throwaway work and development efficiency will be low.
Maybe we can plan a follow-up development iteration and cover all of these tips in that stage.
What do you think?
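To make sure we are on the same page for that iteration, here is a rough sketch of the step hierarchy I understand you are suggesting (the method signatures below are only illustrative, not the actual patch):

{code:java}
import java.nio.ByteBuffer;

// Illustrative sketch only: the common pieces are pulled up into the abstract
// step, so each of the three HH derivations only supplies its own logic.
abstract class HHErasureDecodingStep {
  protected final int numDataUnits;
  protected final int numParityUnits;

  protected HHErasureDecodingStep(int numDataUnits, int numParityUnits) {
    this.numDataUnits = numDataUnits;
    this.numParityUnits = numParityUnits;
  }

  // Shared helpers (sub-stripe buffer layout, raw coder creation, etc.)
  // would live here instead of being duplicated in every derivation.

  /** Each HH derivation decodes in its own way over the sub-stripe pair. */
  abstract void performDecoding(ByteBuffer[] inputs, int[] erasedIndexes,
      ByteBuffer[] outputs);
}

// The HH-XOR derivation then only carries its variant-specific steps.
class HHXORErasureDecodingStep extends HHErasureDecodingStep {
  HHXORErasureDecodingStep(int numDataUnits, int numParityUnits) {
    super(numDataUnits, numParityUnits);
  }

  @Override
  void performDecoding(ByteBuffer[] inputs, int[] erasedIndexes,
      ByteBuffer[] outputs) {
    // 1) RS-decode the second sub-stripe, 2) XOR out the piggybacks,
    // 3) recover the erased cells of the first sub-stripe.
  }
}
{code}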
bq. We might not need to override testCoding and performCodingStep in TestHHErasureCoderBase. Is there anything specific to HH here? If we have to, then there would be a problem using the coder, as it would not be general to use.
HH is specific in how it prepares the input data for decoding. For example, with (k=10, r=4), the current testCoding() in {{TestErasureCoderBase}} uses the remaining 9 data units + 4 parity units to reconstruct the single missing data unit.
That does not suit HH, because the advantage of HH is exactly that it requires less data from the surviving units when reconstructing.
For performCodingStep(), the reason is the use of the sub-stripe pair.
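As a small illustration of the sub-stripe pair (the buffer layout and names below are only for illustration, not the patch's API): each cell is viewed as two half-size sub-stripes, and HH decoding reads the whole second sub-stripe but only part of the first, so the generic 9-data + 4-parity preparation cannot exercise it.

{code:java}
import java.nio.ByteBuffer;

public class SubStripeViewExample {
  // Zero-copy view of buf covering [offset, offset + length).
  static ByteBuffer view(ByteBuffer buf, int offset, int length) {
    ByteBuffer dup = buf.duplicate();
    dup.position(offset);
    dup.limit(offset + length);
    return dup.slice();
  }

  public static void main(String[] args) {
    int cellSize = 1024;                           // one unit's cell in a test
    ByteBuffer cell = ByteBuffer.allocate(cellSize);

    // HH treats every unit's cell as a pair of half-size sub-stripes.
    ByteBuffer subStripe1 = view(cell, 0, cellSize / 2);
    ByteBuffer subStripe2 = view(cell, cellSize / 2, cellSize / 2);

    // Reconstructing one lost data unit consumes all of subStripe2 from the
    // surviving units but only a subset of their subStripe1 halves, which a
    // test that feeds 9 full data units + 4 full parity units never checks.
    System.out.println(subStripe1.remaining() + " + " + subStripe2.remaining());
  }
}
{code}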
bq. Is it possible to avoid cloning the input data in getPiggyBacksFromInput?
I don't have a good idea for this, because the RS encoding will erase the input data. Could you give me some suggestions?
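For reference, the cloning in question is roughly the following kind of defensive copy (class and method names here are only illustrative): the inputs are deep-copied so that the destructive RS encode used to compute the piggybacks does not disturb the caller's buffers.

{code:java}
import java.nio.ByteBuffer;

public final class PiggyBackInputCloning {
  private PiggyBackInputCloning() { }

  /** Deep-copies each input so a destructive encode leaves the originals intact. */
  static ByteBuffer[] cloneInputs(ByteBuffer[] inputs) {
    ByteBuffer[] copies = new ByteBuffer[inputs.length];
    for (int i = 0; i < inputs.length; i++) {
      ByteBuffer src = inputs[i].duplicate();   // independent position/limit
      ByteBuffer dst = ByteBuffer.allocate(src.remaining());
      dst.put(src);                             // copy the bytes themselves
      dst.flip();
      copies[i] = dst;
    }
    return copies;
  }
}
{code}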
bq. Some comments might be better reorganized to make them read better. Some are too long, and some could be longer.
bq. Please note that lines should not exceed 80 characters. You could set the width limit in your IDE.
bq. We need Javadocs for the public functions in HHUtil.
bq. I thought we don't need this test, as the configuration isn't specific to the coder.
These are all OK to me; I will do my best to address them as you suggested.
> Implement the Hitchhiker erasure coding algorithm
> -------------------------------------------------
>
> Key: HADOOP-11828
> URL: https://issues.apache.org/jira/browse/HADOOP-11828
> Project: Hadoop Common
> Issue Type: Sub-task
> Reporter: Zhe Zhang
> Assignee: jack liuquan
> Attachments: 7715-hitchhikerXOR-v2-testcode.patch,
> 7715-hitchhikerXOR-v2.patch, HADOOP-11828-hitchhikerXOR-V3.patch,
> HADOOP-11828-hitchhikerXOR-V4.patch, HDFS-7715-hhxor-decoder.patch,
> HDFS-7715-hhxor-encoder.patch
>
>
> [Hitchhiker |
> http://www.eecs.berkeley.edu/~nihar/publications/Hitchhiker_SIGCOMM14.pdf] is
> a new erasure coding algorithm developed as a research project at UC
> Berkeley. It has been shown to reduce network traffic and disk I/O by 25%-45%
> during data reconstruction. This JIRA aims to introduce Hitchhiker to the
> HDFS-EC framework, as one of the pluggable codec algorithms.
> The existing implementation is based on HDFS-RAID.