[
https://issues.apache.org/jira/browse/HADOOP-11828?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14534164#comment-14534164
]
jack liuquan commented on HADOOP-11828:
---------------------------------------
bq. As both the xor raw coder and the rs raw coder are common to the RS and HH erasure coders, please extract the duplicated code for creating the xor and rs raw coders into an abstract class.
bq. We may need abstract classes like HHErasureDecodingStep and HHErasureEncodingStep for the three derivations of the HH algorithm. Classes like HHXORErasureDecodingStep can inherit from them.
bq. Please try to reuse code between the two versions of coding: the byte[] version and the ByteBuffer version. You may look at the patch in HADOOP-11847 for some ideas.
Shall we note these tips and deal with them in the next development iteration?
I think these optimization tips will not be the last ones; if we address each one as soon as it comes up, we may end up doing throwaway work and development efficiency will be low.
Maybe we can plan a follow-up development iteration and cover all of these tips in that stage.
What do you think?
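To make sure we are on the same page for that iteration, here is a rough sketch of the step hierarchy I understand you are suggesting (the method signatures below are only illustrative, not the actual patch):

{code:java}
import java.nio.ByteBuffer;

// Illustrative sketch only: the common pieces are pulled up into the abstract
// step, so each of the three HH derivations only supplies its own logic.
abstract class HHErasureDecodingStep {
  protected final int numDataUnits;
  protected final int numParityUnits;

  protected HHErasureDecodingStep(int numDataUnits, int numParityUnits) {
    this.numDataUnits = numDataUnits;
    this.numParityUnits = numParityUnits;
  }

  // Shared helpers (sub-stripe buffer layout, raw coder creation, etc.)
  // would live here instead of being duplicated in every derivation.

  /** Each HH derivation decodes in its own way over the sub-stripe pair. */
  abstract void performDecoding(ByteBuffer[] inputs, int[] erasedIndexes,
      ByteBuffer[] outputs);
}

// The HH-XOR derivation then only carries its variant-specific steps.
class HHXORErasureDecodingStep extends HHErasureDecodingStep {
  HHXORErasureDecodingStep(int numDataUnits, int numParityUnits) {
    super(numDataUnits, numParityUnits);
  }

  @Override
  void performDecoding(ByteBuffer[] inputs, int[] erasedIndexes,
      ByteBuffer[] outputs) {
    // 1) RS-decode the second sub-stripe, 2) XOR out the piggybacks,
    // 3) recover the erased cells of the first sub-stripe.
  }
}
{code}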
bq. We might not need to override testCoding and performCodingStep in TestHHErasureCoderBase. Is there anything specific to HH here? If we have to, then there would be a problem using the coder, as it would not be general to use.
HH is specific in how it prepares the input data for decoding. For example, with (k=10, r=4), the current testCoding() in {{TestErasureCoderBase}} uses the remaining 9 data units + 4 parity units to reconstruct the single missing data unit.
That does not suit HH, because the advantage of HH is exactly that it requires less data from the surviving units when reconstructing.
For performCodingStep(), the reason is the use of the sub-stripe pair.
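As a small illustration of the sub-stripe pair (the buffer layout and names below are only for illustration, not the patch's API): each cell is viewed as two half-size sub-stripes, and HH decoding reads the whole second sub-stripe but only part of the first, so the generic 9-data + 4-parity preparation cannot exercise it.

{code:java}
import java.nio.ByteBuffer;

public class SubStripeViewExample {
  // Zero-copy view of buf covering [offset, offset + length).
  static ByteBuffer view(ByteBuffer buf, int offset, int length) {
    ByteBuffer dup = buf.duplicate();
    dup.position(offset);
    dup.limit(offset + length);
    return dup.slice();
  }

  public static void main(String[] args) {
    int cellSize = 1024;                           // one unit's cell in a test
    ByteBuffer cell = ByteBuffer.allocate(cellSize);

    // HH treats every unit's cell as a pair of half-size sub-stripes.
    ByteBuffer subStripe1 = view(cell, 0, cellSize / 2);
    ByteBuffer subStripe2 = view(cell, cellSize / 2, cellSize / 2);

    // Reconstructing one lost data unit consumes all of subStripe2 from the
    // surviving units but only a subset of their subStripe1 halves, which a
    // test that feeds 9 full data units + 4 full parity units never checks.
    System.out.println(subStripe1.remaining() + " + " + subStripe2.remaining());
  }
}
{code}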
bq. Is it possible to avoid cloning the input data in getPiggyBacksFromInput?
I don't have a good idea for this, because the RS encoding will erase the input data. Could you give me some suggestions?
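For reference, the cloning in question is roughly the following kind of defensive copy (class and method names here are only illustrative): the inputs are deep-copied so that the destructive RS encode used to compute the piggybacks does not disturb the caller's buffers.

{code:java}
import java.nio.ByteBuffer;

public final class PiggyBackInputCloning {
  private PiggyBackInputCloning() { }

  /** Deep-copies each input so a destructive encode leaves the originals intact. */
  static ByteBuffer[] cloneInputs(ByteBuffer[] inputs) {
    ByteBuffer[] copies = new ByteBuffer[inputs.length];
    for (int i = 0; i < inputs.length; i++) {
      ByteBuffer src = inputs[i].duplicate();   // independent position/limit
      ByteBuffer dst = ByteBuffer.allocate(src.remaining());
      dst.put(src);                             // copy the bytes themselves
      dst.flip();
      copies[i] = dst;
    }
    return copies;
  }
}
{code}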
bq. Some comments might be better reorganized to make them read better. Some are too long, and some could be longer.
bq. Please note that lines should not exceed 80 characters. You could set the width limit in your IDE.
bq. We need Javadocs for the public functions in HHUtil.
bq. I thought we don't need this test, as the configuration isn't specific to the coder.
These are all OK to me; I will do my best to address them as you suggested.
> Implement the Hitchhiker erasure coding algorithm
> -------------------------------------------------
>
> Key: HADOOP-11828
> URL: https://issues.apache.org/jira/browse/HADOOP-11828
> Project: Hadoop Common
> Issue Type: Sub-task
> Reporter: Zhe Zhang
> Assignee: jack liuquan
> Attachments: 7715-hitchhikerXOR-v2-testcode.patch,
> 7715-hitchhikerXOR-v2.patch, HADOOP-11828-hitchhikerXOR-V3.patch,
> HADOOP-11828-hitchhikerXOR-V4.patch, HDFS-7715-hhxor-decoder.patch,
> HDFS-7715-hhxor-encoder.patch
>
>
> [Hitchhiker |
> http://www.eecs.berkeley.edu/~nihar/publications/Hitchhiker_SIGCOMM14.pdf] is
> a new erasure coding algorithm developed as a research project at UC
> Berkeley. It has been shown to reduce network traffic and disk I/O by 25%-45%
> during data reconstruction. This JIRA aims to introduce Hitchhiker to the
> HDFS-EC framework, as one of the pluggable codec algorithms.
> The existing implementation is based on HDFS-RAID.