[
https://issues.apache.org/jira/browse/HADOOP-11847?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14550403#comment-14550403
]
Kai Zheng commented on HADOOP-11847:
------------------------------------
Thanks [~hitliuyi] for the insightful review and good comments. The patch was
updated accordingly:
* As proposed above, extra input buffers for erased or not-to-read units are
now avoided, which simplifies things quite a bit, by slightly modifying the
{{GF}} codes. The related output buffers remain, as they are still needed.
* Re-sorted the unit test cases to cover the considerations Yi mentioned above,
also adding a case that erases too many units.
About some comments:
bq. Rename findGoodInput to getFirstNotNullInput, and we can use generic type
of Java, also the implementation can be simplified
The generic version looks great and I used it, though with a slightly
different name, {{findFirstValidInput}}.
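As a rough illustration of the generic helper discussed above (the body here is a sketch, not the actual patch code; only the name {{findFirstValidInput}} comes from the discussion):

```java
public class InputUtil {
  /**
   * Return the first non-null element of the inputs array. Decoding needs
   * at least one valid input to infer buffer parameters, so all-null
   * inputs are rejected.
   */
  static <T> T findFirstValidInput(T[] inputs) {
    for (T input : inputs) {
      if (input != null) {
        return input;
      }
    }
    throw new IllegalArgumentException(
        "Invalid inputs are found, all being null");
  }
}
```

With generics, the same helper serves both {{byte[]}} and {{ByteBuffer}} based decode paths, which is why the generic version avoids duplicated lookup code.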
bq. We can accept the int erasedNum parameter, then we can allocate the exact
array size and no need array copy.
No {{erasedNum}} can be passed in, as calculating that value is exactly the
task of the function {{getErasedOrNotToReadIndexes}}. Note it returns indexes
not only for the erased units passed from the caller, but also for not-to-read
units indicated as null.
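To illustrate the convention described above (a minimal sketch assuming the discussed semantics, not the patch itself): every null slot in the inputs array marks a unit that is either erased or deliberately not read, so the combined index list can be recovered in a single scan.

```java
import java.util.ArrayList;
import java.util.List;

public class DecodeInputUtil {
  /**
   * Collect the indexes of all null inputs. These cover both erased units
   * and units the caller chose not to read.
   */
  static int[] getErasedOrNotToReadIndexes(Object[] inputs) {
    List<Integer> indexes = new ArrayList<>();
    for (int i = 0; i < inputs.length; i++) {
      if (inputs[i] == null) {
        indexes.add(i); // erased or not-to-read unit
      }
    }
    int[] result = new int[indexes.size()];
    for (int i = 0; i < result.length; i++) {
      result[i] = indexes.get(i);
    }
    return result;
  }
}
```

For instance, in a (6+3) schema with block #2 erased and parity units 6 and 7 skipped, the inputs array would carry nulls at positions 2, 6, and 7, and the scan would return exactly those three indexes.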
> Enhance raw coder allowing to read least required inputs in decoding
> --------------------------------------------------------------------
>
> Key: HADOOP-11847
> URL: https://issues.apache.org/jira/browse/HADOOP-11847
> Project: Hadoop Common
> Issue Type: Sub-task
> Components: io
> Reporter: Kai Zheng
> Assignee: Kai Zheng
> Labels: BB2015-05-TBR
> Attachments: HADOOP-11847-HDFS-7285-v3.patch,
> HADOOP-11847-HDFS-7285-v4.patch, HADOOP-11847-HDFS-7285-v5.patch,
> HADOOP-11847-v1.patch, HADOOP-11847-v2.patch, HADOOP-11847-v6.patch
>
>
> This is to enhance the raw erasure coder to allow reading only the least
> required inputs while decoding. It will also refine and document the relevant
> APIs for better understanding and usage. Reading only the least required
> inputs may add computation overhead, but will possibly outperform overall
> since less network traffic and disk IO are involved.
> This is something planned to do but just got reminded by [~zhz]'s question
> raised in HDFS-7678, also copied here:
> bq.Kai Zheng I have a question about decoding: in a (6+3) schema, if block #2
> is missing, and I want to repair it with blocks 0, 1, 3, 4, 5, 8, how should
> I construct the inputs to RawErasureDecoder#decode?
> With this work, hopefully the answer to the above question will be obvious.
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)