[
https://issues.apache.org/jira/browse/HADOOP-11847?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14550403#comment-14550403
]
Kai Zheng commented on HADOOP-11847:
------------------------------------
Thanks [~hitliuyi] for the insightful review and good comments. The patch was
updated accordingly:
* As proposed above, extra input buffers for erased or not-to-read units are
now avoided, which simplifies things quite a bit, by slightly modifying the
{{GF}} codes. The related output buffers remain, as they are still needed.
* Re-sorted the unit test cases to cover the considerations Yi mentioned above,
also adding a case that erases too many units.
About some comments:
bq. Rename findGoodInput to getFirstNotNullInput, and we can use generic type
of Java, also the implementation can be simplified
The generic version looks great and I used it, though with a slightly
different name, {{findFirstValidInput}}.
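As a rough illustration of the generic helper discussed above (the body here is a sketch, not the actual patch code; only the name {{findFirstValidInput}} comes from the discussion):

```java
public class InputUtil {
  /**
   * Return the first non-null element of the inputs array. Decoding needs
   * at least one valid input to infer buffer parameters, so all-null
   * inputs are rejected.
   */
  static <T> T findFirstValidInput(T[] inputs) {
    for (T input : inputs) {
      if (input != null) {
        return input;
      }
    }
    throw new IllegalArgumentException(
        "Invalid inputs are found, all being null");
  }
}
```

With generics, the same helper serves both {{byte[]}} and {{ByteBuffer}} based decode paths, which is why the generic version avoids duplicated lookup code.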
bq. We can accept the int erasedNum parameter, then we can allocate the exact
array size and no need array copy.
No {{erasedNum}} can be passed in, as calculating that value is exactly the
task of the function {{getErasedOrNotToReadIndexes}}. Note it returns indexes
not only for the erased units passed from the caller, but also for not-to-read
units indicated as null.
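To illustrate the convention described above (a minimal sketch assuming the discussed semantics, not the patch itself): every null slot in the inputs array marks a unit that is either erased or deliberately not read, so the combined index list can be recovered in a single scan.

```java
import java.util.ArrayList;
import java.util.List;

public class DecodeInputUtil {
  /**
   * Collect the indexes of all null inputs. These cover both erased units
   * and units the caller chose not to read.
   */
  static int[] getErasedOrNotToReadIndexes(Object[] inputs) {
    List<Integer> indexes = new ArrayList<>();
    for (int i = 0; i < inputs.length; i++) {
      if (inputs[i] == null) {
        indexes.add(i); // erased or not-to-read unit
      }
    }
    int[] result = new int[indexes.size()];
    for (int i = 0; i < result.length; i++) {
      result[i] = indexes.get(i);
    }
    return result;
  }
}
```

For instance, in a (6+3) schema with block #2 erased and parity units 6 and 7 skipped, the inputs array would carry nulls at positions 2, 6, and 7, and the scan would return exactly those three indexes.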
> Enhance raw coder allowing to read least required inputs in decoding
> --------------------------------------------------------------------
>
> Key: HADOOP-11847
> URL: https://issues.apache.org/jira/browse/HADOOP-11847
> Project: Hadoop Common
> Issue Type: Sub-task
> Components: io
> Reporter: Kai Zheng
> Assignee: Kai Zheng
> Labels: BB2015-05-TBR
> Attachments: HADOOP-11847-HDFS-7285-v3.patch,
> HADOOP-11847-HDFS-7285-v4.patch, HADOOP-11847-HDFS-7285-v5.patch,
> HADOOP-11847-v1.patch, HADOOP-11847-v2.patch, HADOOP-11847-v6.patch
>
>
> This is to enhance the raw erasure coder to allow reading only the least
> required inputs while decoding. It will also refine and document the relevant
> APIs for better understanding and usage. Reading only the least required
> inputs may add computation overhead, but will possibly outperform overall
> since less network traffic and disk IO are involved.
> This is something planned to do but just got reminded by [~zhz]'s question
> raised in HDFS-7678, also copied here:
> bq.Kai Zheng I have a question about decoding: in a (6+3) schema, if block #2
> is missing, and I want to repair it with blocks 0, 1, 3, 4, 5, 8, how should
> I construct the inputs to RawErasureDecoder#decode?
> With this work, hopefully the answer to the above question will be obvious.
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)