[
https://issues.apache.org/jira/browse/HADOOP-11847?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14530193#comment-14530193
]
Kai Zheng commented on HADOOP-11847:
------------------------------------
Hi [~hitliuyi],
Thanks for your good thoughts about the decoder API. It's refined as below; how
does this look to you? Thanks.
{code}
/**
 * Decode with inputs and erasedIndexes, generating outputs.
 * How to prepare the inputs:
 * 1. Create an array containing parity units followed by data units;
 * 2. Set null at the array locations specified via erasedIndexes to indicate
 *    they're erased and no data is to be read from them;
 * 3. Set null at the array locations for extra redundant items, as they're not
 *    necessary to read when decoding. For example, in RS-6-3, if only 1 unit
 *    is really erased, then we have 2 extra items as redundant. They can be
 *    set to null to indicate no data will be used from them.
 *
 * For an example using RS (6, 3), assume sources (d0, d1, d2, d3, d4, d5)
 * and parities (p0, p1, p2), with d2 erased. We can and may want to use only
 * 6 units like (d1, d3, d4, d5, p0, p2) to recover d2. We will have:
 *   inputs = [p0, null(p1), p2, null(d0), d1, null(d2), d3, d4, d5]
 *   erasedIndexes = [5] // index of d2 in the inputs array
 *   outputs = [a-writable-buffer]
 *
 * @param inputs inputs to read data from
 * @param erasedIndexes indexes of erased units in the inputs array
 * @param outputs outputs to write into for data generated according to
 *                erasedIndexes
 */
public void decode(ByteBuffer[] inputs, int[] erasedIndexes,
    ByteBuffer[] outputs);
{code}
The impact from the caller's point of view:
It requires providing input buffers with null entries to indicate units that
are erased or not to be read;
It requires providing erasedIndexes covering only the units that are really
erased and need to be recovered.
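To make the layout concrete, here is a minimal sketch (plain Java, no Hadoop dependency) that builds the inputs array for the RS (6, 3) example above. The {{buildInputs}} helper and the commented-out decoder call are hypothetical illustrations, not part of the proposed API:

```java
import java.nio.ByteBuffer;

public class DecodeInputsExample {
  /**
   * Hypothetical helper mirroring the layout in the proposed javadoc:
   * parity units first, then data units, with null for erased or
   * intentionally skipped (redundant) positions.
   */
  static ByteBuffer[] buildInputs(ByteBuffer[] parities, ByteBuffer[] data,
                                  boolean[] use) {
    ByteBuffer[] inputs = new ByteBuffer[parities.length + data.length];
    for (int i = 0; i < parities.length; i++) {
      inputs[i] = use[i] ? parities[i] : null;
    }
    for (int i = 0; i < data.length; i++) {
      inputs[parities.length + i] = use[parities.length + i] ? data[i] : null;
    }
    return inputs;
  }

  public static void main(String[] args) {
    int cell = 1024; // illustrative cell size
    ByteBuffer[] parities = new ByteBuffer[3];
    ByteBuffer[] data = new ByteBuffer[6];
    for (int i = 0; i < 3; i++) parities[i] = ByteBuffer.allocate(cell);
    for (int i = 0; i < 6; i++) data[i] = ByteBuffer.allocate(cell);

    // RS (6, 3): d2 erased; read only (p0, p2, d1, d3, d4, d5).
    // Slot layout: [p0, p1, p2, d0, d1, d2, d3, d4, d5]
    boolean[] use = {true, false, true, false, true, false, true, true, true};
    ByteBuffer[] inputs = buildInputs(parities, data, use);

    int[] erasedIndexes = {5};                    // index of d2 in inputs
    ByteBuffer[] outputs = {ByteBuffer.allocate(cell)};

    System.out.println(inputs[5] == null);        // erased d2 slot is null
    System.out.println(inputs[1] == null);        // redundant p1 skipped
    // decoder.decode(inputs, erasedIndexes, outputs); // hypothetical call
  }
}
```

Note that only the positions the caller actually wants read carry a buffer; everything else is null, which is what lets the decoder skip unnecessary network and disk reads.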
> Enhance raw coder allowing to read least required inputs in decoding
> --------------------------------------------------------------------
>
> Key: HADOOP-11847
> URL: https://issues.apache.org/jira/browse/HADOOP-11847
> Project: Hadoop Common
> Issue Type: Sub-task
> Components: io
> Reporter: Kai Zheng
> Assignee: Kai Zheng
> Labels: BB2015-05-TBR
> Attachments: HADOOP-11847-HDFS-7285-v3.patch,
> HADOOP-11847-HDFS-7285-v4.patch, HADOOP-11847-v1.patch, HADOOP-11847-v2.patch
>
>
> This is to enhance the raw erasure coder to allow reading only the least
> required inputs while decoding. It will also refine and document the relevant
> APIs for better understanding and usage. When using the least required
> inputs, it may add computation overhead but will possibly outperform overall,
> since less network traffic and disk IO are involved.
> This is something we planned to do but were just reminded of by [~zhz]'s
> question raised in HDFS-7678, also copied here:
> bq.Kai Zheng I have a question about decoding: in a (6+3) schema, if block #2
> is missing, and I want to repair it with blocks 0, 1, 3, 4, 5, 8, how should
> I construct the inputs to RawErasureDecoder#decode?
> With this work, hopefully the answer to above question would be obvious.
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)