[ https://issues.apache.org/jira/browse/HADOOP-11847?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14555446#comment-14555446 ]

Yi Liu commented on HADOOP-11847:
---------------------------------

*in AbstractRawErasureDecoder.java*
For {{findFirstValidInput}}, there is still one comment that has not been addressed:
{code}
+    if (inputs[0] != null) {
+      return inputs[0];
+    }
+
+    for (int i = 1; i < inputs.length; i++) {
+      if (inputs[i] != null) {
+        return inputs[i];
+      }
+    }
{code}
It can be:
{code}
for (int i = 0; i < inputs.length; i++) {
  if (inputs[i] != null) {
    return inputs[i];
  }
}
{code}

*In RSRawDecoder.java*
{code}
  private void ensureBytesArrayBuffers(int dataLen) {
    if (bytesArrayBuffers == null || bytesArrayBuffers[0].length < dataLen) {
      /**
       * Create this set of buffers on demand, which is only needed at the first
       * time running into this, using bytes array.
       */
      // Erased or not to read
      int maxInvalidUnits = getNumParityUnits();
      adjustedByteArrayOutputsParameter = new byte[maxInvalidUnits][];
      adjustedOutputOffsets = new int[maxInvalidUnits];

      // These are temp buffers for both inputs and outputs
      bytesArrayBuffers = new byte[maxInvalidUnits * 2][];
      for (int i = 0; i < bytesArrayBuffers.length; ++i) {
        bytesArrayBuffers[i] = new byte[dataLen];
      }
    }
  }

  private void ensureDirectBuffers(int dataLen) {
    if (directBuffers == null || directBuffers[0].capacity() < dataLen) {
      /**
       * Create this set of buffers on demand, which is only needed at the first
       * time running into this, using DirectBuffer.
       */
      // Erased or not to read
      int maxInvalidUnits = getNumParityUnits();
      adjustedDirectBufferOutputsParameter = new ByteBuffer[maxInvalidUnits];

      // These are temp buffers for both inputs and outputs
      directBuffers = new ByteBuffer[maxInvalidUnits * 2];
      for (int i = 0; i < directBuffers.length; i++) {
        directBuffers[i] = ByteBuffer.allocateDirect(dataLen);
      }
    }
  }
{code}
1. Do we need {{maxInvalidUnits * 2}} buffers for bytesArrayBuffers and 
directBuffers? Since we don't need additional buffers for the inputs, the 
correct size should be {{parityUnitNum - outputs.length}}. If there are not 
enough buffers the next time, new ones can be allocated then.
2. The shared buffer size should always be the chunk size; otherwise the 
buffers can't be shared, since the dataLen may differ between calls. A rough 
sketch covering both points follows.
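To make both points concrete, here is a rough sketch of the bytes-array 
variant; {{getChunkSize()}} and the exact sizing are assumptions for 
illustration, not the patch itself:
{code}
// Rough sketch of points 1 and 2, not the actual patch: the pool only holds
// the temp outputs really needed, and each buffer is the fixed chunk size so
// it stays shareable even when dataLen varies. getChunkSize() and the exact
// field names are assumptions here.
private void ensureBytesArrayBuffers(int neededBuffers) {
  if (bytesArrayBuffers == null || bytesArrayBuffers.length < neededBuffers) {
    // Not enough buffers this time: allocate a new, larger pool.
    bytesArrayBuffers = new byte[neededBuffers][];
    for (int i = 0; i < bytesArrayBuffers.length; ++i) {
      bytesArrayBuffers[i] = new byte[getChunkSize()]; // always the chunk size
    }
  }
}

// Called from doDecode with the number of units that are erased or not read
// but not requested by the caller, e.g.:
//   ensureBytesArrayBuffers(erasedOrNotToReadIndexes.length - outputs.length);
{code}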

In {{doDecode}}
{code}
for (int i = 0; i < adjustedByteArrayOutputsParameter.length; i++) {
      adjustedByteArrayOutputsParameter[i] =
          resetBuffer(bytesArrayBuffers[bufferIdx++], 0, dataLen);
      adjustedOutputOffsets[i] = 0; // Always 0 for such temp output
    }

    int outputIdx = 0;
    for (int i = 0; i < erasedIndexes.length; i++, outputIdx++) {
      for (int j = 0; j < erasedOrNotToReadIndexes.length; j++) {
        // If this index is one requested by the caller via erasedIndexes, then
        // we use the passed output buffer to avoid copying data thereafter.
        if (erasedIndexes[i] == erasedOrNotToReadIndexes[j]) {
          adjustedByteArrayOutputsParameter[j] =
              resetBuffer(outputs[outputIdx], 0, dataLen);
          adjustedOutputOffsets[j] = outputOffsets[outputIdx];
        }
      }
    }
{code}
1. We should check that erasedOrNotToReadIndexes contains all of erasedIndexes. 
2. We just need one loop: go through {{adjustedByteArrayOutputsParameter}} and 
assign each entry a buffer from outputs if one exists, otherwise from 
{{bytesArrayBuffers}}, as sketched below.
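
A minimal sketch of the single-loop idea, not the patch itself; the variable 
names follow the quoted code:
{code}
// Sketch of point 2: one pass over the erased-or-not-read positions, picking
// the caller's output buffer when the position was requested, otherwise a
// shared temp buffer.
int bufferIdx = 0;
for (int j = 0; j < erasedOrNotToReadIndexes.length; j++) {
  // Look for this unit among the indexes the caller asked to be reconstructed.
  int outputIdx = -1;
  for (int i = 0; i < erasedIndexes.length; i++) {
    if (erasedIndexes[i] == erasedOrNotToReadIndexes[j]) {
      outputIdx = i;
      break;
    }
  }
  if (outputIdx >= 0) {
    // Requested by the caller: decode straight into the passed output buffer.
    adjustedByteArrayOutputsParameter[j] =
        resetBuffer(outputs[outputIdx], 0, dataLen);
    adjustedOutputOffsets[j] = outputOffsets[outputIdx];
  } else {
    // Not requested: use a shared temp buffer, offset always 0.
    adjustedByteArrayOutputsParameter[j] =
        resetBuffer(bytesArrayBuffers[bufferIdx++], 0, dataLen);
    adjustedOutputOffsets[j] = 0;
  }
}
// For point 1, a separate up-front check should verify that every entry of
// erasedIndexes also appears in erasedOrNotToReadIndexes.
{code}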


> Enhance raw coder allowing to read least required inputs in decoding
> --------------------------------------------------------------------
>
>                 Key: HADOOP-11847
>                 URL: https://issues.apache.org/jira/browse/HADOOP-11847
>             Project: Hadoop Common
>          Issue Type: Sub-task
>          Components: io
>            Reporter: Kai Zheng
>            Assignee: Kai Zheng
>              Labels: BB2015-05-TBR
>         Attachments: HADOOP-11847-HDFS-7285-v3.patch, 
> HADOOP-11847-HDFS-7285-v4.patch, HADOOP-11847-HDFS-7285-v5.patch, 
> HADOOP-11847-HDFS-7285-v6.patch, HADOOP-11847-v1.patch, HADOOP-11847-v2.patch
>
>
> This is to enhance the raw erasure coder to allow reading only the least 
> required inputs while decoding. It will also refine and document the relevant 
> APIs for better understanding and usage. When using the least required inputs, 
> it may add computing overhead but will possibly outperform overall since less 
> network traffic and disk IO are involved.
> This is something planned to do, but it was brought up again by [~zhz]'s 
> question raised in HDFS-7678, also copied here:
> bq.Kai Zheng I have a question about decoding: in a (6+3) schema, if block #2 
> is missing, and I want to repair it with blocks 0, 1, 3, 4, 5, 8, how should 
> I construct the inputs to RawErasureDecoder#decode?
> With this work, hopefully the answer to the above question will be obvious.
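
For illustration, the question above would presumably map onto decode inputs 
like the sketch below; the data-first-then-parity index layout and the 
placeholder cell size are assumptions, not taken from the patch:
{code}
// Illustrative only: how the (6+3) question might map onto the decode call
// once only the least required inputs are read. Index layout (data units 0-5,
// parity units 6-8) and the placeholder cell size are assumptions.
int numDataUnits = 6, numParityUnits = 3, cellSize = 64 * 1024;
byte[][] inputs = new byte[numDataUnits + numParityUnits][];
for (int idx : new int[] { 0, 1, 3, 4, 5, 8 }) {  // the units actually read
  inputs[idx] = new byte[cellSize];               // stands in for real cell data
}
// Block 2 is erased; blocks 6 and 7 are simply not read, so they stay null.
int[] erasedIndexes = { 2 };
byte[][] outputs = { new byte[cellSize] };        // one output per erased index
// These three arrays are then what would be passed to RawErasureDecoder#decode.
{code}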



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
