[jira] [Commented] (HADOOP-11938) Fix ByteBuffer version encode/decode API of raw erasure coder

Yi Liu (JIRA) Wed, 13 May 2015 19:12:44 -0700

    [ 
https://issues.apache.org/jira/browse/HADOOP-11938?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14543076#comment-14543076
 ]


Yi Liu commented on HADOOP-11938:
---------------------------------

Looks more better than original.

Some lines are longer than 80 chars.

*In AbstractRawErasureCoder.java*

{code}
+    for (int i = pos; i < buffer.remaining(); ++i) {
+      buffer.put(i, (byte) 0);
      }
{code}
it should be {{buffer.limit()}} instead of remaining
And we can just use {{buffer.put((byte)0)}}

{code}
@return the buffer itself, with ZERO bytes written, remaining the original
+   * position
{code}
"remaining the original position",  maybe "the position and limit is not 
changed after the call" is more clear.

use {{HadoopIllegalArgumentException}} instead of {{IllegalArgumentException}}

*in XORRawDecoder.java and XORRawEncoder.java*
{code}
inputs[i].position() + inputs[0].remaining()
{code}
Just need use inputs\[i\].limit()

*in RSRawDecoder.java and RSRawEncoder.java*
{code}
+    int dataLen = inputs[0].remaining();
{code}
is it necessary?  I think we don't need to pass {{dataLen}} to 
{{RSUtil.GF.solveVandermondeSystem}}

*in GaloisField.java*
{code}
public void solveVandermondeSystem(int[] x, ByteBuffer[] y,
                                      int len, int dataLen) {
{code}
As in previous comment, {{dataLen}} is unnecessary, so "idx1 < p.position() + 
dataLen" can be "idx1 < p.limit()"


{code}
  public void substitute(ByteBuffer[] p, ByteBuffer q, int x) {
-    int y = 1;
+    int y = 1, iIdx, oIdx;
+    int len = p[0].remaining();
     for (int i = 0; i < p.length; i++) {
       ByteBuffer pi = p[i];
-      int len = pi.remaining();
-      for (int j = 0; j < len; j++) {
-        int pij = pi.get(j) & 0x000000FF;
-        q.put(j, (byte) (q.get(j) ^ mulTable[pij][y]));
+      for (iIdx = pi.position(), oIdx = q.position();
+           iIdx < pi.position() + len; iIdx++, oIdx++) {
+        int pij = pi.get(iIdx) & 0x000000FF;
+        q.put(oIdx, (byte) (q.get(oIdx) ^ mulTable[pij][y]));
       }
       y = mulTable[x][y];
     }
{code}
{{len}} is unnecessary.

Same for
{code}
public void remainder(ByteBuffer[] dividend, int len, int[] divisor) {
{code}

*in TestCoderBase.java*

{code}
+  private byte[] zeroChunkBytes;
..............

   protected void eraseDataFromChunk(ECChunk chunk) {
     ByteBuffer chunkBuffer = chunk.getBuffer();
-    // erase the data
-    chunkBuffer.position(0);
-    for (int i = 0; i < chunkSize; i++) {
-      chunkBuffer.put((byte) 0);
-    }
+    // erase the data at the position, and restore the buffer ready for reading
+    // chunkSize bytes but all ZERO.
+    int pos = chunkBuffer.position();
+    chunkBuffer.flip();
+    chunkBuffer.position(pos);
+    chunkBuffer.limit(pos + chunkSize);
+    chunkBuffer.put(zeroChunkBytes);
     chunkBuffer.flip();
+    chunkBuffer.position(pos);
+    chunkBuffer.limit(pos + chunkSize);
{code}

{code}
-  protected static ECChunk cloneChunkWithData(ECChunk chunk) {
+  protected ECChunk cloneChunkWithData(ECChunk chunk) {
     ByteBuffer srcBuffer = chunk.getBuffer();
-    ByteBuffer destBuffer;
+    ByteBuffer destBuffer = allocateOutputChunkBuffer();
  
-    byte[] bytesArr = new byte[srcBuffer.remaining()];
+    byte[] bytesArr = new byte[chunkSize];
     srcBuffer.mark();
     srcBuffer.get(bytesArr);
     srcBuffer.reset();
  
-    if (srcBuffer.hasArray()) {
-      destBuffer = ByteBuffer.wrap(bytesArr);
-    } else {
-      destBuffer = ByteBuffer.allocateDirect(srcBuffer.remaining());
-      destBuffer.put(bytesArr);
-      destBuffer.flip();
-    }
+    int pos = destBuffer.position();
+    destBuffer.put(bytesArr);
+    destBuffer.flip();
+    destBuffer.position(pos);
{code}

{{destBuffer}} is still assumed to be chunkSize
Furthermore, some unnecessary flip


{code}
+  /**
+   * Convert an array of this chunks to an array of byte array.
+   * Note the chunk buffers are not affected.
+   * @param chunks
+   * @return an array of byte array
+   */
+  protected byte[][] toArrays(ECChunk[] chunks) {
+    byte[][] bytesArr = new byte[chunks.length][];
+
+    ByteBuffer buffer;
+    for (int i = 0; i < chunks.length; i++) {
+      buffer = chunks[i].getBuffer();
+      if (buffer.hasArray() && buffer.position() == 0 &&
+          buffer.remaining() == chunkSize) {
+        bytesArr[i] = buffer.array();
+      } else {
+        bytesArr[i] = new byte[buffer.remaining()];
+        // Avoid affecting the original one
+        buffer.mark();
+        buffer.get(bytesArr[i]);
+        buffer.reset();
+      }
+    }
+
+    return bytesArr;
+  }
{code}
We already have this method, use {{ECChunk.toBuffers}}  ?  Convert to 
bytebuffer is enough? If not, should we have this method in code not only in 
test?


*In TestRawCoderBase.java*
You should add more description about your tests, for example, the negative 
test is for what and how you test,  you also need find a good name for it.
{code}
protected void testCodingNegative(boolean usingDirectBuffer) {
+    this.usingDirectBuffer = usingDirectBuffer;
+    prepareCoders();
+
+    boolean isOK1;
+    try {
+      performTestCoding(baseChunkSize, true, false);
+      isOK1 = false;
+    } catch (Exception e) {
+      isOK1 = true;
+    }
+
+    boolean isOK2;
+    try {
+      performTestCoding(baseChunkSize, false, true);
+      isOK2 = false;
+    } catch (Exception e) {
+      isOK2 = true;
+    }
+
+    Assert.assertTrue("Negative tests passed", isOK1 && isOK2);
+  }
{code}
It should something like following, you can refer to other tests.
{code}
try {
  performTestCoding(baseChunkSize, true, false);
  Assert.fail("")  <-  your description here
} catch (Exception e) {
  // expected        <-  you can also choose to compare the exception msg
}
{code}


> Fix ByteBuffer version encode/decode API of raw erasure coder
> -------------------------------------------------------------
>
>                 Key: HADOOP-11938
>                 URL: https://issues.apache.org/jira/browse/HADOOP-11938
>             Project: Hadoop Common
>          Issue Type: Sub-task
>          Components: io
>            Reporter: Kai Zheng
>            Assignee: Kai Zheng
>         Attachments: HADOOP-11938-HDFS-7285-v1.patch, 
> HADOOP-11938-HDFS-7285-v2.patch, HADOOP-11938-HDFS-7285-workaround.patch
>
>
> While investigating a test failure in {{TestRecoverStripedFile}}, one issue 
> in raw erasrue coder, caused by an optimization in below codes. It assumes 
> the  heap buffer backed by the bytes array available for reading or writing 
> always starts with zero and takes the whole space.
> {code}
>   protected static byte[][] toArrays(ByteBuffer[] buffers) {
>     byte[][] bytesArr = new byte[buffers.length][];
>     ByteBuffer buffer;
>     for (int i = 0; i < buffers.length; i++) {
>       buffer = buffers[i];
>       if (buffer == null) {
>         bytesArr[i] = null;
>         continue;
>       }
>       if (buffer.hasArray()) {
>         bytesArr[i] = buffer.array();
>       } else {
>         throw new IllegalArgumentException("Invalid ByteBuffer passed, " +
>             "expecting heap buffer");
>       }
>     }
>     return bytesArr;
>   }
> {code} 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

[jira] [Commented] (HADOOP-11938) Fix ByteBuffer version encode/decode API of raw erasure coder

Reply via email to