[ https://issues.apache.org/jira/browse/HDFS-8347?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Kai Zheng updated HDFS-8347:
----------------------------
Description:
While investigating a test failure in {{TestRecoverStripedFile}}, found one
issue. An extra configurable buffer size, instead of the chunkSize defined in the
schema, is used to perform the decoding, which is incorrect and will cause a
decoding failure as below. This is exposed by the latest change in the erasure coder.
{noformat}
2015-05-08 18:50:06,607 WARN datanode.DataNode (ErasureCodingWorker.java:run(386)) - Transfer failed for all targets.
2015-05-08 18:50:06,608 WARN datanode.DataNode (ErasureCodingWorker.java:run(399)) - Failed to recover striped block: BP-1597876081-10.239.12.51-1431082199073:blk_-9223372036854775792_1001
2015-05-08 18:50:06,609 INFO datanode.DataNode (BlockReceiver.java:receiveBlock(826)) - Exception for BP-1597876081-10.239.12.51-1431082199073:blk_-9223372036854775784_1001
java.io.IOException: Premature EOF from inputStream
    at org.apache.hadoop.io.IOUtils.readFully(IOUtils.java:203)
    at org.apache.hadoop.hdfs.protocol.datatransfer.PacketReceiver.doReadFully(PacketReceiver.java:213)
    at org.apache.hadoop.hdfs.protocol.datatransfer.PacketReceiver.doRead(PacketReceiver.java:134)
    at org.apache.hadoop.hdfs.protocol.datatransfer.PacketReceiver.receiveNextPacket(PacketReceiver.java:109)
    at org.apache.hadoop.hdfs.server.datanode.BlockReceiver.receivePacket(BlockReceiver.java:472)
    at org.apache.hadoop.hdfs.server.datanode.BlockReceiver.receiveBlock(BlockReceiver.java:787)
    at org.apache.hadoop.hdfs.server.datanode.DataXceiver.writeBlock(DataXceiver.java:803)
    at org.apache.hadoop.hdfs.protocol.datatransfer.Receiver.opWriteBlock(Receiver.java:137)
    at org.apache.hadoop.hdfs.protocol.datatransfer.Receiver.processOp(Receiver.java:74)
    at org.apache.hadoop.hdfs.server.datanode.DataXceiver.run(DataXceiver.java:250)
    at java.lang.Thread.run(Thread.java:745)
{noformat}
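The issue does not show the decoder internals, so the following is only a toy sketch of how a size mismatch of this kind can break decoding. It assumes a coder that is set up with one fixed chunk size and rejects units of any other length. {{ToyXorDecoder}}, {{configuredBufSize}} and the single-parity XOR scheme are all hypothetical and are not Hadoop's raw coder API.

```java
import java.util.Arrays;

final class ChunkSizeSketch {
  /** Toy single-parity XOR decoder bound to one chunk size (hypothetical, not Hadoop's API). */
  static final class ToyXorDecoder {
    private final int chunkSize;

    ToyXorDecoder(int chunkSize) {
      this.chunkSize = chunkSize;
    }

    /** Rebuilds the one missing unit as the XOR of all surviving units. */
    byte[] decode(byte[][] survivors) {
      byte[] recovered = new byte[chunkSize];
      for (byte[] unit : survivors) {
        if (unit.length != chunkSize) {
          throw new IllegalArgumentException(
              "unit length " + unit.length + " != chunkSize " + chunkSize);
        }
        for (int i = 0; i < chunkSize; i++) {
          recovered[i] ^= unit[i];
        }
      }
      return recovered;
    }
  }

  public static void main(String[] args) {
    int chunkSize = 4;          // stands in for the chunkSize defined in the schema
    int configuredBufSize = 6;  // stands in for the separately configured buffer size
    ToyXorDecoder decoder = new ToyXorDecoder(chunkSize);

    byte[] d0 = {1, 2, 3, 4};
    byte[] d1 = {5, 6, 7, 8};
    byte[] parity = new byte[chunkSize];
    for (int i = 0; i < chunkSize; i++) {
      parity[i] = (byte) (d0[i] ^ d1[i]);
    }

    // Units sized by chunkSize: d1 is rebuilt from d0 and the parity unit.
    System.out.println(Arrays.toString(decoder.decode(new byte[][] {d0, parity})));

    // Units sized by the other setting: the decoder refuses them.
    try {
      decoder.decode(new byte[][] {new byte[configuredBufSize], new byte[configuredBufSize]});
    } catch (IllegalArgumentException e) {
      System.out.println("decode failed: " + e.getMessage());
    }
  }
}
```

In this sketch the decode succeeds only when every unit handed to the decoder has the length it was created with, which is why sizing the recovery buffers from the schema's chunkSize avoids the mismatch.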
was:
While investigating a test failure in {{TestRecoverStripedFile}}, found two
issues:
* An extra buffer size, instead of the chunkSize defined in the schema, is used to
perform the decoding, which is incorrect and will cause a decoding failure as
below. This is exposed by the latest change in the erasure coder.
{noformat}
2015-05-08 18:50:06,607 WARN datanode.DataNode (ErasureCodingWorker.java:run(386)) - Transfer failed for all targets.
2015-05-08 18:50:06,608 WARN datanode.DataNode (ErasureCodingWorker.java:run(399)) - Failed to recover striped block: BP-1597876081-10.239.12.51-1431082199073:blk_-9223372036854775792_1001
2015-05-08 18:50:06,609 INFO datanode.DataNode (BlockReceiver.java:receiveBlock(826)) - Exception for BP-1597876081-10.239.12.51-1431082199073:blk_-9223372036854775784_1001
java.io.IOException: Premature EOF from inputStream
    at org.apache.hadoop.io.IOUtils.readFully(IOUtils.java:203)
    at org.apache.hadoop.hdfs.protocol.datatransfer.PacketReceiver.doReadFully(PacketReceiver.java:213)
    at org.apache.hadoop.hdfs.protocol.datatransfer.PacketReceiver.doRead(PacketReceiver.java:134)
    at org.apache.hadoop.hdfs.protocol.datatransfer.PacketReceiver.receiveNextPacket(PacketReceiver.java:109)
    at org.apache.hadoop.hdfs.server.datanode.BlockReceiver.receivePacket(BlockReceiver.java:472)
    at org.apache.hadoop.hdfs.server.datanode.BlockReceiver.receiveBlock(BlockReceiver.java:787)
    at org.apache.hadoop.hdfs.server.datanode.DataXceiver.writeBlock(DataXceiver.java:803)
    at org.apache.hadoop.hdfs.protocol.datatransfer.Receiver.opWriteBlock(Receiver.java:137)
    at org.apache.hadoop.hdfs.protocol.datatransfer.Receiver.processOp(Receiver.java:74)
    at org.apache.hadoop.hdfs.server.datanode.DataXceiver.run(DataXceiver.java:250)
    at java.lang.Thread.run(Thread.java:745)
{noformat}
* In the raw erasure coder, there is a bad optimization in the code below. It
assumes that the readable or writable region of a heap buffer always starts at
offset zero of the backing byte array and spans the whole array.
{code}
protected static byte[][] toArrays(ByteBuffer[] buffers) {
  byte[][] bytesArr = new byte[buffers.length][];
  ByteBuffer buffer;
  for (int i = 0; i < buffers.length; i++) {
    buffer = buffers[i];
    if (buffer == null) {
      bytesArr[i] = null;
      continue;
    }
    if (buffer.hasArray()) {
      bytesArr[i] = buffer.array();
    } else {
      throw new IllegalArgumentException("Invalid ByteBuffer passed, " +
          "expecting heap buffer");
    }
  }
  return bytesArr;
}
{code}
Will attach a patch soon to fix the two issues.
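For illustration only, here is a minimal sketch of one way to avoid the assumption described in the second item. It is not the attached patch. It assumes that copying each buffer's readable window is acceptable, and it uses only standard {{java.nio.ByteBuffer}} calls ({{arrayOffset()}}, {{position()}}, {{remaining()}}). The class name {{HeapBufferSketch}} is hypothetical.

```java
import java.nio.ByteBuffer;
import java.util.Arrays;

final class HeapBufferSketch {
  /** Copies only each heap buffer's readable window [position, limit) into its own array. */
  static byte[][] toArrays(ByteBuffer[] buffers) {
    byte[][] bytesArr = new byte[buffers.length][];
    for (int i = 0; i < buffers.length; i++) {
      ByteBuffer buffer = buffers[i];
      if (buffer == null) {
        continue; // the slot stays null, as in the original method
      }
      if (!buffer.hasArray()) {
        throw new IllegalArgumentException(
            "Invalid ByteBuffer passed, expecting heap buffer");
      }
      // The window starts at arrayOffset() + position(), not necessarily at 0.
      int start = buffer.arrayOffset() + buffer.position();
      bytesArr[i] = Arrays.copyOfRange(buffer.array(), start, start + buffer.remaining());
    }
    return bytesArr;
  }

  public static void main(String[] args) {
    byte[] backing = {0, 1, 2, 3, 4, 5, 6, 7};
    // A slice over bytes 2..5: arrayOffset() is 2 and remaining() is 4.
    ByteBuffer window = ByteBuffer.wrap(backing, 2, 4).slice();

    System.out.println(window.array().length); // 8: array() exposes the whole backing array
    System.out.println(Arrays.toString(toArrays(new ByteBuffer[] {window, null})[0])); // [2, 3, 4, 5]
  }
}
```

Copying is the simplest choice for input buffers, but writes into the copies do not reach the original buffers. For output buffers, a real fix would more likely pass the offset and length along with the backing array.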
> Using chunkSize to perform erasure decoding in stripping blocks recovering
> --------------------------------------------------------------------------
>
> Key: HDFS-8347
> URL: https://issues.apache.org/jira/browse/HDFS-8347
> Project: Hadoop HDFS
> Issue Type: Sub-task
> Reporter: Kai Zheng
> Assignee: Kai Zheng