[ https://issues.apache.org/jira/browse/HDFS-8347?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Kai Zheng updated HDFS-8347:
----------------------------
    Description:
While investigating a test failure in {{TestRecoverStripedFile}}, found one issue. An extra configurable buffer size, instead of the chunkSize defined in the schema, is used to perform the decoding, which is incorrect and will cause a decoding failure as below. This is exposed by the latest change in the erasure coder.
{noformat}
2015-05-08 18:50:06,607 WARN datanode.DataNode (ErasureCodingWorker.java:run(386)) - Transfer failed for all targets.
2015-05-08 18:50:06,608 WARN datanode.DataNode (ErasureCodingWorker.java:run(399)) - Failed to recover striped block: BP-1597876081-10.239.12.51-1431082199073:blk_-9223372036854775792_1001
2015-05-08 18:50:06,609 INFO datanode.DataNode (BlockReceiver.java:receiveBlock(826)) - Exception for BP-1597876081-10.239.12.51-1431082199073:blk_-9223372036854775784_1001
java.io.IOException: Premature EOF from inputStream
	at org.apache.hadoop.io.IOUtils.readFully(IOUtils.java:203)
	at org.apache.hadoop.hdfs.protocol.datatransfer.PacketReceiver.doReadFully(PacketReceiver.java:213)
	at org.apache.hadoop.hdfs.protocol.datatransfer.PacketReceiver.doRead(PacketReceiver.java:134)
	at org.apache.hadoop.hdfs.protocol.datatransfer.PacketReceiver.receiveNextPacket(PacketReceiver.java:109)
	at org.apache.hadoop.hdfs.server.datanode.BlockReceiver.receivePacket(BlockReceiver.java:472)
	at org.apache.hadoop.hdfs.server.datanode.BlockReceiver.receiveBlock(BlockReceiver.java:787)
	at org.apache.hadoop.hdfs.server.datanode.DataXceiver.writeBlock(DataXceiver.java:803)
	at org.apache.hadoop.hdfs.protocol.datatransfer.Receiver.opWriteBlock(Receiver.java:137)
	at org.apache.hadoop.hdfs.protocol.datatransfer.Receiver.processOp(Receiver.java:74)
	at org.apache.hadoop.hdfs.server.datanode.DataXceiver.run(DataXceiver.java:250)
	at java.lang.Thread.run(Thread.java:745)
{noformat}

  was:
While investigating a test failure in {{TestRecoverStripedFile}}, found two issues:
* An extra buffer size instead of the chunkSize
defined in the schema is used to perform the decoding, which is incorrect and will cause a decoding failure as below. This is exposed by the latest change in the erasure coder.
{noformat}
2015-05-08 18:50:06,607 WARN datanode.DataNode (ErasureCodingWorker.java:run(386)) - Transfer failed for all targets.
2015-05-08 18:50:06,608 WARN datanode.DataNode (ErasureCodingWorker.java:run(399)) - Failed to recover striped block: BP-1597876081-10.239.12.51-1431082199073:blk_-9223372036854775792_1001
2015-05-08 18:50:06,609 INFO datanode.DataNode (BlockReceiver.java:receiveBlock(826)) - Exception for BP-1597876081-10.239.12.51-1431082199073:blk_-9223372036854775784_1001
java.io.IOException: Premature EOF from inputStream
	at org.apache.hadoop.io.IOUtils.readFully(IOUtils.java:203)
	at org.apache.hadoop.hdfs.protocol.datatransfer.PacketReceiver.doReadFully(PacketReceiver.java:213)
	at org.apache.hadoop.hdfs.protocol.datatransfer.PacketReceiver.doRead(PacketReceiver.java:134)
	at org.apache.hadoop.hdfs.protocol.datatransfer.PacketReceiver.receiveNextPacket(PacketReceiver.java:109)
	at org.apache.hadoop.hdfs.server.datanode.BlockReceiver.receivePacket(BlockReceiver.java:472)
	at org.apache.hadoop.hdfs.server.datanode.BlockReceiver.receiveBlock(BlockReceiver.java:787)
	at org.apache.hadoop.hdfs.server.datanode.DataXceiver.writeBlock(DataXceiver.java:803)
	at org.apache.hadoop.hdfs.protocol.datatransfer.Receiver.opWriteBlock(Receiver.java:137)
	at org.apache.hadoop.hdfs.protocol.datatransfer.Receiver.processOp(Receiver.java:74)
	at org.apache.hadoop.hdfs.server.datanode.DataXceiver.run(DataXceiver.java:250)
	at java.lang.Thread.run(Thread.java:745)
{noformat}
* In the raw erasure coder, there is a bad optimization in the code below. It assumes that the part of the backing byte array of a heap buffer that is available for reading or writing always starts at offset zero and spans the whole array.
{code}
  protected static byte[][] toArrays(ByteBuffer[] buffers) {
    byte[][] bytesArr = new byte[buffers.length][];

    ByteBuffer buffer;
    for (int i = 0; i < buffers.length; i++) {
      buffer = buffers[i];
      if (buffer == null) {
        bytesArr[i] = null;
        continue;
      }

      if (buffer.hasArray()) {
        bytesArr[i] = buffer.array();
      } else {
        throw new IllegalArgumentException("Invalid ByteBuffer passed, " +
            "expecting heap buffer");
      }
    }

    return bytesArr;
  }
{code}
Will attach a patch soon to fix the two issues.


> Using chunkSize to perform erasure decoding in stripping blocks recovering
> --------------------------------------------------------------------------
>
>                 Key: HDFS-8347
>                 URL: https://issues.apache.org/jira/browse/HDFS-8347
>             Project: Hadoop HDFS
>          Issue Type: Sub-task
>            Reporter: Kai Zheng
>            Assignee: Kai Zheng
>
> While investigating a test failure in {{TestRecoverStripedFile}}, found one
> issue. An extra configurable buffer size, instead of the chunkSize defined in
> the schema, is used to perform the decoding, which is incorrect and will cause
> a decoding failure as below. This is exposed by the latest change in the
> erasure coder.
> {noformat}
> 2015-05-08 18:50:06,607 WARN datanode.DataNode
> (ErasureCodingWorker.java:run(386)) - Transfer failed for all targets.
> 2015-05-08 18:50:06,608 WARN datanode.DataNode
> (ErasureCodingWorker.java:run(399)) - Failed to recover striped block:
> BP-1597876081-10.239.12.51-1431082199073:blk_-9223372036854775792_1001
> 2015-05-08 18:50:06,609 INFO datanode.DataNode
> (BlockReceiver.java:receiveBlock(826)) - Exception for
> BP-1597876081-10.239.12.51-1431082199073:blk_-9223372036854775784_1001
> java.io.IOException: Premature EOF from inputStream
> 	at org.apache.hadoop.io.IOUtils.readFully(IOUtils.java:203)
> 	at org.apache.hadoop.hdfs.protocol.datatransfer.PacketReceiver.doReadFully(PacketReceiver.java:213)
> 	at org.apache.hadoop.hdfs.protocol.datatransfer.PacketReceiver.doRead(PacketReceiver.java:134)
> 	at org.apache.hadoop.hdfs.protocol.datatransfer.PacketReceiver.receiveNextPacket(PacketReceiver.java:109)
> 	at org.apache.hadoop.hdfs.server.datanode.BlockReceiver.receivePacket(BlockReceiver.java:472)
> 	at org.apache.hadoop.hdfs.server.datanode.BlockReceiver.receiveBlock(BlockReceiver.java:787)
> 	at org.apache.hadoop.hdfs.server.datanode.DataXceiver.writeBlock(DataXceiver.java:803)
> 	at org.apache.hadoop.hdfs.protocol.datatransfer.Receiver.opWriteBlock(Receiver.java:137)
> 	at org.apache.hadoop.hdfs.protocol.datatransfer.Receiver.processOp(Receiver.java:74)
> 	at org.apache.hadoop.hdfs.server.datanode.DataXceiver.run(DataXceiver.java:250)
> 	at java.lang.Thread.run(Thread.java:745)
> {noformat}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
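
For illustration only, here is a minimal sketch of why the {{toArrays}} optimization quoted in the earlier description is unsafe. {{ByteBuffer.array()}} returns the whole backing array, regardless of the buffer's position, limit, or array offset. The class name {{HeapBufferView}} and the helper {{readableBytes}} are hypothetical and are not taken from the Hadoop code base or from the patch for this issue; copying is just one possible way to respect the readable window, assumed here for clarity.

```java
import java.nio.ByteBuffer;
import java.util.Arrays;

public class HeapBufferView {

  /**
   * Hypothetical helper: copies exactly the readable bytes
   * [position, limit) of a heap buffer, instead of handing out the
   * whole backing array as buffer.array() would.
   */
  public static byte[] readableBytes(ByteBuffer buffer) {
    if (buffer == null) {
      return null;
    }
    if (!buffer.hasArray()) {
      throw new IllegalArgumentException("Invalid ByteBuffer passed, "
          + "expecting heap buffer");
    }
    // The readable region starts at arrayOffset() + position() in the
    // backing array and is remaining() bytes long.
    int start = buffer.arrayOffset() + buffer.position();
    return Arrays.copyOfRange(buffer.array(), start,
        start + buffer.remaining());
  }

  public static void main(String[] args) {
    byte[] backing = {0, 1, 2, 3, 4, 5, 6, 7};

    // A window over bytes 2..5: position = 2, limit = 6.
    ByteBuffer window = ByteBuffer.wrap(backing, 2, 4);
    // array() ignores the window and exposes all 8 bytes.
    System.out.println(window.array().length);
    System.out.println(Arrays.toString(readableBytes(window)));

    // A slice has position 0 but a non-zero arrayOffset().
    ByteBuffer slice = window.slice();
    System.out.println(slice.arrayOffset());
    System.out.println(Arrays.toString(readableBytes(slice)));
  }
}
```

Copying gives up the zero-copy benefit the original optimization was after. An alternative, also only an assumption here, would be to keep passing the backing arrays and pass the start offsets and lengths alongside them.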