[https://issues.apache.org/jira/browse/HDFS-14770?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16916552#comment-16916552]
HuangTao commented on HDFS-14770:
---------------------------------
{code:java}
[2019-08-13T18:51:09.587+08:00] [INFO] [stripedRead-2122] : Interrupted while
waiting for IO on channel java.nio.channels.SocketChannel[connected
local=/xx.xxx.xxx.xx:44044 remote=/10.198.104.232:50010]. 180000 millis timeout
left.
[2019-08-13T18:51:09.588+08:00] [WARN] [StripedBlockReconstruction-441] :
Failed to reconstruct striped block:
BP-714356632-xx.xxx.xxx.yy-1519726836856:blk_-9223372036844346848_941534825
java.lang.NullPointerException
[2019-08-13T18:51:09.588+08:00] [INFO] [DataXceiver for client at
/xx.xxx.xxx.xx:14224 [Receiving block
BP-714356632-xx.xxx.xxx.yy-1519726836856:blk_-9223372036844346845_941534825]] :
Exception for
BP-714356632-xx.xxx.xxx.yy-1519726836856:blk_-9223372036844346845_941534825
java.io.IOException: Premature EOF from inputStream
    at org.apache.hadoop.io.IOUtils.readFully(IOUtils.java:212)
    at org.apache.hadoop.hdfs.protocol.datatransfer.PacketReceiver.doReadFully(PacketReceiver.java:213)
    at org.apache.hadoop.hdfs.protocol.datatransfer.PacketReceiver.doRead(PacketReceiver.java:134)
    at org.apache.hadoop.hdfs.protocol.datatransfer.PacketReceiver.receiveNextPacket(PacketReceiver.java:109)
    at org.apache.hadoop.hdfs.server.datanode.BlockReceiver.receivePacket(BlockReceiver.java:538)
    at org.apache.hadoop.hdfs.server.datanode.BlockReceiver.receiveBlock(BlockReceiver.java:987)
    at org.apache.hadoop.hdfs.server.datanode.DataXceiver.writeBlock(DataXceiver.java:891)
    at org.apache.hadoop.hdfs.protocol.datatransfer.Receiver.opWriteBlock(Receiver.java:171)
    at org.apache.hadoop.hdfs.protocol.datatransfer.Receiver.processOp(Receiver.java:105)
    at org.apache.hadoop.hdfs.server.datanode.DataXceiver.run(DataXceiver.java:290)
    at java.lang.Thread.run(Thread.java:745)
[2019-08-13T18:51:09.589+08:00] [WARN] [DataXceiver for client at
/xx.xxx.xxx.xx:14224 [Receiving block
BP-714356632-xx.xxx.xxx.yy-1519726836856:blk_-9223372036844346845_941534825]] :
Block
BP-714356632-xx.xxx.xxx.yy-1519726836856:blk_-9223372036844346845_941534825
unfinalized and removed.
{code}
After diving deep into the code and adding a lot of logs, I found that
*StripedReconstructor#BUFFER_POOL* is the root cause.
When some StripedBlockReaders have already read data into their buffers but
one StripedBlockReader (stripedRead-2122) times out, those buffers are
returned to *StripedReconstructor#BUFFER_POOL* without being cleared and the
recovery is terminated. The next recovery then takes buffers from the pool
whose remaining() may be 0, so these dirty buffers are used to reconstruct
the internal blocks. As a result, the reconstructed internal block is
corrupt.
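The hazard is easy to reproduce in isolation: a ByteBuffer that was filled
and never cleared reports remaining() == 0, so any reader that sizes its
read loop by remaining() silently reads nothing. This is a minimal
standalone sketch (class name is mine, not from HDFS):
{code:java}
import java.nio.ByteBuffer;

public class DirtyBufferDemo {
  public static void main(String[] args) {
    ByteBuffer buf = ByteBuffer.allocate(8);

    // A previous recovery filled the buffer, then was aborted and the
    // buffer went back to the pool without clear().
    buf.put(new byte[8]);
    System.out.println(buf.remaining());  // 0: a read loop bounded by
                                          // remaining() exits at once

    // Clearing resets position/limit and restores the full capacity.
    buf.clear();
    System.out.println(buf.remaining());  // 8
  }
}
{code}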
{code:java}
/**
 * Perform actual reading of bytes from block.
 */
private BlockReadStats actualReadFromBlock() throws IOException {
  int len = buffer.remaining();
  int n = 0;
  while (n < len) {
    int nread = blockReader.read(buffer);
    if (nread <= 0) {
      break;
    }
    n += nread;
    stripedReader.getReconstructor().incrBytesRead(isLocal, nread);
  }
  return new BlockReadStats(n, blockReader.isShortCircuit(),
      blockReader.getNetworkDistance());
}
{code}
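Because len is taken from buffer.remaining(), a dirty buffer makes the while
loop a no-op and zero bytes are "read" without any error being raised. One
way to rule this state out is to clear a buffer before it re-enters the
pool. This is only a sketch of that idea with a hypothetical pool class,
not the actual HDFS fix:
{code:java}
import java.nio.ByteBuffer;
import java.util.concurrent.ConcurrentLinkedQueue;

// Hypothetical pool that resets buffers on release, so a later
// recovery never receives a buffer with remaining() == 0.
public class ClearingBufferPool {
  private final ConcurrentLinkedQueue<ByteBuffer> pool =
      new ConcurrentLinkedQueue<>();
  private final int bufferSize;

  public ClearingBufferPool(int bufferSize) {
    this.bufferSize = bufferSize;
  }

  public ByteBuffer getBuffer() {
    ByteBuffer buf = pool.poll();
    return buf != null ? buf : ByteBuffer.allocate(bufferSize);
  }

  public void putBuffer(ByteBuffer buf) {
    buf.clear();  // reset position/limit before the buffer is pooled
    pool.offer(buf);
  }
}
{code}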
> Erasure Coding: reconstructed internal block is error
> -----------------------------------------------------
>
> Key: HDFS-14770
> URL: https://issues.apache.org/jira/browse/HDFS-14770
> Project: Hadoop HDFS
> Issue Type: Bug
> Components: datanode, erasure-coding
> Reporter: HuangTao
> Assignee: HuangTao
> Priority: Major
>
> We found some missing EC internal blocks were reconstructed incorrectly.
> The incorrect block(md5sum)
> {code:java}
> 73ceb92c4223ce00ff88802fdafd19a3
> /data5/dfs/current/BP-XXXXXX/current/finalized/subdir6/subdir7/blk_-9223372036570159327
> {code}
> It should be
> {code:java}
> 137e1c8483126291c2e84fc0115cfcc6
> /data3/dfs/current/BP-XXXXXX/current/finalized/subdir7/subdir6/blk_-9223372036401277247
> {code}