Some JUnit tests fail with the exception: All datanodes are bad. Aborting...
----------------------------------------------------------------------------
Key: HADOOP-2691
URL: https://issues.apache.org/jira/browse/HADOOP-2691
Project: Hadoop Core
Issue Type: Bug
Components: dfs
Affects Versions: 0.15.2
Reporter: Hairong Kuang
Fix For: 0.16.0
Some JUnit tests fail with the following exception:
java.io.IOException: All datanodes are bad. Aborting...
    at org.apache.hadoop.dfs.DFSClient$DFSOutputStream.processDatanodeError(DFSClient.java:1831)
    at org.apache.hadoop.dfs.DFSClient$DFSOutputStream.access$1100(DFSClient.java:1479)
    at org.apache.hadoop.dfs.DFSClient$DFSOutputStream$DataStreamer.run(DFSClient.java:1571)
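For context, this abort happens on the client side once pipeline error recovery runs out of datanodes to retry against. The following is a simplified, hypothetical sketch of that logic (not the actual DFSClient code; the name recoverPipeline is illustrative):

```java
import java.io.IOException;
import java.util.ArrayList;
import java.util.List;

// Simplified model of the client's pipeline error handling: when a datanode
// in the write pipeline is marked bad, it is dropped and the write is retried
// against the remaining nodes. Once no nodes are left, the client gives up
// with "All datanodes are bad. Aborting..." -- the exception seen above.
public class PipelineRecoverySketch {
    static List<String> recoverPipeline(List<String> pipeline, int badIndex)
            throws IOException {
        List<String> remaining = new ArrayList<>(pipeline);
        remaining.remove(badIndex);            // drop the bad datanode
        if (remaining.isEmpty()) {
            throw new IOException("All datanodes are bad. Aborting...");
        }
        return remaining;                      // retry with the survivors
    }

    public static void main(String[] args) throws IOException {
        List<String> pipeline =
                new ArrayList<>(List.of("127.0.0.1:40678", "127.0.0.1:40680"));
        // First datanode timed out, so recovery retries against the second.
        pipeline = recoverPipeline(pipeline, 0);
        System.out.println(pipeline);          // [127.0.0.1:40680]
    }
}
```

In the failure below, the retry against the surviving datanode then fails too, which is what leaves the pipeline empty and triggers the abort.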
The log contains the following messages:
2008-01-19 23:00:25,557 INFO dfs.StateChange (FSNamesystem.java:allocateBlock(1274)) - BLOCK* NameSystem.allocateBlock: /srcdat/three/3189919341591612220. blk_6989304691537873255
2008-01-19 23:00:25,559 INFO fs.DFSClient (DFSClient.java:createBlockOutputStream(1982)) - pipeline = 127.0.0.1:40678
2008-01-19 23:00:25,559 INFO fs.DFSClient (DFSClient.java:createBlockOutputStream(1982)) - pipeline = 127.0.0.1:40680
2008-01-19 23:00:25,559 INFO fs.DFSClient (DFSClient.java:createBlockOutputStream(1985)) - Connecting to 127.0.0.1:40678
2008-01-19 23:00:25,570 INFO dfs.DataNode (DataNode.java:writeBlock(1084)) - Receiving block blk_6989304691537873255 from /127.0.0.1
2008-01-19 23:00:25,572 INFO dfs.DataNode (DataNode.java:writeBlock(1084)) - Receiving block blk_6989304691537873255 from /127.0.0.1
2008-01-19 23:00:25,573 INFO dfs.DataNode (DataNode.java:writeBlock(1169)) - Datanode 0 forwarding connect ack to upstream firstbadlink is
2008-01-19 23:00:25,573 INFO dfs.DataNode (DataNode.java:writeBlock(1150)) - Datanode 1 got response for connect ack from downstream datanode with firstbadlink as
2008-01-19 23:00:25,573 INFO dfs.DataNode (DataNode.java:writeBlock(1169)) - Datanode 1 forwarding connect ack to upstream firstbadlink is
2008-01-19 23:00:25,574 INFO dfs.DataNode (DataNode.java:lastDataNodeRun(1802)) - Received block blk_6989304691537873255 of size 34 from /127.0.0.1
2008-01-19 23:00:25,575 INFO dfs.DataNode (DataNode.java:lastDataNodeRun(1819)) - PacketResponder 0 for block blk_6989304691537873255 terminating
2008-01-19 23:00:25,575 INFO dfs.StateChange (FSNamesystem.java:addStoredBlock(2467)) - BLOCK* NameSystem.addStoredBlock: blockMap updated: 127.0.0.1:40680 is added to blk_6989304691537873255 size 34
2008-01-19 23:00:25,575 INFO dfs.DataNode (DataNode.java:close(2013)) - BlockReceiver for block blk_6989304691537873255 waiting for last write to drain.
2008-01-19 23:01:31,577 WARN fs.DFSClient (DFSClient.java:run(1764)) - DFSOutputStream ResponseProcessor exception for block blk_6989304691537873255 java.net.SocketTimeoutException: Read timed out
    at java.net.SocketInputStream.socketRead0(Native Method)
    at java.net.SocketInputStream.read(SocketInputStream.java:129)
    at java.io.DataInputStream.readFully(DataInputStream.java:176)
    at java.io.DataInputStream.readLong(DataInputStream.java:380)
    at org.apache.hadoop.dfs.DFSClient$DFSOutputStream$ResponseProcessor.run(DFSClient.java:1726)
2008-01-19 23:01:31,578 INFO fs.DFSClient (DFSClient.java:run(1653)) - Closing old block blk_6989304691537873255
2008-01-19 23:01:31,579 WARN fs.DFSClient (DFSClient.java:processDatanodeError(1803)) - Error Recovery for block blk_6989304691537873255 bad datanode[0] 127.0.0.1:40678
2008-01-19 23:01:31,580 WARN fs.DFSClient (DFSClient.java:processDatanodeError(1836)) - Error Recovery for block blk_6989304691537873255 bad datanode 127.0.0.1:40678
2008-01-19 23:01:31,580 INFO fs.DFSClient (DFSClient.java:createBlockOutputStream(1982)) - pipeline = 127.0.0.1:40680
2008-01-19 23:01:31,580 INFO fs.DFSClient (DFSClient.java:createBlockOutputStream(1985)) - Connecting to 127.0.0.1:40680
2008-01-19 23:01:31,582 INFO dfs.DataNode (DataNode.java:writeBlock(1084)) - Receiving block blk_6989304691537873255 from /127.0.0.1
2008-01-19 23:01:31,584 INFO dfs.DataNode (DataNode.java:writeBlock(1196)) - writeBlock blk_6989304691537873255 received exception java.io.IOException: Reopen Block blk_6989304691537873255 is valid, and cannot be written to.
2008-01-19 23:01:31,584 ERROR dfs.DataNode (DataNode.java:run(997)) - 127.0.0.1:40680:DataXceiver: java.io.IOException: Reopen Block blk_6989304691537873255 is valid, and cannot be written to.
    at org.apache.hadoop.dfs.FSDataset.writeToBlock(FSDataset.java:613)
    at org.apache.hadoop.dfs.DataNode$BlockReceiver.<init>(DataNode.java:1996)
    at org.apache.hadoop.dfs.DataNode$DataXceiver.writeBlock(DataNode.java:1109)
    at org.apache.hadoop.dfs.DataNode$DataXceiver.run(DataNode.java:982)
    at java.lang.Thread.run(Thread.java:595)
2008-01-19 23:01:31,585 INFO fs.DFSClient (DFSClient.java:createBlockOutputStream(2024)) - Exception in createBlockOutputStream java.io.EOFException
The log shows that blk_6989304691537873255 was successfully written to two
datanodes, but the DFSClient timed out waiting for a response from the first
datanode. It tried to recover from the failure by resending the data to the
second datanode. The recovery failed, however, because the second datanode
threw an IOException when it detected that it already had the block. It would
be better if the second datanode did not throw an exception for an
already-finalized block during recovery.
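The suggested change could look roughly like the following. This is a hypothetical sketch, not a patch against the real FSDataset.writeToBlock: the isRecovery flag, the finalizedBlocks set, and the boolean return value are all illustrative assumptions about how the datanode might tolerate a re-sent write of a block it has already finalized.

```java
import java.io.IOException;
import java.util.HashSet;
import java.util.Set;

// Hypothetical sketch of the proposed behavior: during pipeline recovery, a
// datanode that already holds a finalized copy of the block treats the
// re-sent write as a no-op instead of throwing, so the client's recovery can
// succeed against the surviving datanode.
public class FinalizedBlockSketch {
    private final Set<String> finalizedBlocks = new HashSet<>();

    void finalizeBlock(String blockId) {
        finalizedBlocks.add(blockId);
    }

    // Returns true if the write should proceed, false if it can be skipped
    // because the block is already finalized and this write is a recovery.
    boolean writeToBlock(String blockId, boolean isRecovery) throws IOException {
        if (finalizedBlocks.contains(blockId)) {
            if (isRecovery) {
                return false;  // already have the data; ack without rewriting
            }
            throw new IOException(
                "Reopen Block " + blockId + " is valid, and cannot be written to.");
        }
        return true;           // normal write path
    }
}
```

The key point is only the branch on isRecovery: outside recovery, reopening a finalized block should still be rejected exactly as the log above shows.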