Some junit tests fail with the exception: All datanodes are bad. Aborting...
----------------------------------------------------------------------------

                 Key: HADOOP-2691
                 URL: https://issues.apache.org/jira/browse/HADOOP-2691
             Project: Hadoop Core
          Issue Type: Bug
          Components: dfs
    Affects Versions: 0.15.2
            Reporter: Hairong Kuang
             Fix For: 0.16.0


Some junit tests fail with the following exception:
java.io.IOException: All datanodes are bad. Aborting...
        at org.apache.hadoop.dfs.DFSClient$DFSOutputStream.processDatanodeError(DFSClient.java:1831)
        at org.apache.hadoop.dfs.DFSClient$DFSOutputStream.access$1100(DFSClient.java:1479)
        at org.apache.hadoop.dfs.DFSClient$DFSOutputStream$DataStreamer.run(DFSClient.java:1571)
The log contains the following message:
2008-01-19 23:00:25,557 INFO  dfs.StateChange 
(FSNamesystem.java:allocateBlock(1274)) - BLOCK* NameSystem.allocateBlock: 
/srcdat/three/3189919341591612220. blk_6989304691537873255
2008-01-19 23:00:25,559 INFO  fs.DFSClient 
(DFSClient.java:createBlockOutputStream(1982)) - pipeline = 127.0.0.1:40678
2008-01-19 23:00:25,559 INFO  fs.DFSClient 
(DFSClient.java:createBlockOutputStream(1982)) - pipeline = 127.0.0.1:40680
2008-01-19 23:00:25,559 INFO  fs.DFSClient 
(DFSClient.java:createBlockOutputStream(1985)) - Connecting to 127.0.0.1:40678
2008-01-19 23:00:25,570 INFO  dfs.DataNode (DataNode.java:writeBlock(1084)) - 
Receiving block blk_6989304691537873255 from /127.0.0.1
2008-01-19 23:00:25,572 INFO  dfs.DataNode (DataNode.java:writeBlock(1084)) - 
Receiving block blk_6989304691537873255 from /127.0.0.1
2008-01-19 23:00:25,573 INFO  dfs.DataNode (DataNode.java:writeBlock(1169)) - 
Datanode 0 forwarding connect ack to upstream firstbadlink is 
2008-01-19 23:00:25,573 INFO  dfs.DataNode (DataNode.java:writeBlock(1150)) - 
Datanode 1 got response for connect ack  from downstream datanode with 
firstbadlink as 
2008-01-19 23:00:25,573 INFO  dfs.DataNode (DataNode.java:writeBlock(1169)) - 
Datanode 1 forwarding connect ack to upstream firstbadlink is 
2008-01-19 23:00:25,574 INFO  dfs.DataNode 
(DataNode.java:lastDataNodeRun(1802)) - Received block blk_6989304691537873255 
of size 34 from /127.0.0.1
2008-01-19 23:00:25,575 INFO  dfs.DataNode 
(DataNode.java:lastDataNodeRun(1819)) - PacketResponder 0 for block 
blk_6989304691537873255 terminating
2008-01-19 23:00:25,575 INFO  dfs.StateChange 
(FSNamesystem.java:addStoredBlock(2467)) - BLOCK* NameSystem.addStoredBlock: 
blockMap updated: 127.0.0.1:40680 is added to blk_6989304691537873255 size 34
2008-01-19 23:00:25,575 INFO  dfs.DataNode (DataNode.java:close(2013)) - 
BlockReceiver for block blk_6989304691537873255 waiting for last write to drain.
2008-01-19 23:01:31,577 WARN  fs.DFSClient (DFSClient.java:run(1764)) - 
DFSOutputStream ResponseProcessor exception  for block 
blk_6989304691537873255java.net.SocketTimeoutException: Read timed out
        at java.net.SocketInputStream.socketRead0(Native Method)
        at java.net.SocketInputStream.read(SocketInputStream.java:129)
        at java.io.DataInputStream.readFully(DataInputStream.java:176)
        at java.io.DataInputStream.readLong(DataInputStream.java:380)
        at org.apache.hadoop.dfs.DFSClient$DFSOutputStream$ResponseProcessor.run(DFSClient.java:1726)

2008-01-19 23:01:31,578 INFO  fs.DFSClient (DFSClient.java:run(1653)) - Closing 
old block blk_6989304691537873255
2008-01-19 23:01:31,579 WARN  fs.DFSClient 
(DFSClient.java:processDatanodeError(1803)) - Error Recovery for block 
blk_6989304691537873255 bad datanode[0] 127.0.0.1:40678
2008-01-19 23:01:31,580 WARN  fs.DFSClient 
(DFSClient.java:processDatanodeError(1836)) - Error Recovery for block 
blk_6989304691537873255 bad datanode 127.0.0.1:40678
2008-01-19 23:01:31,580 INFO  fs.DFSClient 
(DFSClient.java:createBlockOutputStream(1982)) - pipeline = 127.0.0.1:40680
2008-01-19 23:01:31,580 INFO  fs.DFSClient 
(DFSClient.java:createBlockOutputStream(1985)) - Connecting to 127.0.0.1:40680
2008-01-19 23:01:31,582 INFO  dfs.DataNode (DataNode.java:writeBlock(1084)) - 
Receiving block blk_6989304691537873255 from /127.0.0.1
2008-01-19 23:01:31,584 INFO  dfs.DataNode (DataNode.java:writeBlock(1196)) - 
writeBlock blk_6989304691537873255 received exception java.io.IOException: 
Reopen Block blk_6989304691537873255 is valid, and cannot be written to.
2008-01-19 23:01:31,584 ERROR dfs.DataNode (DataNode.java:run(997)) - 
127.0.0.1:40680:DataXceiver: java.io.IOException: Reopen Block 
blk_6989304691537873255 is valid, and cannot be written to.
        at org.apache.hadoop.dfs.FSDataset.writeToBlock(FSDataset.java:613)
        at org.apache.hadoop.dfs.DataNode$BlockReceiver.&lt;init&gt;(DataNode.java:1996)
        at org.apache.hadoop.dfs.DataNode$DataXceiver.writeBlock(DataNode.java:1109)
        at org.apache.hadoop.dfs.DataNode$DataXceiver.run(DataNode.java:982)
        at java.lang.Thread.run(Thread.java:595)

2008-01-19 23:01:31,585 INFO  fs.DFSClient 
(DFSClient.java:createBlockOutputStream(2024)) - Exception in 
createBlockOutputStream java.io.EOFException

The log shows that blk_6989304691537873255 was successfully written to both 
datanodes. However, the dfsclient timed out waiting for a response from the 
first datanode, so it attempted to recover from the failure by resending the 
data to the second datanode. The recovery failed because the second datanode 
threw an IOException when it detected that it already held the block. It would 
be nice if the second datanode did not throw an exception for an 
already-finalized block during recovery, and instead acknowledged the write as 
successful.
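The suggested change can be sketched as follows. This is a hypothetical, 
standalone illustration (not the actual FSDataset/DataNode code): the 
`isRecovery` flag, the `writeToBlock*` method names, and the in-memory 
finalized-block set are all assumptions made for the example. The idea is 
simply that a write targeting an already-finalized block should be treated as 
an idempotent success when it arrives as part of pipeline recovery, rather 
than as an error.

```java
import java.io.IOException;
import java.util.HashSet;
import java.util.Set;

// Hypothetical sketch of a datanode's write-handling decision.
// Names and structure are illustrative, not Hadoop's real internals.
public class RecoveryBehaviorSketch {
    // Block IDs this (simulated) datanode has finalized on disk.
    private final Set<Long> finalizedBlocks = new HashSet<>();

    public void finalizeBlock(long blockId) {
        finalizedBlocks.add(blockId);
    }

    // Current behavior seen in the log: a write to a finalized block throws.
    public void writeToBlockCurrent(long blockId) throws IOException {
        if (finalizedBlocks.contains(blockId)) {
            throw new IOException("Reopen Block blk_" + blockId
                + " is valid, and cannot be written to.");
        }
        // ... receive packets, then finalize ...
        finalizedBlocks.add(blockId);
    }

    // Proposed behavior: during recovery, an already-finalized block is
    // acknowledged as a successful write instead of raising an error.
    public boolean writeToBlockProposed(long blockId, boolean isRecovery) {
        if (finalizedBlocks.contains(blockId)) {
            // Block is already complete; during recovery this is not an
            // error -- the data made it, so just ack.
            return isRecovery;
        }
        // ... receive packets, then finalize ...
        finalizedBlocks.add(blockId);
        return true;
    }

    public static void main(String[] args) {
        RecoveryBehaviorSketch dn = new RecoveryBehaviorSketch();
        dn.finalizeBlock(6989304691537873255L);
        // Recovery write to the finalized block succeeds instead of throwing.
        System.out.println(dn.writeToBlockProposed(6989304691537873255L, true));
    }
}
```

Under this scheme, the recovery path in the log above would have completed: the 
second datanode, already holding the finalized block, would ack the reopened 
pipeline rather than failing it with "Reopen Block ... is valid, and cannot be 
written to."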


-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.