[jira] Commented: (HADOOP-2691) Some junit tests fail with the exception: All datanodes are bad. Aborting...

Jim Kellerman (JIRA) Wed, 23 Jan 2008 21:06:54 -0800

    [ 
https://issues.apache.org/jira/browse/HADOOP-2691?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12561940#action_12561940
 ]


Jim Kellerman commented on HADOOP-2691:
---------------------------------------

As of revision 614793, this problem is always reproducible in the HBase test 
case TestTableMapReduce on my dual core Dell laptop running windows. 

Since we normally set the Hadoop log level to WARN so we can more easily see 
HBase log messages, I am attaching a patch that will set the Hadoop log level 
to INFO for this test. 

Additionally, I am attaching the log file from a test run. The log file is 
truncated, because one of the HBase servers takes a real long time to shut 
down, and that part of the log is irrelevant to this issue (other than the 
fact, that the 'All datanodes are bad' error is what causes it to go psychotic.

> Some junit tests fail with the exception: All datanodes are bad. Aborting...
> ----------------------------------------------------------------------------
>
>                 Key: HADOOP-2691
>                 URL: https://issues.apache.org/jira/browse/HADOOP-2691
>             Project: Hadoop Core
>          Issue Type: Bug
>          Components: dfs
>    Affects Versions: 0.15.2
>            Reporter: Hairong Kuang
>            Assignee: dhruba borthakur
>             Fix For: 0.16.0
>
>         Attachments: datanodesBad.patch, datanodesBad1.patch
>
>
> Some junit tests fail with the following exception:
> java.io.IOException: All datanodes are bad. Aborting...
>       at 
> org.apache.hadoop.dfs.DFSClient$DFSOutputStream.processDatanodeError(DFSClient.java:1831)
>       at 
> org.apache.hadoop.dfs.DFSClient$DFSOutputStream.access$1100(DFSClient.java:1479)
>       at 
> org.apache.hadoop.dfs.DFSClient$DFSOutputStream$DataStreamer.run(DFSClient.java:1571)
> The log contains the following message:
> 2008-01-19 23:00:25,557 INFO  dfs.StateChange 
> (FSNamesystem.java:allocateBlock(1274)) - BLOCK* NameSystem.allocateBlock: 
> /srcdat/three/3189919341591612220. blk_6989304691537873255
> 2008-01-19 23:00:25,559 INFO  fs.DFSClient 
> (DFSClient.java:createBlockOutputStream(1982)) - pipeline = 127.0.0.1:40678
> 2008-01-19 23:00:25,559 INFO  fs.DFSClient 
> (DFSClient.java:createBlockOutputStream(1982)) - pipeline = 127.0.0.1:40680
> 2008-01-19 23:00:25,559 INFO  fs.DFSClient 
> (DFSClient.java:createBlockOutputStream(1985)) - Connecting to 127.0.0.1:40678
> 2008-01-19 23:00:25,570 INFO  dfs.DataNode (DataNode.java:writeBlock(1084)) - 
> Receiving block blk_6989304691537873255 from /127.0.0.1
> 2008-01-19 23:00:25,572 INFO  dfs.DataNode (DataNode.java:writeBlock(1084)) - 
> Receiving block blk_6989304691537873255 from /127.0.0.1
> 2008-01-19 23:00:25,573 INFO  dfs.DataNode (DataNode.java:writeBlock(1169)) - 
> Datanode 0 forwarding connect ack to upstream firstbadlink is 
> 2008-01-19 23:00:25,573 INFO  dfs.DataNode (DataNode.java:writeBlock(1150)) - 
> Datanode 1 got response for connect ack  from downstream datanode with 
> firstbadlink as 
> 2008-01-19 23:00:25,573 INFO  dfs.DataNode (DataNode.java:writeBlock(1169)) - 
> Datanode 1 forwarding connect ack to upstream firstbadlink is 
> 2008-01-19 23:00:25,574 INFO  dfs.DataNode 
> (DataNode.java:lastDataNodeRun(1802)) - Received block 
> blk_6989304691537873255 of size 34 from /127.0.0.1
> 2008-01-19 23:00:25,575 INFO  dfs.DataNode 
> (DataNode.java:lastDataNodeRun(1819)) - PacketResponder 0 for block 
> blk_6989304691537873255 terminating
> 2008-01-19 23:00:25,575 INFO  dfs.StateChange 
> (FSNamesystem.java:addStoredBlock(2467)) - BLOCK* NameSystem.addStoredBlock: 
> blockMap updated: 127.0.0.1:40680 is added to blk_6989304691537873255 size 34
> 2008-01-19 23:00:25,575 INFO  dfs.DataNode (DataNode.java:close(2013)) - 
> BlockReceiver for block blk_6989304691537873255 waiting for last write to 
> drain.
> 2008-01-19 23:01:31,577 WARN  fs.DFSClient (DFSClient.java:run(1764)) - 
> DFSOutputStream ResponseProcessor exception  for block 
> blk_6989304691537873255java.net.SocketTimeoutException: Read timed out
>       at java.net.SocketInputStream.socketRead0(Native Method)
>       at java.net.SocketInputStream.read(SocketInputStream.java:129)
>       at java.io.DataInputStream.readFully(DataInputStream.java:176)
>       at java.io.DataInputStream.readLong(DataInputStream.java:380)
>       at 
> org.apache.hadoop.dfs.DFSClient$DFSOutputStream$ResponseProcessor.run(DFSClient.java:1726)
> 2008-01-19 23:01:31,578 INFO  fs.DFSClient (DFSClient.java:run(1653)) - 
> Closing old block blk_6989304691537873255
> 2008-01-19 23:01:31,579 WARN  fs.DFSClient 
> (DFSClient.java:processDatanodeError(1803)) - Error Recovery for block 
> blk_6989304691537873255 bad datanode[0] 127.0.0.1:40678
> 2008-01-19 23:01:31,580 WARN  fs.DFSClient 
> (DFSClient.java:processDatanodeError(1836)) - Error Recovery for block 
> blk_6989304691537873255 bad datanode 127.0.0.1:40678
> 2008-01-19 23:01:31,580 INFO  fs.DFSClient 
> (DFSClient.java:createBlockOutputStream(1982)) - pipeline = 127.0.0.1:40680
> 2008-01-19 23:01:31,580 INFO  fs.DFSClient 
> (DFSClient.java:createBlockOutputStream(1985)) - Connecting to 127.0.0.1:40680
> 2008-01-19 23:01:31,582 INFO  dfs.DataNode (DataNode.java:writeBlock(1084)) - 
> Receiving block blk_6989304691537873255 from /127.0.0.1
> 2008-01-19 23:01:31,584 INFO  dfs.DataNode (DataNode.java:writeBlock(1196)) - 
> writeBlock blk_6989304691537873255 received exception java.io.IOException: 
> Reopen Block blk_6989304691537873255 is valid, and cannot be written to.
> 2008-01-19 23:01:31,584 ERROR dfs.DataNode (DataNode.java:run(997)) - 
> 127.0.0.1:40680:DataXceiver: java.io.IOException: Reopen Block 
> blk_6989304691537873255 is valid, and cannot be written to.
>       at org.apache.hadoop.dfs.FSDataset.writeToBlock(FSDataset.java:613)
>       at 
> org.apache.hadoop.dfs.DataNode$BlockReceiver.<init>(DataNode.java:1996)
>       at 
> org.apache.hadoop.dfs.DataNode$DataXceiver.writeBlock(DataNode.java:1109)
>       at org.apache.hadoop.dfs.DataNode$DataXceiver.run(DataNode.java:982)
>       at java.lang.Thread.run(Thread.java:595)
> 2008-01-19 23:01:31,585 INFO  fs.DFSClient 
> (DFSClient.java:createBlockOutputStream(2024)) - Exception in 
> createBlockOutputStream java.io.EOFException
> The log shows that blk_6989304691537873255 was successfully written to two 
> datanodes. But dfsclient timed out waiting for a response from the first 
> datanode. It tried to recover from the failure by resending the data to the 
> second datanode. However, the recovery failed because the second datanode 
> threw an IOException when it detected that it already had the block. It would 
> be nice that the second datanode does not throw an exception for a finalized 
> block during a recovery.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.

[jira] Commented: (HADOOP-2691) Some junit tests fail with the exception: All datanodes are bad. Aborting...

Reply via email to