Hi Arvind,

Looks to me like you've identified the JIRAs that are causing this. Hopefully they will be fixed soon.
-Todd

On Fri, Dec 4, 2009 at 4:43 AM, Arvind Sharma <[email protected]> wrote:
> Any suggestions would be welcome :-)
>
> Arvind
>
> ________________________________
> From: Arvind Sharma <[email protected]>
> To: [email protected]
> Sent: Wed, December 2, 2009 8:02:39 AM
> Subject: DFSClient write error when DN down
>
> I have seen similar error logs in the Hadoop JIRA (HADOOP-2691, HDFS-795),
> but I'm not sure this one is exactly the same scenario.
>
> Hadoop - 0.19.2
>
> The client-side DFSClient fails to write when a few of the DNs in a grid go
> down. I see this error:
>
> ***************************
>
> 2009-11-13 13:45:27,815 WARN DFSClient | DFSOutputStream ResponseProcessor exception for block blk_3028932254678171367_1462691java.io.IOException: Bad response 1 for block blk_3028932254678171367_1462691 from datanode 10.201.9.225:50010
>     at org.apache.hadoop.hdfs.DFSClient$DFSOutputStream$ResponseProcessor.run(DFSClient.java:2341)
> 2009-11-13 13:45:27,815 WARN DFSClient | Error Recovery for block blk_3028932254678171367_1462691 bad datanode[2] 10.201.9.225:50010
> 2009-11-13 13:45:27,815 WARN DFSClient | Error Recovery for block blk_3028932254678171367_1462691 in pipeline 10.201.9.218:50010, 10.201.9.220:50010, 10.201.9.225:50010: bad datanode 10.201.9.225:50010
> 2009-11-13 13:45:37,433 WARN DFSClient | DFSOutputStream ResponseProcessor exception for block blk_-6619123912237837733_1462799java.io.IOException: Bad response 1 for block blk_-6619123912237837733_1462799 from datanode 10.201.9.225:50010
>     at org.apache.hadoop.hdfs.DFSClient$DFSOutputStream$ResponseProcessor.run(DFSClient.java:2341)
> 2009-11-13 13:45:37,433 WARN DFSClient | Error Recovery for block blk_-6619123912237837733_1462799 bad datanode[1] 10.201.9.225:50010
> 2009-11-13 13:45:37,433 WARN DFSClient | Error Recovery for block blk_-6619123912237837733_1462799 in pipeline 10.201.9.218:50010, 10.201.9.225:50010: bad datanode 10.201.9.225:50010
>
> ***************************
>
> The only way I could get my client program to write successfully to the DFS
> again was to restart it.
>
> Any suggestions on how to get around this problem on the client side? As I
> understand it, the DFSClient APIs should take care of situations like this,
> and clients shouldn't need to worry about whether some of the DNs go down.
>
> Also, the replication factor is 3 in my setup, and there are 10 DNs (two of
> which went down).
>
> Thanks!
> Arvind
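For context on the quoted log: each "Error Recovery for block ... bad datanode[i]" record shows the client dropping the failed datanode from the block's write pipeline and continuing with the remaining nodes. For the first block, 10.201.9.225:50010 (index 2) is removed from a three-node pipeline. The Java sketch below models only that one step. The class and method names are hypothetical, and this is not the actual DFSClient implementation.

```java
import java.util.ArrayList;
import java.util.List;

/**
 * Hypothetical, simplified model of the pipeline-recovery step recorded by
 * "Error Recovery for block ... bad datanode[i]" log lines.
 * Not taken from DFSClient; names are invented for illustration.
 */
public class PipelineRecoverySketch {

    /** Returns a new pipeline with the datanode at badIndex removed. */
    static List<String> dropBadDatanode(List<String> pipeline, int badIndex) {
        if (badIndex < 0 || badIndex >= pipeline.size()) {
            throw new IllegalArgumentException("bad index " + badIndex);
        }
        // Copy so the caller's pipeline list is left unchanged.
        List<String> survivors = new ArrayList<>(pipeline);
        survivors.remove(badIndex); // remove(int): removes by position
        // In this model, the write can only continue if some datanode remains.
        if (survivors.isEmpty()) {
            throw new IllegalStateException("All datanodes in the pipeline are bad");
        }
        return survivors;
    }

    public static void main(String[] args) {
        List<String> pipeline = List.of(
                "10.201.9.218:50010", "10.201.9.220:50010", "10.201.9.225:50010");
        // Mirrors "bad datanode[2] 10.201.9.225:50010" from the first block's log.
        List<String> recovered = dropBadDatanode(pipeline, 2);
        System.out.println(recovered);
    }
}
```

Running it prints the two surviving nodes, 10.201.9.218:50010 and 10.201.9.220:50010. In this simplified model, recovery fails only when no datanodes are left. The behavior reported above, where writes keep failing until the client is restarted, is outside what the sketch captures.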
