Any suggestions would be welcome :-)

Arvind
________________________________
From: Arvind Sharma <[email protected]>
To: [email protected]
Sent: Wed, December 2, 2009 8:02:39 AM
Subject: DFSClient write error when DN down

I have seen similar error logs in the Hadoop Jira (HADOOP-2691, HDFS-795), but I am not sure this is exactly the same scenario.

Hadoop version: 0.19.2

The client-side DFSClient fails to write when a few of the DNs in the grid go down. I see this error:

***************************
2009-11-13 13:45:27,815 WARN DFSClient | DFSOutputStream ResponseProcessor exception for block blk_3028932254678171367_1462691
java.io.IOException: Bad response 1 for block blk_3028932254678171367_1462691 from datanode 10.201.9.225:50010
    at org.apache.hadoop.hdfs.DFSClient$DFSOutputStream$ResponseProcessor.run(DFSClient.java:2341)
2009-11-13 13:45:27,815 WARN DFSClient | Error Recovery for block blk_3028932254678171367_1462691 bad datanode[2] 10.201.9.225:50010
2009-11-13 13:45:27,815 WARN DFSClient | Error Recovery for block blk_3028932254678171367_1462691 in pipeline 10.201.9.218:50010, 10.201.9.220:50010, 10.201.9.225:50010: bad datanode 10 ...201.9.225:50010
2009-11-13 13:45:37,433 WARN DFSClient | DFSOutputStream ResponseProcessor exception for block blk_-6619123912237837733_1462799
java.io.IOException: Bad response 1 for block blk_-6619123912237837733_1462799 from datanode 10.201.9.225:50010
    at org.apache.hadoop.hdfs.DFSClient$DFSOutputStream$ResponseProcessor.run(DFSClient.java:2341)
2009-11-13 13:45:37,433 WARN DFSClient | Error Recovery for block blk_-6619123912237837733_1462799 bad datanode[1] 10.201.9.225:50010
2009-11-13 13:45:37,433 WARN DFSClient | Error Recovery for block blk_-6619123912237837733_1462799 in pipeline 10.201.9.218:50010, 10.201.9.225:50010: bad datanode 10.201.9.225:50010
***************************

The only way I could get my client program to write successfully to the DFS again was to restart it.

Any suggestions on how to get around this problem on the client side?
As I understood it, the DFSClient APIs take care of situations like this, and clients don't need to worry if some of the DNs go down. Also, the replication factor is 3 in my setup and there are 10 DNs (of which two went down).

Thanks!
Arvind
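
For anyone reading the log excerpt above: the "Error Recovery ... bad datanode[N]" lines suggest the client identifies the failing node by its index in the write pipeline and carries on with the remaining nodes. The sketch below only illustrates that reading, using the two pipelines from the log. It is not the actual DFSClient code; the class name PipelineSketch and the method dropBadNode are hypothetical, and the assumption that recovery amounts to removing the node at index N is mine.

```java
import java.util.ArrayList;
import java.util.List;

// Illustration only: hypothetical names, not part of the Hadoop API.
class PipelineSketch {

    // Assumption: recovery continues with every node except the one at badIndex.
    // Returns a new list and leaves the input pipeline unchanged.
    public static List<String> dropBadNode(List<String> pipeline, int badIndex) {
        if (badIndex < 0 || badIndex >= pipeline.size()) {
            throw new IllegalArgumentException("bad datanode index out of range: " + badIndex);
        }
        List<String> remaining = new ArrayList<>(pipeline);
        remaining.remove(badIndex); // int argument, so this removes by position
        return remaining;
    }

    public static void main(String[] args) {
        // First failure in the log: three-node pipeline, bad datanode[2].
        List<String> first = List.of(
                "10.201.9.218:50010", "10.201.9.220:50010", "10.201.9.225:50010");
        System.out.println(dropBadNode(first, 2));

        // Second failure in the log: two-node pipeline, bad datanode[1].
        List<String> second = List.of("10.201.9.218:50010", "10.201.9.225:50010");
        System.out.println(dropBadNode(second, 1));
    }
}
```

Under this reading, the second block in the log would be left with a single datanode (10.201.9.218:50010) in its pipeline, even though the replication factor is 3.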
