Hi all,

In simple terms: why does an output stream that failed to close while the datanodes were unavailable fail again when I try to close it later, after the datanodes are back? Could someone kindly help me tackle this situation?
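To make the question concrete, the pattern I am attempting looks roughly like this (a simplified sketch, not the actual LogWriterToHDFSV2 code; the path, retry count, and sleep interval are made up):

import java.io.IOException;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FSDataOutputStream;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class RetryCloseSketch {
    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        FileSystem fs = FileSystem.get(conf);

        FSDataOutputStream out = fs.create(new Path("/logs/part-000"));
        out.write("some dummy line\n".getBytes("UTF-8"));
        out.flush(); // write()/flush() only buffer; the data really reaches the datanodes on close()

        int attempts = 0;
        while (true) {
            try {
                out.close(); // this is where the failure surfaces while the datanodes are down
                break;
            } catch (IOException e) {
                if (++attempts > 3) {
                    throw e; // still fails even after the datanodes come back
                }
                Thread.sleep(5000); // wait, then retry close() on the same stream
            }
        }
    }
}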
Thanks,
Pallavi

-----Original Message-----
From: Palleti, Pallavi [mailto:[email protected]]
Sent: Tuesday, July 21, 2009 10:21 PM
To: [email protected]
Subject: Issue with HDFS Client when datanode is temporarily unavailable

Hi all,

We are facing issues with an external application when it tries to write data into HDFS using FSDataOutputStream. We are using hadoop-0.18.2. The code works perfectly fine as long as the datanodes are doing well. If the datanodes are unavailable for some reason (no space left on device, etc., which is temporary and caused by map-reduce jobs running on the machine), the code fails.

I tried to fix the issue by catching the error and waiting for some time before retrying. While doing this, I learned that the actual writes do not happen when we call out.write() (nor with out.write() followed by out.flush()); they happen only when we call out.close(). If the datanodes are unavailable at that point, the DFSClient internally retries several times before actually throwing an exception. Below is the sequence of exceptions I am seeing:

09/07/21 19:33:25 INFO dfs.DFSClient: Exception in createBlockOutputStream java.net.ConnectException: Connection refused
09/07/21 19:33:25 INFO dfs.DFSClient: Abandoning block blk_2612177980121914843_134112
09/07/21 19:33:31 INFO dfs.DFSClient: Exception in createBlockOutputStream java.net.ConnectException: Connection refused
09/07/21 19:33:31 INFO dfs.DFSClient: Abandoning block blk_-3499389777806382640_134112
09/07/21 19:33:37 INFO dfs.DFSClient: Exception in createBlockOutputStream java.net.ConnectException: Connection refused
09/07/21 19:33:37 INFO dfs.DFSClient: Abandoning block blk_1835125657840860999_134112
09/07/21 19:33:43 INFO dfs.DFSClient: Exception in createBlockOutputStream java.net.ConnectException: Connection refused
09/07/21 19:33:43 INFO dfs.DFSClient: Abandoning block blk_-3979824251735502509_134112

[4 attempts made by the DFSClient before throwing an exception; the datanode was unavailable throughout]

09/07/21 19:33:49 WARN dfs.DFSClient: DataStreamer Exception: java.io.IOException: Unable to create new block.
        at org.apache.hadoop.dfs.DFSClient$DFSOutputStream.nextBlockOutputStream(DFSClient.java:2357)
        at org.apache.hadoop.dfs.DFSClient$DFSOutputStream.access$1800(DFSClient.java:1743)
        at org.apache.hadoop.dfs.DFSClient$DFSOutputStream$DataStreamer.run(DFSClient.java:1920)
09/07/21 19:33:49 WARN dfs.DFSClient: Error Recovery for block blk_-3979824251735502509_134112 bad datanode[0]
09/07/21 19:33:49 ERROR logwriter.LogWriterToHDFSV2: Failed while creating file for data:some dummy line [21/Jul/2009:17:15:18 somethinghere] with other dummy info :to HDFS
java.io.IOException: Could not get block locations. Aborting...
        at org.apache.hadoop.dfs.DFSClient$DFSOutputStream.processDatanodeError(DFSClient.java:2151)
        at org.apache.hadoop.dfs.DFSClient$DFSOutputStream.access$1400(DFSClient.java:1743)
        at org.apache.hadoop.dfs.DFSClient$DFSOutputStream$DataStreamer.run(DFSClient.java:1897)
09/07/21 19:33:49 INFO logwriter.LogWriterToHDFSV2: Retrying again...number of Attempts =0
[retry done by me manually, while the datanode was available]
09/07/21 19:33:54 ERROR logwriter.LogWriterToHDFSV2: Failed while creating file for data:some dummy line [21/Jul/2009:17:15:18 somethinghere] with other dummy info :to HDFS
java.io.IOException: Could not get block locations. Aborting...
        at org.apache.hadoop.dfs.DFSClient$DFSOutputStream.processDatanodeError(DFSClient.java:2151)
        at org.apache.hadoop.dfs.DFSClient$DFSOutputStream.access$1400(DFSClient.java:1743)
        at org.apache.hadoop.dfs.DFSClient$DFSOutputStream$DataStreamer.run(DFSClient.java:1897)
09/07/21 19:33:54 INFO logwriter.LogWriterToHDFSV2: Retrying again...number of Attempts =1
[retry done by me manually, while the datanode was available]
09/07/21 19:33:59 ERROR logwriter.LogWriterToHDFSV2: Failed while creating file for data:some dummy line [21/Jul/2009:17:15:18 somethinghere] with other dummy info :to HDFS
java.io.IOException: Could not get block locations. Aborting...
        at org.apache.hadoop.dfs.DFSClient$DFSOutputStream.processDatanodeError(DFSClient.java:2151)
        at org.apache.hadoop.dfs.DFSClient$DFSOutputStream.access$1400(DFSClient.java:1743)
        at org.apache.hadoop.dfs.DFSClient$DFSOutputStream$DataStreamer.run(DFSClient.java:1897)
09/07/21 19:33:59 INFO logwriter.LogWriterToHDFSV2: Retrying again...number of Attempts =2
[retry done by me manually, while the datanode was available]
09/07/21 19:34:04 ERROR logwriter.LogWriterToHDFSV2: Unexpected error while writing to HDFS, exiting ...

So, since the writes actually happen during close(), if close() fails because the datanodes are unavailable, then the next time I try to close the same stream it throws an exception even though the datanodes are available again. Why does closing the stream fail again once the datanodes are back? Any idea how to handle this scenario?

The only approach I can think of is to remember the position in the input file from which we started writing the new HDFS file, and on failure seek back to that position, re-read the data, and write it to HDFS again. Could someone please tell me whether there is a better way of handling these errors?

Thanks,
Pallavi
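PS: a rough, untested sketch of the fallback I described above, in case it makes the idea clearer (the method and variable names are made up, and the buffer size, retry count, and sleep interval are arbitrary):

import java.io.IOException;
import java.io.RandomAccessFile;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FSDataOutputStream;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class RecoverWriteSketch {
    // Rewrite the HDFS file from a remembered offset in the local source file,
    // abandoning the broken stream and creating a fresh one on each attempt.
    public static void recoverWrite(String srcFile, Path hdfsPath, long startOffset)
            throws IOException, InterruptedException {
        Configuration conf = new Configuration();
        FileSystem fs = FileSystem.get(conf);

        int attempts = 0;
        while (true) {
            RandomAccessFile src = new RandomAccessFile(srcFile, "r");
            try {
                src.seek(startOffset);           // re-read from the remembered position
                FSDataOutputStream out = fs.create(hdfsPath, true); // fresh stream, overwrite the partial file
                byte[] buf = new byte[64 * 1024];
                int n;
                while ((n = src.read(buf)) > 0) {
                    out.write(buf, 0, n);
                }
                out.close();                     // the real write to the datanodes happens here
                return;                          // success
            } catch (IOException e) {
                if (++attempts > 3) {
                    throw e;                     // give up after a few attempts
                }
                Thread.sleep(5000);              // wait for the datanodes to come back, then retry
            } finally {
                src.close();
            }
        }
    }
}

The idea is simply to give up on the failed stream entirely and rewrite everything from the remembered offset into a brand-new stream, instead of trying to close the old one again.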
