[ https://issues.apache.org/jira/browse/HADOOP-2757?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12704010#action_12704010 ]
dhruba borthakur commented on HADOOP-2757:
------------------------------------------

You are referring to dfs.datanode.socket.write.timeout. These are configurable parameters, and I have already set them to an appropriate value, e.g. 20 seconds, because I want real-time-ish behaviour. If all the datanodes in the pipeline die, the client detects an error and aborts; that is the intended behaviour. If a datanode is not actually dead but merely hangs, the client will hang too. This patch does not fix that problem.

The main motivation for this patch is to detect namenode failures early. If a client is writing to a block, it might take a while for the block to fill up; that time depends on the rate at which the client is writing data. If the client is only trickling data into the block, it will not hit the dfs.datanode.socket.write.timeout for a while. In the existing code in trunk, the lease recovery thread will eventually detect the NN problem, but it does nothing to terminate the threads that were writing to the block. The patch does this.

> Should DFS outputstream's close wait forever?
> ---------------------------------------------
>
>                 Key: HADOOP-2757
>                 URL: https://issues.apache.org/jira/browse/HADOOP-2757
>             Project: Hadoop Core
>          Issue Type: Improvement
>          Components: dfs
>            Reporter: Raghu Angadi
>            Assignee: dhruba borthakur
>         Attachments: softMount1.patch, softMount1.patch, softMount2.patch
>
>
> Currently {{DFSOutputStream.close()}} waits forever if the Namenode keeps throwing {{NotYetReplicated}} exceptions, for whatever reason. It's pretty annoying for a user. Should the loop inside close have a timeout? If so, how much? It could probably be something like 10 minutes.

--
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
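
To make the 20-second write timeout mentioned above concrete, here is a minimal sketch of how a client could set dfs.datanode.socket.write.timeout through the Configuration API, assuming the value is interpreted in milliseconds; the file path and the exact value are illustrative only, not part of the patch.

{code:java}
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class ShortWriteTimeoutExample {
  public static void main(String[] args) throws Exception {
    Configuration conf = new Configuration();
    // Assumed to be in milliseconds; 20000 gives the 20-second,
    // real-time-ish behaviour described in the comment above.
    conf.setInt("dfs.datanode.socket.write.timeout", 20000);

    // Writes made through this FileSystem instance pick up the shorter timeout.
    FileSystem fs = FileSystem.get(conf);
    fs.create(new Path("/tmp/example")).close();
  }
}
{code}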
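
On the question in the description of whether close() should wait forever: below is a hedged sketch of a bounded completeFile retry loop. The tryCompleteFile() helper and the 400 ms back-off are hypothetical stand-ins, not the actual DFSOutputStream code.

{code:java}
import java.io.IOException;

/** Sketch of a bounded close() wait; not the actual DFSOutputStream code. */
public class BoundedCloseSketch {

  // Hypothetical stand-in for the namenode completeFile() RPC, which returns
  // false while the last block is not yet replicated.
  private boolean tryCompleteFile() {
    return true;
  }

  public void close(long closeTimeoutMillis) throws IOException {
    long deadline = System.currentTimeMillis() + closeTimeoutMillis;
    boolean fileComplete = false;
    while (!fileComplete) {
      fileComplete = tryCompleteFile();
      if (!fileComplete) {
        if (System.currentTimeMillis() > deadline) {
          // Instead of looping forever, surface the problem to the caller.
          throw new IOException("Could not complete file within "
              + closeTimeoutMillis + " ms; namenode still reports the last "
              + "block as not yet replicated");
        }
        try {
          Thread.sleep(400); // brief pause between completeFile retries
        } catch (InterruptedException ie) {
          Thread.currentThread().interrupt();
          throw new IOException("Interrupted while waiting to close file");
        }
      }
    }
  }
}
{code}

Calling close(10 * 60 * 1000) would correspond to the 10-minute suggestion in the description.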