[ https://issues.apache.org/jira/browse/HADOOP-3681?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Lohit Vijayarenu updated HADOOP-3681:
-------------------------------------

    Attachment: HADOOP-3681-1.patch

Thanks, Koji. I was also able to reproduce this by throwing an exception after
locateFollowingBlock. It looks like this is what happened:
- DFSClient timed out getting a new block from the namenode while the namenode
was busy. However, the namenode did allocate a block on behalf of the client.
- The timeout raised an exception out of locateFollowingBlock, which eventually
closed the streamer.
- closeInternal had already gone past its isClosed() check and kept trying to
complete the file.
- The namenode still had a connection to the client, so it did not expire the
lease.
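A minimal sketch of the sequence above, under stated assumptions: `CloseLoopBugSketch`, `NameNodeStub`, `OutputStreamModel` and `completeFileBuggy` are hypothetical names, not DFSClient's real API. The stub's `complete()` always returns false, modeling a namenode that (presumably) cannot finish the file because the block it allocated for the client never received data. The only point shown is that a completion loop which never looks at the stream state keeps retrying after the streamer has died; `maxAttempts` exists solely so the demo terminates.

```java
import java.io.IOException;

/**
 * Hypothetical, simplified model of the failure sequence; names only mirror
 * the roles of the real DFSClient pieces.
 */
public class CloseLoopBugSketch {

    /** Stand-in for the namenode's complete() RPC. */
    interface NameNodeStub {
        boolean complete(String src);
    }

    /** Stand-in for the output stream whose DataStreamer has died. */
    static class OutputStreamModel {
        volatile boolean closed = false;
        volatile IOException lastException = null;

        /** Called when the streamer fails, e.g. after a timed-out block allocation. */
        void streamerFailed(IOException e) {
            lastException = e;
            closed = true;
        }
    }

    /**
     * Buggy completion loop: it never consults `out`, so a namenode that keeps
     * answering "not complete" keeps it spinning. Returns the attempt number on
     * success, or -1 once the demo-only bound is hit.
     */
    static int completeFileBuggy(OutputStreamModel out, NameNodeStub namenode,
                                 String src, int maxAttempts) {
        for (int attempt = 1; attempt <= maxAttempts; attempt++) {
            if (namenode.complete(src)) {
                return attempt;
            }
            System.out.println("Could not complete file " + src + " retrying...");
            try {
                Thread.sleep(1); // back off briefly; the duration here is arbitrary
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                return -1;
            }
        }
        return -1;
    }

    public static void main(String[] args) {
        OutputStreamModel out = new OutputStreamModel();
        // The streamer died after the namenode had already allocated a block.
        out.streamerFailed(new IOException("timed out waiting for rpc response"));
        // Assumption: the allocated block never got data, so complete() never succeeds.
        NameNodeStub namenode = src -> false;
        int result = completeFileBuggy(out, namenode, "/part-00408", 5);
        System.out.println("result = " + result);
    }
}
```

Running `main` prints the retry message five times and then `result = -1`; without the demo bound, the loop would keep going exactly like the log in the report.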

The suggested fix is to also call isClosed() inside the loop that tries to
complete the file. I tested this manually: it throws the exception stored in
lastException and terminates the client.
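The suggested fix can be sketched the same way. Again, the class and method names are hypothetical stand-ins; `isClosed()` here only mirrors the role described above (throw the exception stored in `lastException` once the streamer has closed) and is not the exact DFSClient implementation.

```java
import java.io.IOException;

/**
 * Hypothetical sketch of the fix: re-check the stream state on every retry so
 * a dead streamer surfaces its stored exception instead of looping forever.
 */
public class CloseLoopFixSketch {

    /** Stand-in for the namenode's complete() RPC. */
    interface NameNodeStub {
        boolean complete(String src);
    }

    static class OutputStreamModel {
        volatile boolean closed = false;
        volatile IOException lastException = null;

        void streamerFailed(IOException e) {
            lastException = e;
            closed = true;
        }

        /** Mirrors the role of isClosed(): rethrow the stored exception if closed. */
        void isClosed() throws IOException {
            if (closed && lastException != null) {
                throw lastException;
            }
        }
    }

    /** Retry complete() until it succeeds, but bail out once the stream has died. */
    static void completeFile(OutputStreamModel out, NameNodeStub namenode, String src)
            throws IOException {
        while (!namenode.complete(src)) {
            out.isClosed(); // the fix: check stream state on every retry
            System.out.println("Could not complete file " + src + " retrying...");
            try {
                Thread.sleep(1); // back off briefly; the duration here is arbitrary
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                throw new IOException("interrupted while completing " + src, e);
            }
        }
    }

    public static void main(String[] args) {
        OutputStreamModel out = new OutputStreamModel();
        out.streamerFailed(new IOException("timed out waiting for rpc response"));
        try {
            completeFile(out, src -> false, "/part-00408");
        } catch (IOException e) {
            System.out.println("client terminated: " + e.getMessage());
        }
    }
}
```

With the dead streamer in `main`, the first failed `complete()` call is followed by `isClosed()`, which rethrows the stored exception, so `main` prints only `client terminated: timed out waiting for rpc response`. A healthy stream is unaffected because the check is a no-op while the streamer is open.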

> Infinite loop in dfs close
> --------------------------
>
>                 Key: HADOOP-3681
>                 URL: https://issues.apache.org/jira/browse/HADOOP-3681
>             Project: Hadoop Core
>          Issue Type: Bug
>          Components: dfs
>    Affects Versions: 0.17.0
>            Reporter: Koji Noguchi
>         Attachments: H-3681-jstack.txt, HADOOP-3681-1.patch
>
>
> We had a dfsClient -put hang, outputting:
> {noformat}
> 2008-06-28 10:05:12,595 WARN org.apache.hadoop.dfs.DFSClient: DataStreamer Exception: java.net.SocketTimeoutException: timed out waiting for rpc response
> 2008-06-28 10:05:12,595 WARN org.apache.hadoop.dfs.DFSClient: Error Recovery for block null bad datanode[0]
> 2008-06-28 10:05:51,067 INFO org.apache.hadoop.dfs.DFSClient: Could not complete file /_temporary/_task_200806262325_4136_r_000408_0/part-00408 retrying...
> 2008-06-28 10:05:52,898 INFO org.apache.hadoop.dfs.DFSClient: Could not complete file /_temporary/_task_200806262325_4136_r_000408_0/part-00408 retrying...
> 2008-06-28 10:05:54,893 INFO org.apache.hadoop.dfs.DFSClient: Could not complete file /_temporary/_task_200806262325_4136_r_000408_0/part-00408 retrying...
> 2008-06-28 10:05:56,920 INFO org.apache.hadoop.dfs.DFSClient: Could not complete file /_temporary/_task_200806262325_4136_r_000408_0/part-00408 retrying...
> 2008-06-28 10:05:57,765 INFO org.apache.hadoop.dfs.DFSClient: Could not complete file /_temporary/_task_200806262325_4136_r_000408_0/part-00408 retrying...
> 2008-06-28 10:05:58,199 INFO org.apache.hadoop.dfs.DFSClient: Could not complete file /_temporary/_task_200806262325_4136_r_000408_0/part-00408 retrying...
> [repeats forever]
> {noformat}

-- 
This message is automatically generated by JIRA.