[
https://issues.apache.org/jira/browse/HADOOP-3681?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Lohit Vijayarenu updated HADOOP-3681:
-------------------------------------
Attachment: HADOOP-3681-1.patch
Thanks, Koji. I was also able to reproduce this by throwing an exception after
locateFollowingBlock. It looks like this is what happened:
- DFSClient timed out getting a new block from the namenode while the namenode
was busy. In this case, though, the namenode did allocate a block on behalf of
the client.
- The timeout raised an exception, and locateFollowingBlock returned the
exception, which eventually closed the streamer.
- closeInternal then went past isClosed() and kept trying to complete the file.
- The namenode still had a connection to the client, so it did not expire the
lease.
The suggested fix is to call isClosed() while trying to complete the file. I
tested this manually: it throws the exception stored in lastException and
terminates the client.
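
The idea behind the fix can be sketched roughly as follows. This is a simplified model, not the contents of the attached patch: NameNodeStub, OutputStreamSketch, streamerFailed and completeFile are hypothetical names used only for illustration, and only isClosed() and lastException correspond to names mentioned above. The retry loop re-checks isClosed() on every pass, so a stream whose streamer has already failed rethrows the stored exception instead of retrying forever.

```java
import java.io.IOException;

/** Hypothetical stand-in for the namenode's complete() call. */
interface NameNodeStub {
    boolean complete(String src) throws IOException;
}

/** Simplified model of the client output stream; not the real DFSClient code. */
class OutputStreamSketch {
    private final NameNodeStub namenode;
    private final String src;
    private volatile boolean closed = false;
    private volatile IOException lastException = null;

    OutputStreamSketch(NameNodeStub namenode, String src) {
        this.namenode = namenode;
        this.src = src;
    }

    /** Models the streamer dying: remember the error and mark the stream closed. */
    void streamerFailed(IOException e) {
        lastException = e;
        closed = true;
    }

    /** Throws the stored exception (or a generic one) if the stream is closed. */
    private void isClosed() throws IOException {
        if (closed) {
            throw (lastException != null) ? lastException : new IOException("Stream closed");
        }
    }

    /** Retries complete(), re-checking isClosed() on every pass; returns the attempt count. */
    int completeFile() throws IOException, InterruptedException {
        int attempts = 0;
        while (true) {
            isClosed(); // the suggested check: stop retrying once the streamer has failed
            attempts++;
            if (namenode.complete(src)) {
                return attempts;
            }
            Thread.sleep(10); // short pause chosen for the sketch only
        }
    }
}
```

With a stub namenode that never reports the file as complete, calling streamerFailed(...) before completeFile() makes the loop exit with the stored exception on its first pass. Without the isClosed() call inside the loop, the same setup would spin indefinitely, which matches the repeated "Could not complete file" messages in the report.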
> Infinite loop in dfs close
> --------------------------
>
> Key: HADOOP-3681
> URL: https://issues.apache.org/jira/browse/HADOOP-3681
> Project: Hadoop Core
> Issue Type: Bug
> Components: dfs
> Affects Versions: 0.17.0
> Reporter: Koji Noguchi
> Attachments: H-3681-jstack.txt, HADOOP-3681-1.patch
>
>
> We had dfsClient -put hang outputting
> {noformat}
> 2008-06-28 10:05:12,595 WARN org.apache.hadoop.dfs.DFSClient: DataStreamer
> Exception: java.net.SocketTimeoutException:
> timed out waiting for rpc response
> 2008-06-28 10:05:12,595 WARN org.apache.hadoop.dfs.DFSClient: Error Recovery
> for block null bad datanode[0]
> 2008-06-28 10:05:51,067 INFO org.apache.hadoop.dfs.DFSClient: Could not
> complete file
> /_temporary/_task_200806262325_4136_r_000408_0/part-00408
> retrying...
> 2008-06-28 10:05:52,898 INFO org.apache.hadoop.dfs.DFSClient: Could not
> complete file
> /_temporary/_task_200806262325_4136_r_000408_0/part-00408
> retrying...
> 2008-06-28 10:05:54,893 INFO org.apache.hadoop.dfs.DFSClient: Could not
> complete file
> /_temporary/_task_200806262325_4136_r_000408_0/part-00408
> retrying...
> 2008-06-28 10:05:56,920 INFO org.apache.hadoop.dfs.DFSClient: Could not
> complete file
> /_temporary/_task_200806262325_4136_r_000408_0/part-00408
> retrying...
> 2008-06-28 10:05:57,765 INFO org.apache.hadoop.dfs.DFSClient: Could not
> complete file
> /_temporary/_task_200806262325_4136_r_000408_0/part-00408
> retrying...
> 2008-06-28 10:05:58,199 INFO org.apache.hadoop.dfs.DFSClient: Could not
> complete file
> /_temporary/_task_200806262325_4136_r_000408_0/part-00408
> retrying...
> [repeats forever]
> {noformat}