[ https://issues.apache.org/jira/browse/HDFS-4504?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13640985#comment-13640985 ]
Colin Patrick McCabe commented on HDFS-4504:
--------------------------------------------
bq. Does this fully solve the problem, given that leases are per-client, not
per-file? ie so long as the long-lived client has any other open files for
write, it will keep calling renewLease() and the file will be stuck open and
un-recovered forever.
Thanks for pointing that out. I think this is a real problem with my current
patch and is likely to lead to the kind of scenario we've seen in the field in
the past, where a long-lived HDFS client program like Flume gets some files in
limbo after transient network problems during a close operation.
The only way around this that I can think of is to keep a list of files that
have been closed by the client but not yet completed on the NameNode. The lease
renewer thread can then call complete on each of them before renewing the lease
with the NameNode.
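
To make the idea concrete, here is a rough sketch of what the renewer loop
could look like. This is not code from either attached patch. NameNodeStub,
LeaseRenewerSketch, addUncompleted and renewOnce are hypothetical names, and
the stub's complete() and renewLease() are simplified stand-ins for the real
NameNode RPCs, which take more arguments.

```java
import java.io.IOException;
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;

/** Hypothetical, simplified stand-in for the NameNode RPCs used here. */
interface NameNodeStub {
    /** Returns true once the file has been completed on the NameNode. */
    boolean complete(String path) throws IOException;

    /** Renews the single per-client lease. */
    void renewLease(String clientName) throws IOException;
}

/** Hypothetical sketch: retry complete() on pending files before renewing. */
class LeaseRenewerSketch {
    private final NameNodeStub namenode;
    private final String clientName;

    // Files the application has closed, but whose complete() call has not
    // yet succeeded on the NameNode.
    private final List<String> closedButUncompleted = new ArrayList<>();

    LeaseRenewerSketch(NameNodeStub namenode, String clientName) {
        this.namenode = namenode;
        this.clientName = clientName;
    }

    /** Called when close() fails before the file could be completed. */
    synchronized void addUncompleted(String path) {
        closedButUncompleted.add(path);
    }

    synchronized int pendingCount() {
        return closedButUncompleted.size();
    }

    /** One iteration of the renewer loop. */
    synchronized void renewOnce() throws IOException {
        Iterator<String> it = closedButUncompleted.iterator();
        while (it.hasNext()) {
            String path = it.next();
            try {
                if (namenode.complete(path)) {
                    it.remove(); // completed, so stop tracking it
                }
            } catch (IOException e) {
                // Leave the file on the list and retry on the next cycle.
            }
        }
        // Only after the retries do we renew the per-client lease.
        namenode.renewLease(clientName);
    }
}
```

The sketch retries forever. A real change would presumably need some bound on
retries, or a way to hand the file off to lease recovery, so that a file that
can never be completed does not stay on the list indefinitely.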
> DFSOutputStream#close doesn't always release resources (such as leases)
> -----------------------------------------------------------------------
>
> Key: HDFS-4504
> URL: https://issues.apache.org/jira/browse/HDFS-4504
> Project: Hadoop HDFS
> Issue Type: Bug
> Reporter: Colin Patrick McCabe
> Assignee: Colin Patrick McCabe
> Attachments: HDFS-4504.001.patch, HDFS-4504.002.patch
>
>
> {{DFSOutputStream#close}} can throw an {{IOException}} in some cases. One
> example is if there is a pipeline error and then pipeline recovery fails.
> Unfortunately, in this case, some of the resources used by the
> {{DFSOutputStream}} are leaked. One particularly important resource is file
> leases.
> So it's possible for a long-lived HDFS client, such as Flume, to write many
> blocks to a file, but then fail to close it. Unfortunately, the
> {{LeaseRenewerThread}} inside the client will continue to renew the lease for
> the "undead" file. Future attempts to close the file will just rethrow the
> previous exception, and no progress can be made by the client.