[ https://issues.apache.org/jira/browse/HDFS-4504?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13734151#comment-13734151 ]

Colin Patrick McCabe commented on HDFS-4504:
--------------------------------------------

This patch creates a background thread for handling uncloseable files.  Streams 
get placed into the {{ZombieStreamManager}} when close() fails to contact the 
NameNode.  It uses an {{ExecutorService}}, so the OS thread will be properly 
disposed of when it's not in use.
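The thread-disposal behavior described above can be had from a plain JDK executor whose core thread is allowed to time out. The sketch below is illustrative only, assuming a simplified API (the class name and {{submitRetry}} method are stand-ins, not the actual patch's {{ZombieStreamManager}} interface):

```java
import java.util.concurrent.Callable;
import java.util.concurrent.Future;
import java.util.concurrent.LinkedBlockingQueue;
import java.util.concurrent.ThreadPoolExecutor;
import java.util.concurrent.TimeUnit;

/**
 * Hypothetical sketch of a "zombie stream" manager: streams whose close()
 * failed are handed to a background executor for retry.  Because the single
 * core thread may time out, the OS thread is reclaimed once the queue drains,
 * i.e. it is "properly disposed of when it's not in use".
 */
class ZombieStreamManager {
    private final ThreadPoolExecutor executor;

    ZombieStreamManager() {
        // One worker, 60s keep-alive; allowCoreThreadTimeOut(true) lets
        // even the core thread die when there is no pending work.
        executor = new ThreadPoolExecutor(1, 1, 60, TimeUnit.SECONDS,
                new LinkedBlockingQueue<Runnable>());
        executor.allowCoreThreadTimeOut(true);
    }

    /** Hand off a close attempt; the Callable stands in for a real stream. */
    Future<Boolean> submitRetry(Callable<Boolean> closeAttempt) {
        return executor.submit(closeAttempt);
    }

    void shutdown() {
        executor.shutdown();
    }
}
```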

The client can figure out when the file is closed on the NameNode by polling 
{{DFSOutputStream#close}}.  When the lease recovery succeeds, 
{{DFSOutputStream#close}} will stop throwing an {{IOException}}.  At that 
point, the client can re-open that file if it wishes.  This is a lot better 
than the current situation, where the client doesn't know when, or if, the file 
will ever be safe to re-open.
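The polling pattern amounts to calling close() until it stops throwing. A self-contained sketch, with a stand-in {{FlakyStream}} in place of a real {{DFSOutputStream}} (the retry count and sleep interval are hypothetical parameters, not anything from the patch):

```java
import java.io.Closeable;
import java.io.IOException;

/** Poll close() until it stops throwing, per the pattern described above. */
class ClosePoller {
    static boolean pollClose(Closeable stream, int maxRetries, long sleepMs)
            throws InterruptedException {
        for (int i = 0; i < maxRetries; i++) {
            try {
                stream.close();        // DFSOutputStream#close in the real client
                return true;           // stopped throwing: safe to re-open the file
            } catch (IOException e) {
                Thread.sleep(sleepMs); // lease recovery still pending; poll again
            }
        }
        return false;
    }
}

/** Stand-in stream whose close() fails the first two times, then succeeds. */
class FlakyStream implements Closeable {
    private int failuresLeft = 2;

    @Override
    public void close() throws IOException {
        if (failuresLeft-- > 0) {
            throw new IOException("NameNode unreachable");
        }
    }
}
```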

{{TestHdfsClose}} tests a few different cases: all of the DataNodes going down, 
all of the NameNodes going down, and the client calling 
{{DistributedFileSystem#abort}}.  In every case, we should be able to keep 
going after an error and not run into uncloseable files.
                
> DFSOutputStream#close doesn't always release resources (such as leases)
> -----------------------------------------------------------------------
>
>                 Key: HDFS-4504
>                 URL: https://issues.apache.org/jira/browse/HDFS-4504
>             Project: Hadoop HDFS
>          Issue Type: Bug
>            Reporter: Colin Patrick McCabe
>            Assignee: Colin Patrick McCabe
>         Attachments: HDFS-4504.001.patch, HDFS-4504.002.patch, 
> HDFS-4504.007.patch
>
>
> {{DFSOutputStream#close}} can throw an {{IOException}} in some cases.  One 
> example is if there is a pipeline error and then pipeline recovery fails.  
> Unfortunately, in this case, some of the resources used by the 
> {{DFSOutputStream}} are leaked.  One particularly important resource is file 
> leases.
> So it's possible for a long-lived HDFS client, such as Flume, to write many 
> blocks to a file, but then fail to close it.  Unfortunately, the 
> {{LeaseRenewerThread}} inside the client will continue to renew the lease for 
> the "undead" file.  Future attempts to close the file will just rethrow the 
> previous exception, and no progress can be made by the client.
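The "undead file" behavior above boils down to close() caching its first failure and rethrowing it forever. A minimal illustrative stand-in (not the real {{DFSOutputStream}}; the class and field names are hypothetical):

```java
import java.io.IOException;

/**
 * Stand-in illustrating the bug: after a pipeline failure, close()
 * rethrows the saved exception on every call, so resources such as the
 * file lease are never released and no progress can be made.
 */
class UndeadStream {
    private IOException lastException;

    void failPipeline(IOException cause) {
        lastException = cause;     // pipeline recovery failed
    }

    void close() throws IOException {
        if (lastException != null) {
            throw lastException;   // rethrown forever; lease stays held
        }
    }
}
```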

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira