[ https://issues.apache.org/jira/browse/HDFS-4504?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13734151#comment-13734151 ]
Colin Patrick McCabe commented on HDFS-4504:
--------------------------------------------

This patch creates a background thread for handling uncloseable files. Streams get placed into the {{ZombieStreamManager}} when close() fails to contact the NameNode. It uses an {{ExecutorService}}, so the OS thread will be properly disposed of when it's not in use.

The client can figure out when the file is closed on the NameNode by polling {{DFSOutputStream#close}}. When the lease recovery succeeds, {{DFSOutputStream#close}} will stop throwing an {{IOException}}. At that point, the client can re-open that file if it wishes. This is a lot better than the current situation, where the client doesn't know when, or if, the file will ever be safe to re-open.

{{TestHdfsClose}} tests a few different cases: all of the DataNodes going down, all of the NameNodes going down, and the client calling {{DistributedFileSystem#abort}}. In every case, we should be able to keep going after an error and not run into uncloseable files.

> DFSOutputStream#close doesn't always release resources (such as leases)
> -----------------------------------------------------------------------
>
>                 Key: HDFS-4504
>                 URL: https://issues.apache.org/jira/browse/HDFS-4504
>             Project: Hadoop HDFS
>          Issue Type: Bug
>            Reporter: Colin Patrick McCabe
>            Assignee: Colin Patrick McCabe
>         Attachments: HDFS-4504.001.patch, HDFS-4504.002.patch, HDFS-4504.007.patch
>
>
> {{DFSOutputStream#close}} can throw an {{IOException}} in some cases. One
> example is if there is a pipeline error and then pipeline recovery fails.
> Unfortunately, in this case, some of the resources used by the
> {{DFSOutputStream}} are leaked. One particularly important resource is file
> leases.
> So it's possible for a long-lived HDFS client, such as Flume, to write many
> blocks to a file, but then fail to close it. Unfortunately, the
> {{LeaseRenewerThread}} inside the client will continue to renew the lease for
> the "undead" file.
> Future attempts to close the file will just rethrow the previous exception, and no progress can be made by the client.
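
As a rough illustration of the client-side polling described in the comment above (retry close() until it stops throwing an {{IOException}}), here is a minimal Java sketch. It is not code from the patch: {{ClosePoller}}, {{closeWithRetry}} and {{FlakyStream}} are hypothetical names, the sketch works against the plain {{java.io.Closeable}} interface rather than {{DFSOutputStream}}, and the attempt count and sleep interval are arbitrary assumptions.

```java
import java.io.Closeable;
import java.io.IOException;

/** Hypothetical helper, not part of HDFS: polls close() until it stops throwing. */
final class ClosePoller {
    private ClosePoller() {}

    /**
     * Calls stream.close() up to maxAttempts times, sleeping between failures.
     * Returns true once close() returns normally; returns false if every
     * attempt threw an IOException or the thread was interrupted while waiting.
     */
    static boolean closeWithRetry(Closeable stream, int maxAttempts, long sleepMillis) {
        for (int attempt = 1; attempt <= maxAttempts; attempt++) {
            try {
                stream.close();
                return true; // close() no longer throws, so the caller may re-open the file
            } catch (IOException e) {
                if (attempt == maxAttempts) {
                    return false; // out of attempts; the caller decides what to do next
                }
            }
            try {
                Thread.sleep(sleepMillis); // back off before polling again
            } catch (InterruptedException ie) {
                Thread.currentThread().interrupt(); // preserve the interrupt status
                return false;
            }
        }
        return false; // only reached when maxAttempts <= 0
    }

    /** Stand-in for a stream whose close() fails a fixed number of times. */
    static final class FlakyStream implements Closeable {
        private int failuresLeft;

        FlakyStream(int failures) {
            this.failuresLeft = failures;
        }

        @Override
        public void close() throws IOException {
            if (failuresLeft > 0) {
                failuresLeft--;
                throw new IOException("simulated: could not contact NameNode");
            }
        }
    }

    public static void main(String[] args) {
        // Fails twice, then succeeds on the third of five allowed attempts.
        System.out.println(closeWithRetry(new FlakyStream(2), 5, 10L));
        // Fails more often than the three allowed attempts.
        System.out.println(closeWithRetry(new FlakyStream(9), 3, 10L));
    }
}
```

A long-lived client would presumably pick a much longer interval and an overall deadline; the short values here only keep the demo quick.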