[ https://issues.apache.org/jira/browse/HDFS-4504?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13745691#comment-13745691 ]
Colin Patrick McCabe commented on HDFS-4504:
--------------------------------------------

Vinay wrote:

bq. While handling the Zombie stream, ZombieStreamManager can report to NameNode via some new RPC as this stream is zombie.

A DFSOutputStream is a zombie for one of two reasons:
1. The client can't contact the NameNode (perhaps because of a network problem).
2. The client asked the NameNode to complete the file and it refused, because the NN does not (yet?) have a record that all of the file's blocks are present and complete.

In scenario #1, we can't tell the NameNode anything, because we can't talk to it. In scenario #2, the NameNode already knows everything it needs to know about the file. It doesn't care whether we consider the file a zombie or not-- why would it? All it knows is that the file isn't complete yet.

The big picture for this change is that we're trying to prevent a scenario where the DFSOutputStream is never closeable and leaks resources forever. In order to do that, we sometimes have to make some unpleasant choices. One of them is that if there was a data streamer failure, we complete the file anyway after a configurable time period (currently 2 minutes). If you don't like this policy, you can just set the period so long that it corresponds to the lease recovery period (see the configuration sketch at the end of this message).

As I said before, the current code doesn't do anything special in the case of a data streamer failure in DFSOutputStream#close. It just throws up its hands and says "oh well, guess that data's gone!" After the hard-lease period expires, we will complete the file anyway. So the behavior with this patch is exactly the same as without it-- only the timeout is different.

It sounds like what you want to do is somehow "try harder" to fix the data streamer failure when you know the file is being closed. This might be a good idea, but we should do it in a future JIRA. This patch is big enough, and changes enough things, already.

> DFSOutputStream#close doesn't always release resources (such as leases)
> ------------------------------------------------------------------------
>
>                 Key: HDFS-4504
>                 URL: https://issues.apache.org/jira/browse/HDFS-4504
>             Project: Hadoop HDFS
>          Issue Type: Bug
>            Reporter: Colin Patrick McCabe
>            Assignee: Colin Patrick McCabe
>         Attachments: HDFS-4504.001.patch, HDFS-4504.002.patch, HDFS-4504.007.patch, HDFS-4504.008.patch, HDFS-4504.009.patch, HDFS-4504.010.patch, HDFS-4504.011.patch, HDFS-4504.014.patch, HDFS-4504.015.patch
>
>
> {{DFSOutputStream#close}} can throw an {{IOException}} in some cases. One example is if there is a pipeline error and then pipeline recovery fails. Unfortunately, in this case, some of the resources used by the {{DFSOutputStream}} are leaked. One particularly important resource is file leases.
>
> So it's possible for a long-lived HDFS client, such as Flume, to write many blocks to a file but then fail to close it. Unfortunately, the {{LeaseRenewerThread}} inside the client will continue to renew the lease for the "undead" file. Future attempts to close the file will just rethrow the previous exception, and no progress can be made by the client.
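To make the timeout policy above concrete, here is a minimal sketch of how a client could stretch the complete-anyway period out to the hard-lease limit. Note that {{dfs.client.close.timeout.ms}} is a placeholder key name for illustration only; the real key is whatever the patch defines.

{code:java}
import org.apache.hadoop.conf.Configuration;

public class CloseTimeoutSketch {
  public static void main(String[] args) {
    Configuration conf = new Configuration();
    // Placeholder key name -- the actual key is defined by the patch.
    // Raising the complete-anyway period to the hard-lease limit (1 hour)
    // effectively restores the old "wait for lease recovery" behavior.
    conf.setLong("dfs.client.close.timeout.ms", 60L * 60L * 1000L);
    System.out.println("close timeout (ms): "
        + conf.getLong("dfs.client.close.timeout.ms", 0L));
  }
}
{code}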
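And to make the description's failure mode concrete: the leak happens because {{close()}} can throw before the lease bookkeeping runs, and a later {{close()}} just rethrows the remembered exception. Here is a minimal sketch of that pattern and of the gist of the fix, using illustrative names rather than the real {{DFSOutputStream}} internals:

{code:java}
import java.io.Closeable;
import java.io.IOException;

class OutputStreamSketch implements Closeable {
  private IOException lastException; // remembered across failed close() attempts

  @Override
  public void close() throws IOException {
    if (lastException != null) {
      // The "undead file" behavior from the description: every later
      // close() just rethrows the earlier failure, so the client can
      // never make progress on this file.
      throw lastException;
    }
    try {
      flushInternal();  // may throw if pipeline recovery failed
      completeFile();   // ask the NameNode to complete the file
    } catch (IOException e) {
      lastException = e;
      throw e;
    } finally {
      // The gist of the fix: stop renewing the lease whether or not the
      // flush/complete succeeded, so the stream can't leak forever.
      endFileLease();
    }
  }

  private void flushInternal() throws IOException { /* flush queued packets */ }
  private void completeFile() throws IOException { /* complete-file RPC to the NN */ }
  private void endFileLease() { /* remove this file from the lease renewer */ }
}
{code}

The {{finally}} block is the point: whether or not the flush and complete succeed, the client stops renewing the lease, so an "undead" file can't pin resources forever.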