[ https://issues.apache.org/jira/browse/MAPREDUCE-4323?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13291078#comment-13291078 ]

Daryn Sharp commented on MAPREDUCE-4323:
----------------------------------------

In particular, {{DFSClient}} maintains a socket cache. Sockets closed by the 
remote end are not detected until another connection is needed or the client 
is closed. That is a separate issue. However, because the NM does not close a 
user's filesystems after the app completes, sockets leak in the CLOSE_WAIT 
state and eventually exhaust the process's fds.
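To make the failure mode concrete, here is a minimal sketch of a connection cache that prunes dead entries only lazily. It is not {{DFSClient}}'s actual code. The classes {{LazySocketCacheDemo}}, {{Conn}} and {{LazySocketCache}} are hypothetical. {{Conn.peerClosed}} models the remote end having closed the connection, which leaves the local socket in CLOSE_WAIT until its fd is released.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.Iterator;

// Hypothetical model of a lazily-pruned socket cache (not DFSClient's real code).
public class LazySocketCacheDemo {

    // Stand-in for a cached socket.
    static final class Conn {
        boolean peerClosed;     // remote side has closed; locally this is CLOSE_WAIT
        boolean fdOpen = true;  // our file descriptor is still held
        void close() { fdOpen = false; }
    }

    static final class LazySocketCache {
        private final Deque<Conn> idle = new ArrayDeque<>();

        void put(Conn c) { idle.push(c); }

        // Dead entries are only noticed (and their fds released) here,
        // when another connection is requested...
        Conn get() {
            Iterator<Conn> it = idle.iterator();
            while (it.hasNext()) {
                Conn c = it.next();
                it.remove();
                if (!c.peerClosed) {
                    return c;   // reuse a live connection
                }
                c.close();      // release the fd of a dead connection
            }
            return new Conn();  // nothing reusable: open a new one
        }

        // ...or when the owning client is closed.
        void closeAll() {
            for (Conn c : idle) {
                c.close();
            }
            idle.clear();
        }

        // Number of idle cached connections whose fd is still open.
        long openFds() {
            return idle.stream().filter(c -> c.fdOpen).count();
        }
    }

    public static void main(String[] args) {
        LazySocketCache cache = new LazySocketCache();
        for (int i = 0; i < 3; i++) {
            Conn c = new Conn();
            cache.put(c);
            c.peerClosed = true; // the remote side hangs up
        }
        // Nobody calls get() or closeAll(), so all three fds stay open.
        System.out.println("leaked fds: " + cache.openFds());     // leaked fds: 3
        cache.closeAll();
        System.out.println("after closeAll: " + cache.openFds()); // after closeAll: 0
    }
}
```

If the owner of such a cache is never asked for another connection and never closed, as with an NM that keeps a finished app's filesystem around, the dead entries simply accumulate.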

Calling {{FileSystem.closeAllForUGI}}, as the JT does, is troublesome in that 
it may close the fs for other apps running as that user. One approach is to 
partition the fs cache so that each app maintains its own cache of 
filesystems. See HADOOP-8490 for possible approaches. Partitioning would allow 
the NM to close an app's filesystems, much as the JT does.
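Here is a minimal sketch of the partitioning idea. It assumes the cache key simply gains an application id alongside the scheme, authority and user. {{FakeFs}}, {{Key}}, {{PartitionedCache}}, {{closeAllForUser}} and {{closeAllForApp}} are hypothetical names for illustration. They are not Hadoop's {{FileSystem.Cache}} API, nor the design proposed in HADOOP-8490.

```java
import java.io.Closeable;
import java.net.URI;
import java.util.HashMap;
import java.util.Iterator;
import java.util.Map;
import java.util.function.Predicate;

// Hypothetical per-app partitioned filesystem cache (illustration only).
public class PartitionedFsCacheDemo {

    // Stand-in for a FileSystem instance.
    static final class FakeFs implements Closeable {
        boolean closed;
        @Override public void close() { closed = true; }
    }

    // Cache key: scheme + authority + user, plus the owning application id.
    record Key(String scheme, String authority, String user, String appId) {}

    static final class PartitionedCache {
        private final Map<Key, FakeFs> map = new HashMap<>();

        FakeFs get(URI uri, String user, String appId) {
            Key k = new Key(uri.getScheme(), uri.getAuthority(), user, appId);
            return map.computeIfAbsent(k, key -> new FakeFs());
        }

        // Analogue of closing everything for a user: hits every app of that user.
        void closeAllForUser(String user) {
            closeMatching(k -> k.user().equals(user));
        }

        // Per-app close: other apps running as the same user are untouched.
        void closeAllForApp(String user, String appId) {
            closeMatching(k -> k.user().equals(user) && k.appId().equals(appId));
        }

        private void closeMatching(Predicate<Key> p) {
            Iterator<Map.Entry<Key, FakeFs>> it = map.entrySet().iterator();
            while (it.hasNext()) {
                Map.Entry<Key, FakeFs> e = it.next();
                if (p.test(e.getKey())) {
                    e.getValue().close();
                    it.remove();
                }
            }
        }
    }

    public static void main(String[] args) {
        PartitionedCache cache = new PartitionedCache();
        URI nn = URI.create("hdfs://nn:8020/");
        FakeFs app1 = cache.get(nn, "alice", "app_1");
        FakeFs app2 = cache.get(nn, "alice", "app_2");
        cache.closeAllForApp("alice", "app_1"); // app_1 finished
        System.out.println(app1.closed + " " + app2.closed); // true false
    }
}
```

The trade-off is that two apps of the same user no longer share one instance per filesystem. In exchange, the NM can release everything an app opened as soon as that app completes.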

Also note that failing to close filesystems causes all future jobs to use the 
configuration of the first job. This will be very problematic, so it's 
imperative to ensure that each app gets its own cached instances.
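The configuration problem follows from the same caching behaviour. The sketch below assumes a cache key that covers scheme, authority and user but not the configuration. Under that assumption, the conf passed on the first lookup is baked into the cached instance, and every later lookup for the same key gets that instance back regardless of its own conf. {{StickyConfDemo}} and {{FakeFs}} are hypothetical, and {{dfs.replication}} is used only as an example key.

```java
import java.net.URI;
import java.util.HashMap;
import java.util.Map;

// Hypothetical illustration: a cache whose key ignores the configuration.
public class StickyConfDemo {

    // Stand-in for a filesystem whose behaviour is fixed at creation time.
    static final class FakeFs {
        final Map<String, String> conf;
        FakeFs(Map<String, String> conf) { this.conf = Map.copyOf(conf); }
    }

    // Assumed key: no configuration and no application id.
    record Key(String scheme, String authority, String user) {}

    static final Map<Key, FakeFs> CACHE = new HashMap<>();

    static FakeFs get(URI uri, String user, Map<String, String> conf) {
        Key k = new Key(uri.getScheme(), uri.getAuthority(), user);
        // conf is only consulted when the entry is first created.
        return CACHE.computeIfAbsent(k, key -> new FakeFs(conf));
    }

    public static void main(String[] args) {
        URI nn = URI.create("hdfs://nn:8020/");
        FakeFs first = get(nn, "alice", Map.of("dfs.replication", "3"));  // job 1
        FakeFs second = get(nn, "alice", Map.of("dfs.replication", "1")); // job 2
        System.out.println(second == first);                    // true
        System.out.println(second.conf.get("dfs.replication")); // 3
    }
}
```

Giving each app its own partition of the cache, and closing that partition when the app finishes, means a later job builds fresh instances from its own configuration.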
                
> NM leaks sockets
> ----------------
>
>                 Key: MAPREDUCE-4323
>                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-4323
>             Project: Hadoop Map/Reduce
>          Issue Type: Bug
>          Components: nodemanager
>    Affects Versions: 0.23.0, 0.24.0, 2.0.0-alpha
>            Reporter: Daryn Sharp
>            Priority: Critical
>
> The NM is exhausting its fds because it's not closing fs instances when the 
> app is finished.

