[ https://issues.apache.org/jira/browse/HADOOP-4346?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12637162#action_12637162 ]

Raghu Angadi commented on HADOOP-4346:
--------------------------------------

The following shows relevant info from jmap for a datanode that had a lot of 
fds open.

- {noformat}
# jmap without full-GC. Includes stale objects.
# num of fds for the process : 5358

# java internal selectors
 117:          1780          42720  sun.nio.ch.Util$SelectorWrapper
 118:          1762          42288  sun.nio.ch.Util$SelectorWrapper$Closer

# Hadoop selectors
  93:          3026         121040  org.apache.hadoop.net.SocketIOWithTimeout$SelectorPool$SelectorInfo
 844:             1             40  org.apache.hadoop.net.SocketIOWithTimeout$SelectorPool$ProviderInfo

# Datanode threads
  99:          2229         106992  org.apache.hadoop.dfs.DataNode$DataXceiver
{noformat}

- {noformat}
# jmap -histo:live immediately after the previous. This does a full-GC before counting.
# num of fds : 5187

  64:          1759          42216  sun.nio.ch.Util$SelectorWrapper
  65:          1759          42216  sun.nio.ch.Util$SelectorWrapper$Closer

 465:             4            160  org.apache.hadoop.net.SocketIOWithTimeout$SelectorPool$SelectorInfo
 772:             1             40  org.apache.hadoop.net.SocketIOWithTimeout$SelectorPool$ProviderInfo

 422:             4            192  org.apache.hadoop.dfs.DataNode$DataXceiver
{noformat}

This shows that there is no fd leak in Hadoop's selector cache: the DN has 4 
threads doing I/O and there are 4 selectors. But a lot of Java internal 
selectors are still open.
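
For context, the reason the Hadoop side stays at 4 is the pooled-selector 
pattern: a thread borrows a selector only while it is actually blocked and 
returns it afterwards, so open selectors track *currently* blocked threads, 
not threads ever created. A minimal sketch of that pattern (illustrative 
only, with made-up names like {{take}}/{{give}}; the real code lives in 
{{SocketIOWithTimeout}}):

{noformat}
import java.io.IOException;
import java.nio.channels.SelectableChannel;
import java.nio.channels.SelectionKey;
import java.nio.channels.Selector;
import java.util.ArrayDeque;
import java.util.Deque;

// Sketch of a selector pool: one selector is borrowed per blocking wait
// and returned when the wait ends, so fd usage stays proportional to the
// number of threads blocked right now.
class SelectorPool {
  private final Deque<Selector> free = new ArrayDeque<Selector>();

  synchronized Selector take() throws IOException {
    Selector s = free.poll();
    return (s != null) ? s : Selector.open();
  }

  synchronized void give(Selector s) {
    free.push(s);
  }

  // Wait until 'channel' is ready for 'ops' (e.g. OP_READ) or the timeout
  // expires. The channel must be in non-blocking mode to be registered.
  int waitFor(SelectableChannel channel, int ops, long timeoutMs)
      throws IOException {
    Selector selector = take();
    SelectionKey key = null;
    try {
      key = channel.register(selector, ops);
      return selector.select(timeoutMs);
    } finally {
      if (key != null) {
        key.cancel();
        selector.selectNow();  // flush the cancelled key before reuse
      }
      give(selector);
    }
  }
}
{noformat}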

- {noformat}
# 'jmap -histo:live' about 1 minute after the previous full-GC
# num of fds : 57

# There are no SelectorWrapper objects. All of these must have been closed.

 768:             1             40  org.apache.hadoop.net.SocketIOWithTimeout$SelectorPool$SelectorInfo

 730:             1             48  org.apache.hadoop.dfs.DataNode$DataXceiver
{noformat}

I will try to reproduce this myself and try out a patch for connect(); 
sketches for both are below.
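
To reproduce outside the DN, something like the following should make the 
SelectorWrapper counts grow under jmap. This is only an illustration: the 
host and port are placeholders, and the key point is one timed blocking 
connect() per short-lived thread, since that is the path the description 
says pulls in Sun's per-thread temporary selector.

{noformat}
import java.net.InetSocketAddress;
import java.nio.channels.SocketChannel;

public class FdLeakRepro {
  public static void main(String[] args) throws Exception {
    // Placeholder target; any address works, since the per-thread
    // selector is created whether or not the connect succeeds.
    final InetSocketAddress target = new InetSocketAddress("localhost", 50010);

    for (int i = 0; i < 2000; i++) {
      Thread t = new Thread(new Runnable() {
        public void run() {
          SocketChannel ch = null;
          try {
            ch = SocketChannel.open();
            // Timed connect through the adaptor socket: Sun's
            // implementation waits on a per-thread selector (3 fds)
            // that is reclaimed only by GC after the thread dies.
            ch.socket().connect(target, 1000);
          } catch (Exception ignored) {
          } finally {
            try { if (ch != null) ch.close(); } catch (Exception ignored) {}
          }
        }
      });
      t.start();
      t.join();  // each thread is short-lived, leaving a stale selector
    }
    // Keep the process alive so jmap and /proc/<pid>/fd can be inspected.
    Thread.sleep(Long.MAX_VALUE);
  }
}
{noformat}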

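For the connect() patch, the direction would be roughly the following: do a 
non-blocking connect and wait for OP_CONNECT on one of our pooled selectors, 
instead of letting the blocking path create a per-thread selector. A rough 
sketch only, reusing the hypothetical pool above; the actual patch may look 
different:

{noformat}
import java.io.IOException;
import java.net.SocketAddress;
import java.net.SocketTimeoutException;
import java.nio.channels.SelectionKey;
import java.nio.channels.SocketChannel;

class PooledConnect {
  // Connect with a timeout without ever blocking inside java.*: the wait
  // happens on a selector borrowed from our own pool, so no per-thread
  // selector (and its 3 fds) is created for short-lived threads.
  static void connect(SocketChannel channel, SocketAddress remote,
                      long timeoutMs, SelectorPool pool) throws IOException {
    channel.configureBlocking(false);
    if (channel.connect(remote)) {
      return;  // connected immediately (e.g. loopback)
    }
    long deadline = System.currentTimeMillis() + timeoutMs;
    while (!channel.finishConnect()) {
      long left = deadline - System.currentTimeMillis();
      if (left <= 0) {
        throw new SocketTimeoutException("connect timed out to " + remote);
      }
      pool.waitFor(channel, SelectionKey.OP_CONNECT, left);
    }
  }
}
{noformat}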

> Hadoop triggers a "soft" fd leak. 
> ----------------------------------
>
>                 Key: HADOOP-4346
>                 URL: https://issues.apache.org/jira/browse/HADOOP-4346
>             Project: Hadoop Core
>          Issue Type: Bug
>          Components: io
>    Affects Versions: 0.17.0
>            Reporter: Raghu Angadi
>
> Starting with Hadoop 0.17, most of the network I/O uses non-blocking NIO 
> channels. Normal blocking reads and writes are handled by Hadoop and use our 
> own cache of selectors. This cache suits Hadoop well, since I/O often occurs 
> on many short-lived threads. The number of fds consumed is proportional to 
> the number of threads currently blocked.
> If blocking I/O is done using java.*, Sun's implementation uses internal 
> per-thread selectors. These selectors are closed using {{sun.misc.Cleaner}}. 
> This cleaning works much like finalizers and is tied to GC, which is pretty 
> ill-suited when many threads are short-lived. Until a GC happens, the number 
> of these selectors keeps growing. Each selector consumes 3 fds.
> Though blocking read and write are handled by Hadoop, {{connect()}} still 
> uses the default implementation with a per-thread selector.
> Koji helped a lot in tracking this. Some sections from 'jmap' output and 
> other info Koji collected led to this suspicion; I will include that in the 
> next comment.
> One solution might be to handle connect() also in Hadoop using our selectors.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.