steven xu created HDFS-6973:
-------------------------------

             Summary: DFSClient does not close a closed socket, resulting in thousands of CLOSE_WAIT sockets
                 Key: HDFS-6973
                 URL: https://issues.apache.org/jira/browse/HDFS-6973
             Project: Hadoop HDFS
          Issue Type: Bug
          Components: hdfs-client
    Affects Versions: 2.4.0
         Environment: RHEL 6.3
-HDP 2.1
-6 RegionServers/DataNodes
-18T per node
-3108 Regions
            Reporter: steven xu
HBase, as an HDFS client, does not close dead connections to the DataNode. This results in over 30K+ CLOSE_WAIT sockets, and at some point HBase can no longer connect to the DataNode because there are too many sockets mapped from one host to another on the same port, 50010. Even after I restart all RegionServers, the CLOSE_WAIT count keeps increasing:

{noformat}
$ netstat -an | grep CLOSE_WAIT | wc -l
2545
$ netstat -nap | grep CLOSE_WAIT | grep 6569 | wc -l
2545
$ ps -ef | grep 6569
hbase 6569 6556 21 Aug25 ? 09:52:33 /opt/jdk1.6.0_25/bin/java -Dproc_regionserver -XX:OnOutOfMemoryError=kill -9 %p -Xmx1000m -XX:+UseConcMarkSweepGC
{noformat}

I have also reviewed these issues: [HDFS-5697], [HDFS-5671], [HDFS-1836], [HBASE-9393]. I found that the patches from these issues are already present in the HBase 0.98 / Hadoop 2.4.0 source code, but I do not understand why HBase 0.98 / Hadoop 2.4.0 still has this issue. Please check. Thanks a lot.

The patched code is in BlockReaderFactory.getRemoteBlockReaderFromTcp(). Another bug may be the cause of my problem:

{code:title=BlockReaderFactory.java|borderStyle=solid}
  // Some comments here
  private BlockReader getRemoteBlockReaderFromTcp() throws IOException {
    if (LOG.isTraceEnabled()) {
      LOG.trace(this + ": trying to create a remote block reader from a " +
          "TCP socket");
    }
    BlockReader blockReader = null;
    while (true) {
      BlockReaderPeer curPeer = null;
      Peer peer = null;
      try {
        curPeer = nextTcpPeer();
        if (curPeer == null) break;
        if (curPeer.fromCache) remainingCacheTries--;
        peer = curPeer.peer;
        blockReader = getRemoteBlockReader(peer);
        return blockReader;
      } catch (IOException ioe) {
        if (isSecurityException(ioe)) {
          if (LOG.isTraceEnabled()) {
            LOG.trace(this + ": got security exception while constructing " +
                "a remote block reader from " + peer, ioe);
          }
          throw ioe;
        }
        if ((curPeer != null) && curPeer.fromCache) {
          // Handle an I/O error we got when using a cached peer.  These are
          // considered less serious, because the underlying socket may be
          // stale.
          if (LOG.isDebugEnabled()) {
            LOG.debug("Closed potentially stale remote peer " + peer, ioe);
          }
        } else {
          // Handle an I/O error we got when using a newly created peer.
          LOG.warn("I/O error constructing remote block reader.", ioe);
          throw ioe;
        }
      } finally {
        if (blockReader == null) {
          IOUtils.cleanup(LOG, peer);
        }
      }
    }
    return null;
  }
{code}

--
This message was sent by Atlassian JIRA
(v6.2#6252)
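For illustration, the cleanup pattern the quoted method is expected to follow can be sketched in isolation. This is a minimal, hypothetical stand-alone sketch, not the actual HDFS classes: {{FakePeer}} stands in for {{org.apache.hadoop.hdfs.net.Peer}}, and {{readWithCleanup}} mirrors only the retry-and-close-in-finally shape of {{getRemoteBlockReaderFromTcp()}}. The point is that a peer which fails before a reader is constructed must be closed in the {{finally}} block, otherwise its socket lingers in CLOSE_WAIT.

```java
import java.io.Closeable;
import java.io.IOException;
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in for a Peer: it only tracks whether close() was called.
class FakePeer implements Closeable {
    boolean closed = false;
    final boolean failsOnRead;
    FakePeer(boolean failsOnRead) { this.failsOnRead = failsOnRead; }
    @Override public void close() { closed = true; }
}

public class PeerCleanupSketch {
    // Mirrors the shape of getRemoteBlockReaderFromTcp(): try each peer in
    // turn, return on success, and close the peer in finally whenever no
    // reader was handed to the caller (the path HDFS-6973 is concerned with).
    static FakePeer readWithCleanup(List<FakePeer> peers) throws IOException {
        for (FakePeer peer : peers) {
            boolean success = false;
            try {
                if (peer.failsOnRead) {
                    throw new IOException("simulated stale cached peer");
                }
                success = true;
                return peer;  // the caller now owns the peer
            } catch (IOException ioe) {
                // Treat the failure like a stale cached peer: retry the next one.
            } finally {
                if (!success) {
                    peer.close();  // without this, the socket leaks into CLOSE_WAIT
                }
            }
        }
        return null;
    }

    public static void main(String[] args) throws IOException {
        List<FakePeer> peers = new ArrayList<>();
        peers.add(new FakePeer(true));   // stale cached peer: will fail
        peers.add(new FakePeer(false));  // fresh peer: will succeed
        FakePeer result = readWithCleanup(peers);
        System.out.println("failed peer closed: " + peers.get(0).closed);
        System.out.println("returned peer open: " + (result != null && !result.closed));
    }
}
```

If the quoted Hadoop code ever returns or loops without hitting that {{finally}} cleanup for a failed peer, each failed read would leak one socket, which matches the steadily growing CLOSE_WAIT count reported above.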