steven xu created HDFS-6973:
-------------------------------
Summary: DFSClient does not close a dead socket, resulting in
thousands of CLOSE_WAIT sockets
Key: HDFS-6973
URL: https://issues.apache.org/jira/browse/HDFS-6973
Project: Hadoop HDFS
Issue Type: Bug
Components: hdfs-client
Affects Versions: 2.4.0
Environment: RHEL 6.3 - HDP 2.1 - 6 RegionServers/DataNodes - 18T per node
- 3108 Regions
Reporter: steven xu
HBase, as an HDFS client, does not close a dead connection with the DataNode.
This results in over 30K+ CLOSE_WAIT sockets, and at some point HBase can no
longer connect to the DataNode because there are too many mapped sockets from one
host to another on the same port, 50010.
Even after I restart all RegionServers, the CLOSE_WAIT count keeps increasing.
$ netstat -an|grep CLOSE_WAIT|wc -l
2545
$ netstat -nap|grep CLOSE_WAIT|grep 6569|wc -l
2545
$ ps -ef|grep 6569
hbase 6569 6556 21 Aug25 ? 09:52:33 /opt/jdk1.6.0_25/bin/java
-Dproc_regionserver -XX:OnOutOfMemoryError=kill -9 %p -Xmx1000m
-XX:+UseConcMarkSweepGC
I have also reviewed these issues:
[HDFS-5697]
[HDFS-5671]
[HDFS-1836]
[HBASE-9393]
I found that in the HBase 0.98 / Hadoop 2.4.0 source code, the patches for these
issues have already been applied. But I do not understand why HBase 0.98 /
Hadoop 2.4.0 still has this issue. Please check. Thanks a lot.
The code below has been added to
BlockReaderFactory.getRemoteBlockReaderFromTcp(). Another bug may be the cause of
my problem:
{code:title=BlockReaderFactory.java|borderStyle=solid}
  // Some comments here
  private BlockReader getRemoteBlockReaderFromTcp() throws IOException {
    if (LOG.isTraceEnabled()) {
      LOG.trace(this + ": trying to create a remote block reader from a " +
          "TCP socket");
    }
    BlockReader blockReader = null;
    while (true) {
      BlockReaderPeer curPeer = null;
      Peer peer = null;
      try {
        curPeer = nextTcpPeer();
        if (curPeer == null) break;
        if (curPeer.fromCache) remainingCacheTries--;
        peer = curPeer.peer;
        blockReader = getRemoteBlockReader(peer);
        return blockReader;
      } catch (IOException ioe) {
        if (isSecurityException(ioe)) {
          if (LOG.isTraceEnabled()) {
            LOG.trace(this + ": got security exception while constructing " +
                "a remote block reader from " + peer, ioe);
          }
          throw ioe;
        }
        if ((curPeer != null) && curPeer.fromCache) {
          // Handle an I/O error we got when using a cached peer.  These are
          // considered less serious, because the underlying socket may be
          // stale.
          if (LOG.isDebugEnabled()) {
            LOG.debug("Closed potentially stale remote peer " + peer, ioe);
          }
        } else {
          // Handle an I/O error we got when using a newly created peer.
          LOG.warn("I/O error constructing remote block reader.", ioe);
          throw ioe;
        }
      } finally {
        if (blockReader == null) {
          IOUtils.cleanup(LOG, peer);
        }
      }
    }
    return null;
  }
{code}
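To illustrate the ownership pattern I suspect: in the method above, the peer is only cleaned up in the finally block when no BlockReader was created, so on the success path the reader takes ownership of the peer's socket. If the reader is later discarded without close(), the socket is never shut down locally and lingers in CLOSE_WAIT after the DataNode closes its end. A minimal sketch with hypothetical stand-in classes (Peer, BlockReader, acquire are my own simplified names, not the real Hadoop types):

```java
import java.io.Closeable;

// Hypothetical stand-in for the real Peer: wraps a socket.
class Peer implements Closeable {
    boolean closed = false;
    @Override public void close() { closed = true; } // would close the socket
}

// Hypothetical stand-in for the real BlockReader: owns the peer once created.
class BlockReader implements Closeable {
    final Peer peer;
    BlockReader(Peer peer) { this.peer = peer; }
    @Override public void close() { peer.close(); } // owner must call this
}

public class PeerLeakSketch {
    // Mirrors the cleanup pattern in getRemoteBlockReaderFromTcp():
    // the peer is closed here only on the failure path.
    static BlockReader acquire(Peer peer) {
        BlockReader reader = null;
        try {
            reader = new BlockReader(peer); // success: ownership moves to reader
            return reader;
        } finally {
            if (reader == null) {
                peer.close(); // failure path: close the peer ourselves
            }
        }
    }

    public static void main(String[] args) {
        Peer peer = new Peer();
        BlockReader reader = acquire(peer);
        System.out.println("closed after acquire: " + peer.closed);   // false
        reader.close(); // skipping this leaves the socket in CLOSE_WAIT
        System.out.println("closed after reader.close(): " + peer.closed); // true
    }
}
```

So if any caller drops the returned reader without closing it, the leak would show up exactly as the growing CLOSE_WAIT count above, even though this method itself looks correct.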
--
This message was sent by Atlassian JIRA
(v6.2#6252)