[
https://issues.apache.org/jira/browse/HBASE-11833?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14113563#comment-14113563
]
steven xu commented on HBASE-11833:
-----------------------------------
Guys, before creating this issue I read [HBASE-9393] and [HDFS-5671]. I found
that the patch code from those two issues has already been added to the Hadoop
2.4.0 tag, in BlockReaderFactory.getRemoteBlockReaderFromTcp(), so the
[HBASE-9393] patch does not solve my problem. Some other bug is probably causing
it, which is why I created a new issue. Please take a look.
{code:title=BlockReaderFactory.java|borderStyle=solid}
// From BlockReaderFactory in Hadoop 2.4.0; note the cleanup in the finally block.
private BlockReader getRemoteBlockReaderFromTcp() throws IOException {
  if (LOG.isTraceEnabled()) {
    LOG.trace(this + ": trying to create a remote block reader from a " +
        "TCP socket");
  }
  BlockReader blockReader = null;
  while (true) {
    BlockReaderPeer curPeer = null;
    Peer peer = null;
    try {
      curPeer = nextTcpPeer();
      if (curPeer == null) break;
      if (curPeer.fromCache) remainingCacheTries--;
      peer = curPeer.peer;
      blockReader = getRemoteBlockReader(peer);
      return blockReader;
    } catch (IOException ioe) {
      if (isSecurityException(ioe)) {
        if (LOG.isTraceEnabled()) {
          LOG.trace(this + ": got security exception while constructing " +
              "a remote block reader from " + peer, ioe);
        }
        throw ioe;
      }
      if ((curPeer != null) && curPeer.fromCache) {
        // Handle an I/O error we got when using a cached peer. These are
        // considered less serious, because the underlying socket may be
        // stale.
        if (LOG.isDebugEnabled()) {
          LOG.debug("Closed potentially stale remote peer " + peer, ioe);
        }
      } else {
        // Handle an I/O error we got when using a newly created peer.
        LOG.warn("I/O error constructing remote block reader.", ioe);
        throw ioe;
      }
    } finally {
      // If no reader was created, close the peer here; otherwise the reader
      // that was returned is responsible for it.
      if (blockReader == null) {
        IOUtils.cleanup(LOG, peer);
      }
    }
  }
  return null;
}
{code}
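Note that the cleanup in the {{finally}} block above only covers the case where no reader was constructed: once {{getRemoteBlockReader(peer)}} returns, the reader presumably takes ownership of the peer and is expected to close it later. The sketch below uses made-up {{Peer}}/{{Reader}} classes (not the real HDFS ones) purely to illustrate that ownership pattern, and where a socket can still be left open if the returned reader is never closed.
{code:title=PeerOwnershipSketch.java|borderStyle=solid}
import java.io.Closeable;
import java.io.IOException;

// Hypothetical stand-ins, only to illustrate the ownership pattern.
class Peer implements Closeable {
  @Override
  public void close() throws IOException {
    // In HDFS this would close the underlying TCP socket.
  }
}

class Reader implements Closeable {
  private final Peer peer;
  Reader(Peer peer) { this.peer = peer; }
  @Override
  public void close() throws IOException {
    // Once construction succeeds the reader owns the peer,
    // so closing the reader must also close the peer.
    peer.close();
  }
}

public class PeerOwnershipSketch {
  static Reader newReader() throws IOException {
    Peer peer = new Peer();
    Reader reader = null;
    try {
      reader = new Reader(peer);   // may throw
      return reader;
    } finally {
      if (reader == null) {
        peer.close();              // clean up only if ownership was never transferred
      }
    }
  }

  public static void main(String[] args) throws IOException {
    // If a caller forgets this close(), the socket behind the peer stays open
    // and can sit in CLOSE_WAIT once the remote side closes its end.
    try (Reader r = newReader()) {
      // ... read block data ...
    }
  }
}
{code}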
> HBase does not close already-closed sockets, resulting in thousands of
> CLOSE_WAIT sockets
> ----------------------------------------------------------------------------------
>
> Key: HBASE-11833
> URL: https://issues.apache.org/jira/browse/HBASE-11833
> Project: HBase
> Issue Type: Bug
> Components: regionserver
> Affects Versions: 0.98.0
> Environment: RHEL 6.3, HDP 2.1, 6 RegionServers/DataNodes, 18 TB per
> node, 3108 regions
> Reporter: steven xu
>
> HBase does not close dead connections to the DataNode.
> This results in over 30K CLOSE_WAIT sockets, and at some point HBase can no
> longer connect to the DataNode because too many sockets are mapped from one
> host to another on the same port, 50010.
> Even after I restart all the RegionServers, the CLOSE_WAIT count keeps growing.
> $ netstat -an|grep CLOSE_WAIT|wc -l
> 2545
> # netstat -nap|grep CLOSE_WAIT|grep 6569|wc -l
> 2545
> # ps -ef|grep 6569
> hbase 6569 6556 21 Aug25 ? 09:52:33 /opt/jdk1.6.0_25/bin/java
> -Dproc_regionserver -XX:OnOutOfMemoryError=kill -9 %p -Xmx1000m
> -XX:+UseConcMarkSweepGC
> I have also reviewed these issues:
> [HBASE-9393|https://issues.apache.org/jira/browse/HBASE-9393]
> [HDFS-5671|https://issues.apache.org/jira/browse/HDFS-5671]
> [HDFS-1836|https://issues.apache.org/jira/browse/HDFS-1836]
> I found that the HBase 0.98/Hadoop 2.4.0 source code I am using already
> contains these patches, but I do not understand why HBase 0.98/Hadoop 2.4.0
> still has this issue.
> Please check. Thanks a lot.
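The manual netstat checks quoted above can also be scripted to watch the CLOSE_WAIT count over time. The class below is only an illustrative sketch (its name is made up): it shells out to netstat, as in the commands in the report, and counts CLOSE_WAIT connections involving the DataNode port 50010.
{code:title=CloseWaitCounter.java|borderStyle=solid}
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStreamReader;

public class CloseWaitCounter {
  public static void main(String[] args) throws IOException, InterruptedException {
    // Mirrors the manual check: netstat -an | grep CLOSE_WAIT | grep 50010 | wc -l
    Process p = new ProcessBuilder("netstat", "-an").start();
    long count = 0;
    try (BufferedReader in =
        new BufferedReader(new InputStreamReader(p.getInputStream()))) {
      String line;
      while ((line = in.readLine()) != null) {
        // Count half-closed connections that involve the DataNode port.
        if (line.contains("CLOSE_WAIT") && line.contains(":50010")) {
          count++;
        }
      }
    }
    p.waitFor();
    System.out.println("CLOSE_WAIT sockets on :50010 = " + count);
  }
}
{code}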
--
This message was sent by Atlassian JIRA
(v6.2#6252)