[ 
https://issues.apache.org/jira/browse/HBASE-11833?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14113563#comment-14113563
 ] 

steven xu commented on HBASE-11833:
-----------------------------------

Guys, before creating this issue I read [HBASE-9393] and [HDFS-5671]. I found 
that the patches from those two issues are already included in the Hadoop 2.4.0 
tag, in BlockReaderFactory.getRemoteBlockReaderFromTcp(), so the [HBASE-9393] 
patch does not solve my problem. Another bug is probably causing it, which is 
why I created a new issue. Please check it as well. 
{code:title=BlockReaderFactory.java|borderStyle=solid}
// BlockReaderFactory.getRemoteBlockReaderFromTcp() as it appears in the Hadoop 2.4.0 tag
  private BlockReader getRemoteBlockReaderFromTcp() throws IOException {
    if (LOG.isTraceEnabled()) {
      LOG.trace(this + ": trying to create a remote block reader from a " +
          "TCP socket");
    }
    BlockReader blockReader = null;
    while (true) {
      BlockReaderPeer curPeer = null;
      Peer peer = null;
      try {
        curPeer = nextTcpPeer();
        if (curPeer == null) break;
        if (curPeer.fromCache) remainingCacheTries--;
        peer = curPeer.peer;
        blockReader = getRemoteBlockReader(peer);
        return blockReader;
      } catch (IOException ioe) {
        if (isSecurityException(ioe)) {
          if (LOG.isTraceEnabled()) {
            LOG.trace(this + ": got security exception while constructing " +
                "a remote block reader from " + peer, ioe);
          }
          throw ioe;
        }
        if ((curPeer != null) && curPeer.fromCache) {
          // Handle an I/O error we got when using a cached peer.  These are
          // considered less serious, because the underlying socket may be
          // stale.
          if (LOG.isDebugEnabled()) {
            LOG.debug("Closed potentially stale remote peer " + peer, ioe);
          }
        } else {
          // Handle an I/O error we got when using a newly created peer.
          LOG.warn("I/O error constructing remote block reader.", ioe);
          throw ioe;
        }
      } finally {
        if (blockReader == null) {
          IOUtils.cleanup(LOG, peer);
        }
      }
    }
    return null;
  }
{code}
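
For context on the symptom itself: CLOSE_WAIT is the state a TCP socket enters 
when the remote side has closed its end but the local process never calls 
close(). Below is a minimal standalone sketch (not HBase/HDFS code; the class 
name CloseWaitDemo is made up for illustration) that reproduces the state:
{code:title=CloseWaitDemo.java|borderStyle=solid}
// Standalone illustration only: the "server" socket stands in for the datanode,
// the "client" sockets for DFSClient peers. The server closes its end of every
// connection, the client never does, so every client socket is left in
// CLOSE_WAIT until the JVM exits.
import java.net.ServerSocket;
import java.net.Socket;
import java.util.ArrayList;
import java.util.List;

public class CloseWaitDemo {
  public static void main(String[] args) throws Exception {
    ServerSocket server = new ServerSocket(0);
    int port = server.getLocalPort();
    List<Socket> leaked = new ArrayList<Socket>();

    for (int i = 0; i < 100; i++) {
      Socket client = new Socket("127.0.0.1", port);
      Socket accepted = server.accept();
      accepted.close();   // remote end sends FIN -> client socket enters CLOSE_WAIT
      leaked.add(client); // never closed, so CLOSE_WAIT persists
    }

    System.out.println("Leaked " + leaked.size() + " sockets to port " + port);
    Thread.sleep(60000);  // keep the JVM alive so netstat can observe the sockets
  }
}
{code}
While the demo sleeps, netstat -an | grep CLOSE_WAIT shows one entry per leaked 
socket. The same pattern against datanode port 50010 is what we would expect if 
DFSClient peers created in the code path above are never closed.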

> HBase does not close dead sockets, resulting in thousands of CLOSE_WAIT 
> sockets
> ----------------------------------------------------------------------------------
>
>                 Key: HBASE-11833
>                 URL: https://issues.apache.org/jira/browse/HBASE-11833
>             Project: HBase
>          Issue Type: Bug
>          Components: regionserver
>    Affects Versions: 0.98.0
>         Environment: RHEL 6.3, HDP 2.1, 6 RegionServers/DataNodes, 18 TB per 
> node, 3108 regions
>            Reporter: steven xu
>
> HBase does not close dead connections to the datanode.
> This results in over 30K CLOSE_WAIT sockets, and at some point HBase can no 
> longer connect to the datanode because there are too many sockets mapped from 
> one host to another on the same port (50010). 
> Even after I restart all RSs, the CLOSE_WAIT count keeps growing.
> $ netstat -an|grep CLOSE_WAIT|wc -l
> 2545
> # netstat -nap|grep CLOSE_WAIT|grep 6569|wc -l
> 2545
> # ps -ef|grep 6569
> hbase     6569  6556 21 Aug25 ?        09:52:33 /opt/jdk1.6.0_25/bin/java 
> -Dproc_regionserver -XX:OnOutOfMemoryError=kill -9 %p -Xmx1000m 
> -XX:+UseConcMarkSweepGC 
> I have also reviewed these issues:
> [HBASE-9393]
> [HDFS-5671|https://issues.apache.org/jira/browse/HDFS-5671]
> [HDFS-1836|https://issues.apache.org/jira/browse/HDFS-1836]
> I found that the HBase 0.98/Hadoop 2.4.0 source code I am using already 
> includes these patches.
> But I do not understand why HBase 0.98/Hadoop 2.4.0 still has this issue. 
> Please check. Thanks a lot.


