[
https://issues.apache.org/jira/browse/HDFS-6973?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15110999#comment-15110999
]
Vinayakumar B commented on HDFS-6973:
-------------------------------------
HBASE-9393 is due to unclosed streams that HBase keeps open for later reads.
The number of CLOSE_WAIT sockets is the same as the number of streams kept open.
When a stream is re-used for reading, its corresponding CLOSE_WAIT socket gets
closed and the read happens over a newly opened connection.
So IMO, this is not a problem. As already suggested in HBASE-9393, if you want
to keep the stream open without keeping the socket open,
FSDataInputStream#unbuffer() can be called after reading to close the block
readers.
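
For illustration, a minimal sketch (not from this ticket) of that suggestion: keep the FSDataInputStream cached for later reads, but call unbuffer() after each read so the block reader and its DataNode socket are released. The path below is a placeholder.
{code:java}
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FSDataInputStream;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class UnbufferSketch {
  public static void main(String[] args) throws Exception {
    FileSystem fs = FileSystem.get(new Configuration());
    // Stream intentionally kept open for later reads (HBase-style caching);
    // the path is a placeholder.
    FSDataInputStream in = fs.open(new Path("/hbase/data/example-hfile"));

    byte[] buf = new byte[4096];
    in.readFully(0, buf);   // positional read, opens a DataNode connection

    // Release the block reader and its socket while keeping the stream
    // usable; the next read will open a fresh connection.
    in.unbuffer();
  }
}
{code}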
> DFSClient does not close an already-closed socket, resulting in thousands of
> CLOSE_WAIT sockets
> --------------------------------------------------------------------------------------
>
> Key: HDFS-6973
> URL: https://issues.apache.org/jira/browse/HDFS-6973
> Project: Hadoop HDFS
> Issue Type: Bug
> Components: hdfs-client
> Affects Versions: 2.4.0
> Environment: RHEL 6.3, HDP 2.1, 6 RegionServers/DataNodes, 18 TB per
> node, 3108 regions
> Reporter: steven xu
>
> HBase, as an HDFS client, does not close dead connections to the DataNode.
> This results in over 30K CLOSE_WAIT sockets, and at some point HBase cannot
> connect to the DataNode because there are too many sockets mapped from one
> host to another on the same port, 50010.
> Even after I restart all RegionServers, the CLOSE_WAIT count keeps increasing.
> $ netstat -an|grep CLOSE_WAIT|wc -l
> 2545
> netstat -nap|grep CLOSE_WAIT|grep 6569|wc -l
> 2545
> ps -ef|grep 6569
> hbase 6569 6556 21 Aug25 ? 09:52:33 /opt/jdk1.6.0_25/bin/java
> -Dproc_regionserver -XX:OnOutOfMemoryError=kill -9 %p -Xmx1000m
> -XX:+UseConcMarkSweepGC
> I have also reviewed these issues:
> [HDFS-5697]
> [HDFS-5671]
> [HDFS-1836]
> [HBASE-9393]
> I found that the patches from these issues have already been applied in the
> HBase 0.98/Hadoop 2.4.0 source code.
> But I do not understand why HBase 0.98/Hadoop 2.4.0 still has this issue.
> Please check. Thanks a lot.
> The following code has been added to
> BlockReaderFactory.getRemoteBlockReaderFromTcp(). Perhaps another bug is
> causing my problem:
> {code:title=BlockReaderFactory.java|borderStyle=solid}
> // Some comments here
> private BlockReader getRemoteBlockReaderFromTcp() throws IOException {
>   if (LOG.isTraceEnabled()) {
>     LOG.trace(this + ": trying to create a remote block reader from a " +
>         "TCP socket");
>   }
>   BlockReader blockReader = null;
>   while (true) {
>     BlockReaderPeer curPeer = null;
>     Peer peer = null;
>     try {
>       curPeer = nextTcpPeer();
>       if (curPeer == null) break;
>       if (curPeer.fromCache) remainingCacheTries--;
>       peer = curPeer.peer;
>       blockReader = getRemoteBlockReader(peer);
>       return blockReader;
>     } catch (IOException ioe) {
>       if (isSecurityException(ioe)) {
>         if (LOG.isTraceEnabled()) {
>           LOG.trace(this + ": got security exception while constructing " +
>               "a remote block reader from " + peer, ioe);
>         }
>         throw ioe;
>       }
>       if ((curPeer != null) && curPeer.fromCache) {
>         // Handle an I/O error we got when using a cached peer. These are
>         // considered less serious, because the underlying socket may be
>         // stale.
>         if (LOG.isDebugEnabled()) {
>           LOG.debug("Closed potentially stale remote peer " + peer, ioe);
>         }
>       } else {
>         // Handle an I/O error we got when using a newly created peer.
>         LOG.warn("I/O error constructing remote block reader.", ioe);
>         throw ioe;
>       }
>     } finally {
>       if (blockReader == null) {
>         IOUtils.cleanup(LOG, peer);
>       }
>     }
>   }
>   return null;
> }
> {code}
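
For reference, a hypothetical sketch (not from the ticket) of the usage pattern the comment above describes: streams that are read once and then cached without close() or unbuffer(), each pinning one DataNode connection that ends up in CLOSE_WAIT. The paths, sizes, and loop count are placeholders.
{code:java}
import java.util.ArrayList;
import java.util.List;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FSDataInputStream;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class CloseWaitSketch {
  public static void main(String[] args) throws Exception {
    FileSystem fs = FileSystem.get(new Configuration());
    List<FSDataInputStream> cached = new ArrayList<FSDataInputStream>();
    byte[] buf = new byte[1024];
    for (int i = 0; i < 1000; i++) {
      // Placeholder paths; the files must exist and be at least 1 KB.
      FSDataInputStream in = fs.open(new Path("/tmp/testfile-" + i));
      in.readFully(0, buf);   // forces a DataNode connection via a block reader
      cached.add(in);         // kept open for later reads, never unbuffer()ed
    }
    // While this sleeps, `netstat -an | grep CLOSE_WAIT | wc -l` should roughly
    // track the number of cached streams.
    Thread.sleep(60000L);
  }
}
{code}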
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)