[ https://issues.apache.org/jira/browse/HBASE-9393?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15146079#comment-15146079 ]

Sean Busbey commented on HBASE-9393:
------------------------------------

Anoop:
{quote}
We will get this fix in for 2.0 only (as we have hadoop 2.7.0+ as the default 
version there).
Users on older versions seeing this issue can get it fixed by upping their 
hadoop version on the client side to at least 2.7.0 and applying this patch.
Branch-1 is on hadoop 2.5.x by default, so unless our default version is upped 
there, there is no point in adding the fix there.
Can open a backport jira when applicable.
{quote}

Ashish:
{quote}
Attached patch addressing review comment.
Thanks for all the offline discussion on this, Anoop & Ram.
For now this issue will be fixed only for 2.0.0. Once we plan to up our hadoop 
version to 2.7.x+ in any of our branches, we can fix the issue there as well as 
part of a backport jira.
{quote}

Okay, now I am even more -1 on this patch.

First of all, please make sure discussions get reflected in a public place 
(like this jira or dev@). Not just the decision, but the reasoning is important 
so that others can chime in.

Requiring our users to have the Hadoop 2.7.1 client libraries in their 
deployment is a terrible user experience and a bad idea. 1) AFAICT we have not 
had any discussion or decision to change our "supported Hadoop versions" in 
HBase 2.0 to only be Hadoop 2.7.1+. 2) I do not trust Hadoop to have the 2.7.1 
clients work reliably across the HDFS server versions we support now and in the 
future. 3) We expressly tell folks in our operational guides that they should 
replace the hadoop jars we ship with those from their actual Hadoop 
distribution. This patch goes directly counter to that advice and does not 
change it (nor should it, see #1 and #2).

The fact that Unbuffer is Private / Evolving in the Hadoop code base just makes 
the above more severe. We risk ending up beached on a very tight range of 
possible Hadoop client library versions.
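For context on that coupling, here is a minimal sketch of the kind of guarded 
call that avoids a hard compile-time dependency on the 2.7+ client. It assumes 
the method in question is FSDataInputStream.unbuffer(), which only exists on 
Hadoop 2.7.0+ (HDFS-7694); the class and helper names below are made up for 
illustration and this is not the attached patch:

{code:java}
import java.lang.reflect.Method;

import org.apache.hadoop.fs.FSDataInputStream;

// Illustrative sketch only (not the HBASE-9393 patch): invoke unbuffer()
// via reflection so the same code still loads and runs against Hadoop
// clients older than 2.7.0, where the method does not exist.
public final class UnbufferSketch {

  private UnbufferSketch() {
  }

  public static void unbufferIfPossible(FSDataInputStream in) {
    try {
      // FSDataInputStream.unbuffer() is only available on Hadoop 2.7.0+.
      Method unbuffer = in.getClass().getMethod("unbuffer");
      unbuffer.invoke(in);
    } catch (NoSuchMethodException e) {
      // Older Hadoop client on the classpath; nothing to release here.
    } catch (Exception e) {
      // Failing to unbuffer should not break the read path; log if desired.
    }
  }
}
{code}

Note that a guard like this only sidesteps the compile-time coupling; whether 
the call actually releases the underlying socket still depends on the client 
and server versions in play, which is exactly the concern above.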



> HBase does not close a closed socket, resulting in many CLOSE_WAIT
> -------------------------------------------------------------------
>
>                 Key: HBASE-9393
>                 URL: https://issues.apache.org/jira/browse/HBASE-9393
>             Project: HBase
>          Issue Type: Bug
>    Affects Versions: 0.94.2, 0.98.0
>         Environment: Centos 6.4 - 7 regionservers/datanodes, 8 TB per node, 
> 7279 regions
>            Reporter: Avi Zrachya
>            Assignee: Ashish Singhi
>            Priority: Critical
>             Fix For: 2.0.0
>
>         Attachments: HBASE-9393.patch, HBASE-9393.v1.patch, 
> HBASE-9393.v2.patch, HBASE-9393.v3.patch, HBASE-9393.v4.patch, 
> HBASE-9393.v5.patch, HBASE-9393.v5.patch, HBASE-9393.v5.patch, 
> HBASE-9393.v6.patch, HBASE-9393.v6.patch, HBASE-9393.v6.patch
>
>
> HBase does not close a dead connection with the datanode.
> This results in over 60K CLOSE_WAIT sockets, and at some point HBase can not 
> connect to the datanode because there are too many mapped sockets from one 
> host to another on the same port.
> The example below shows a low CLOSE_WAIT count because we had to restart 
> hbase to solve the problem; later it will increase to 60-100K sockets in 
> CLOSE_WAIT.
> [root@hd2-region3 ~]# netstat -nap |grep CLOSE_WAIT |grep 21592 |wc -l
> 13156
> [root@hd2-region3 ~]# ps -ef |grep 21592
> root     17255 17219  0 12:26 pts/0    00:00:00 grep 21592
> hbase    21592     1 17 Aug29 ?        03:29:06 
> /usr/java/jdk1.6.0_26/bin/java -XX:OnOutOfMemoryError=kill -9 %p -Xmx8000m 
> -ea -XX:+UseConcMarkSweepGC -XX:+CMSIncrementalMode 
> -Dhbase.log.dir=/var/log/hbase 
> -Dhbase.log.file=hbase-hbase-regionserver-hd2-region3.swnet.corp.log ...



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
