[ https://issues.apache.org/jira/browse/HBASE-9393?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15334461#comment-15334461 ]

Vladimir Rodionov edited comment on HBASE-9393 at 6/16/16 7:06 PM:
-------------------------------------------------------------------

Some stats from the environment:

# 15 node cluster / 11 for HBase
# HBASE 1.1.2.x (HDP 2.4.2)
# Phoenix
# 5500 regions
# 2844 (!) tables
# 3860 (!) snapshots
# Kerberos
# UI shows 3 dead regions, which are not actually dead
# They run M/R jobs (Phoenix), M/R jobs (with direct HFile access), and Storm and Flume applications that write to HBase
# Number of files: archive - 7320, data - 43349; the remaining directories are small
# All CLOSE_WAIT connections are bound to remote nodes on port 1019 (the Kerberized DN port?); see the netstat sketch below
# Distribution of CLOSE_WAIT sockets is very uneven: server 011 has 59K, server 006 has less than 1K
# Enabling/disabling tables has no positive effect on the number of CLOSE_WAIT connections.

The number of CLOSE_WAIT connections grows steadily over time; it reached 60K 
on node 011 in less than 24 hours.
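
The per-port and per-node counts above come from plain netstat bookkeeping; here is a minimal sketch of the commands (my own reconstruction, not captured output; replace <rs-pid> with the region server PID):

{noformat}
# total CLOSE_WAIT sockets held by the region server process
netstat -nap | grep CLOSE_WAIT | grep <rs-pid> | wc -l
# group them by remote port (dominated by 1019, the Kerberized DataNode port, in our case)
netstat -nap | grep CLOSE_WAIT | grep <rs-pid> | awk '{print $5}' | awk -F: '{print $NF}' | sort | uniq -c | sort -rn
# group them by remote host to see how uneven the distribution is across nodes
netstat -nap | grep CLOSE_WAIT | grep <rs-pid> | awk '{split($5,a,":"); print a[1]}' | sort | uniq -c | sort -rn
{noformat}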

I hope this gives some clue.
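
One more sketch (again my own commands, not from this cluster) for the "too many mapped sockets from one host to another on the same port" angle in the quoted issue below: count sockets per remote host:port pair and compare against the local ephemeral port range, which caps connections to a single DataNode address.

{noformat}
# sockets per remote host:port pair for the region server process
netstat -nap | grep <rs-pid> | awk '{print $5}' | sort | uniq -c | sort -rn | head
# local ephemeral port range, i.e. the ceiling for connections to one host:port
cat /proc/sys/net/ipv4/ip_local_port_range
{noformat}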



> HBase does not close a closed socket, resulting in many CLOSE_WAIT 
> --------------------------------------------------------------------
>
>                 Key: HBASE-9393
>                 URL: https://issues.apache.org/jira/browse/HBASE-9393
>             Project: HBase
>          Issue Type: Bug
>    Affects Versions: 0.94.2, 0.98.0, 1.0.1.1, 1.1.2
>         Environment: Centos 6.4 - 7 regionservers/datanodes, 8 TB per node, 
> 7279 regions
>            Reporter: Avi Zrachya
>            Assignee: Ashish Singhi
>            Priority: Critical
>             Fix For: 2.0.0
>
>         Attachments: HBASE-9393.patch, HBASE-9393.v1.patch, 
> HBASE-9393.v10.patch, HBASE-9393.v11.patch, HBASE-9393.v12.patch, 
> HBASE-9393.v13.patch, HBASE-9393.v14.patch, HBASE-9393.v15.patch, 
> HBASE-9393.v15.patch, HBASE-9393.v2.patch, HBASE-9393.v3.patch, 
> HBASE-9393.v4.patch, HBASE-9393.v5.patch, HBASE-9393.v5.patch, 
> HBASE-9393.v5.patch, HBASE-9393.v6.patch, HBASE-9393.v6.patch, 
> HBASE-9393.v6.patch, HBASE-9393.v7.patch, HBASE-9393.v8.patch, 
> HBASE-9393.v9.patch
>
>
> HBase does not close a dead connection to the datanode.
> This results in over 60K CLOSE_WAIT sockets, and at some point HBase can no 
> longer connect to the datanode because there are too many mapped sockets from 
> one host to another on the same port.
> The example below shows a low CLOSE_WAIT count because we had to restart 
> HBase to solve the problem; over time it increases to 60-100K sockets in 
> CLOSE_WAIT.
> [root@hd2-region3 ~]# netstat -nap |grep CLOSE_WAIT |grep 21592 |wc -l
> 13156
> [root@hd2-region3 ~]# ps -ef |grep 21592
> root     17255 17219  0 12:26 pts/0    00:00:00 grep 21592
> hbase    21592     1 17 Aug29 ?        03:29:06 
> /usr/java/jdk1.6.0_26/bin/java -XX:OnOutOfMemoryError=kill -9 %p -Xmx8000m 
> -ea -XX:+UseConcMarkSweepGC -XX:+CMSIncrementalMode 
> -Dhbase.log.dir=/var/log/hbase 
> -Dhbase.log.file=hbase-hbase-regionserver-hd2-region3.swnet.corp.log ...


