[ 
https://issues.apache.org/jira/browse/HBASE-9393?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15167369#comment-15167369
 ] 

Sean Busbey commented on HBASE-9393:
------------------------------------

{quote}
bq. Why are we making these assignments indirectly via methods?
To handle the findbugs warnings ST_WRITE_TO_STATIC_FROM_INSTANCE_METHOD: Write 
to static field from instance method.
{quote}

But these methods have the same underlying problem: we're still unsafely 
updating static state from multiple instance locations.
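
In other words (purely illustrative, the field and method names here are made 
up, not the patch's actual members):

{code}
import java.lang.reflect.Method;

// Moving the static write into an instance method only relocates the problem:
// two instances calling this concurrently can still interleave on the statics.
class ReaderSketch {
  private static Method unbufferMethod;   // shared by every instance
  private static boolean unbufferChecked; // shared by every instance

  private void setUnbufferMethod(Method m) {
    unbufferMethod = m;      // unsynchronized write to static state
    unbufferChecked = true;  // second write; readers can observe these out of order
  }
}
{code}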

{quote}
bq. This doesn't look like it will behave correctly in the presence of 
concurrency. Can we do the reflection set up during a static initializer?
The initial plan was to do it that way and do this setup only once (which we 
still do only once in the patch), but the stream is an instance object, so we 
could not do it that way. Regarding concurrency, we have handled that: the 
parallel reads come through HFileBlock reads, where we acquire the stream lock 
and then call the stream unbuffer, the same as we do for the block reads.
{quote}

There's nothing in this class that lets anyone know that this method must be 
accessed within a lock. The top-level class description even claims that it is 
thread-safe for normal operations. Additionally, this change requires us to 
lock across *all* instances rather than just one, as the current 
non-thread-safe portions do.

Are the characteristics of {{stream}} something we can determine in a static 
initializer from configs, perhaps by instantiating a dummy version?
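
Roughly along these lines (class and field names here are mine, and whether 
{{CanUnbuffer}} is the right thing to probe is an assumption):

{code}
import java.lang.reflect.Method;

final class UnbufferSketch {
  // Resolved once by the class loader; no instance method ever writes a static.
  private static final Method UNBUFFER;
  static {
    Method m = null;
    try {
      // The CanUnbuffer interface only exists on newer Hadoop versions, hence
      // the reflective lookup rather than a direct call.
      m = Class.forName("org.apache.hadoop.fs.CanUnbuffer").getMethod("unbuffer");
    } catch (ClassNotFoundException | NoSuchMethodException e) {
      // Older Hadoop: leave UNBUFFER null and skip unbuffering entirely.
    }
    UNBUFFER = m;
  }
}
{code}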

{code}
FSDataInputStream stream = this.useHBaseChecksum ? this.streamNoFsChecksum : this.stream;
{code}

Can useHBaseChecksum vary amongst instances in the same JVM? If it can, then we 
shouldn't be sharing cached information about the unbuffer call at all (or we 
need one per streamClass). That would solve the "assign to static" business by 
moving to per-instance caches of the reflection information.
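
As a rough sketch of the per-streamClass variant (all names here are 
hypothetical, not from the patch):

{code}
import java.lang.reflect.Method;
import java.util.Optional;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;

// Each concrete stream class maps to its unbuffer Method (or empty if it has
// none), so no instance ever writes shared state outside the thread-safe map.
final class UnbufferMethodCache {
  private static final ConcurrentMap<Class<?>, Optional<Method>> CACHE =
      new ConcurrentHashMap<>();

  static Method unbufferFor(Class<?> streamClass) {
    return CACHE.computeIfAbsent(streamClass, c -> {
      try {
        return Optional.of(c.getMethod("unbuffer"));
      } catch (NoSuchMethodException e) {
        return Optional.empty(); // this stream type cannot be unbuffered
      }
    }).orElse(null);
  }
}
{code}

A plain per-instance field resolved in the constructor would do just as well, 
if the reflective lookup is cheap enough to repeat per reader.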



> Hbase does not closing a closed socket resulting in many CLOSE_WAIT 
> --------------------------------------------------------------------
>
>                 Key: HBASE-9393
>                 URL: https://issues.apache.org/jira/browse/HBASE-9393
>             Project: HBase
>          Issue Type: Bug
>    Affects Versions: 0.94.2, 0.98.0
>         Environment: Centos 6.4 - 7 regionservers/datanodes, 8 TB per node, 
> 7279 regions
>            Reporter: Avi Zrachya
>            Assignee: Ashish Singhi
>            Priority: Critical
>             Fix For: 2.0.0
>
>         Attachments: HBASE-9393.patch, HBASE-9393.v1.patch, 
> HBASE-9393.v10.patch, HBASE-9393.v11.patch, HBASE-9393.v12.patch, 
> HBASE-9393.v13.patch, HBASE-9393.v2.patch, HBASE-9393.v3.patch, 
> HBASE-9393.v4.patch, HBASE-9393.v5.patch, HBASE-9393.v5.patch, 
> HBASE-9393.v5.patch, HBASE-9393.v6.patch, HBASE-9393.v6.patch, 
> HBASE-9393.v6.patch, HBASE-9393.v7.patch, HBASE-9393.v8.patch, 
> HBASE-9393.v9.patch
>
>
> HBase does not close a dead connection with the datanode.
> This results in over 60K CLOSE_WAIT sockets, and at some point HBase can not 
> connect to the datanode because there are too many mapped sockets from one 
> host to another on the same port.
> The example below shows a low CLOSE_WAIT count because we had to restart 
> hbase to solve the problem; later in time it will increase to 60-100K sockets 
> in CLOSE_WAIT
> [root@hd2-region3 ~]# netstat -nap |grep CLOSE_WAIT |grep 21592 |wc -l
> 13156
> [root@hd2-region3 ~]# ps -ef |grep 21592
> root     17255 17219  0 12:26 pts/0    00:00:00 grep 21592
> hbase    21592     1 17 Aug29 ?        03:29:06 
> /usr/java/jdk1.6.0_26/bin/java -XX:OnOutOfMemoryError=kill -9 %p -Xmx8000m 
> -ea -XX:+UseConcMarkSweepGC -XX:+CMSIncrementalMode 
> -Dhbase.log.dir=/var/log/hbase 
> -Dhbase.log.file=hbase-hbase-regionserver-hd2-region3.swnet.corp.log ...


