[ https://issues.apache.org/jira/browse/HBASE-9393?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15331941#comment-15331941 ]

Sean Busbey commented on HBASE-9393:
------------------------------------

{code}
diff --git a/hbase-server/src/main/java/org/apache/hadoop/hbase/io/FSDataInputStreamWrapper.java b/hbase-server/src/main/java/org/apache/hadoop/hbase/io/FSDataInputStreamWrapper.java
index b06be6b..4f9e96e 100644
--- a/hbase-server/src/main/java/org/apache/hadoop/hbase/io/FSDataInputStreamWrapper.java
+++ b/hbase-server/src/main/java/org/apache/hadoop/hbase/io/FSDataInputStreamWrapper.java
...
+  private volatile Boolean instanceOfCanUnbuffer = null;
+  // Using reflection to get org.apache.hadoop.fs.CanUnbuffer#unbuffer method to avoid compilation
+  // errors against Hadoop pre 2.6.4 and 2.7.1 versions.
+  private volatile Method unbuffer = null;
+
{code}

AFAICT, these fields are only used in the {{unbuffer}} method. That method declares itself not thread-safe, so why are they volatile?
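
For reference, here is a minimal sketch of the reflection-based lookup being discussed. This is not the patch itself; the class and field names are illustrative only, and it deliberately leaves the cached fields non-volatile on the assumption that a single thread calls {{unbuffer}}, which is exactly the point of the question above.

{code}
import java.io.InputStream;
import java.lang.reflect.Method;

import org.apache.hadoop.fs.FSDataInputStream;

class UnbufferSketch {
  // Cached reflection results. Without volatile/synchronization this is only safe
  // when a single thread ever calls unbuffer().
  private Boolean instanceOfCanUnbuffer = null;
  private Method unbufferMethod = null;

  void unbuffer(FSDataInputStream stream) {
    InputStream wrapped = stream.getWrappedStream();
    if (instanceOfCanUnbuffer == null) {
      try {
        // Look the interface up by name so this still compiles against Hadoop
        // versions that do not ship org.apache.hadoop.fs.CanUnbuffer.
        Class<?> canUnbuffer = Class.forName("org.apache.hadoop.fs.CanUnbuffer");
        if (canUnbuffer.isInstance(wrapped)) {
          unbufferMethod = canUnbuffer.getDeclaredMethod("unbuffer");
          instanceOfCanUnbuffer = Boolean.TRUE;
        } else {
          instanceOfCanUnbuffer = Boolean.FALSE;
        }
      } catch (ClassNotFoundException | NoSuchMethodException e) {
        instanceOfCanUnbuffer = Boolean.FALSE;
      }
    }
    if (instanceOfCanUnbuffer.booleanValue()) {
      try {
        // Releases the read buffers and the socket to the DataNode so the
        // connection does not sit in CLOSE_WAIT between reads.
        unbufferMethod.invoke(wrapped);
      } catch (ReflectiveOperationException e) {
        // Best effort; a real implementation would log this.
      }
    }
  }
}
{code}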

{code}
@@ -493,8 +499,8 @@ public class HFile {
    */
   @edu.umd.cs.findbugs.annotations.SuppressWarnings(value="SF_SWITCH_FALLTHROUGH",
       justification="Intentional")
-  private static Reader pickReaderVersion(Path path, FSDataInputStreamWrapper fsdis,
-      long size, CacheConfig cacheConf, HFileSystem hfs, Configuration conf) throws IOException {
+  private static Reader openReader(Path path, FSDataInputStreamWrapper fsdis, long size,
+      CacheConfig cacheConf, HFileSystem hfs, Configuration conf) throws IOException {
     FixedFileTrailer trailer = null;
     try {
       boolean isHBaseChecksum = fsdis.shouldUseHBaseChecksum();
@@ -516,6 +522,8 @@ public class HFile {
         LOG.warn("Error closing fsdis FSDataInputStreamWrapper", t2);
       }
       throw new CorruptHFileException("Problem reading HFile Trailer from file " + path, t);
+    } finally {
+      fsdis.unbuffer();
     }
   }
{code}

The addition of the unbuffer call here means that we need to update the 
javadocs for {{HFile.createReader(FileSystem, Path, FSDataInputStreamWrapper, 
long, CacheConfig, Configuration)}} and {{HFile.createReaderFromStream(Path, 
FSDataInputStream, long, CacheConfig, Configuration)}} to note that callers 
need to ensure no other threads have access to the passed FSDISW instance.

We should also ensure that existing calls to those methods are safely passing 
the FSDISW instance.
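
To make that contract concrete, a caller that satisfies it could look like the sketch below. The {{createReader}} signature is the one named above; I believe {{FSDataInputStreamWrapper}} has a {{(FileSystem, Path)}} constructor, but treat the helper itself as illustration only, not an existing call site.

{code}
import java.io.IOException;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.hbase.io.FSDataInputStreamWrapper;
import org.apache.hadoop.hbase.io.hfile.CacheConfig;
import org.apache.hadoop.hbase.io.hfile.HFile;

final class SafeOpenSketch {
  static HFile.Reader open(FileSystem fs, Path path, CacheConfig cacheConf, Configuration conf)
      throws IOException {
    // The wrapper is created locally and never published to another thread before
    // createReader runs, so the unbuffer() in the new finally block cannot race
    // with a concurrent user of the same stream.
    FSDataInputStreamWrapper fsdis = new FSDataInputStreamWrapper(fs, path);
    long size = fs.getFileStatus(path).getLen();
    return HFile.createReader(fs, path, fsdis, size, cacheConf, conf);
  }
}
{code}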

{code}
diff --git a/hbase-server/src/main/java/org/apache/hadoop/hbase/io/hfile/HFileBlock.java b/hbase-server/src/main/java/org/apache/hadoop/hbase/io/hfile/HFileBlock.java
index 6268f2e..a68900a 100644
--- a/hbase-server/src/main/java/org/apache/hadoop/hbase/io/hfile/HFileBlock.java
+++ b/hbase-server/src/main/java/org/apache/hadoop/hbase/io/hfile/HFileBlock.java
@@ -1351,6 +1351,12 @@ public class HFileBlock implements Cacheable {
 
     void setIncludesMemstoreTS(boolean includesMemstoreTS);
     void setDataBlockEncoder(HFileDataBlockEncoder encoder);
+
+    /**
+     * To close the stream's socket. Note: This can be concurrently called from multiple threads and
+     * implementation should take care of thread safety
+     */
+    void unbufferStream();
   }
 
   /**
@@ -1817,6 +1823,19 @@ public class HFileBlock implements Cacheable {
     public String toString() {
       return "hfs=" + hfs + ", path=" + pathName + ", fileContext=" + fileContext;
     }
+
+    @Override
+    public void unbufferStream() {
+      // To handle concurrent reads, ensure that no other client is accessing the streams while we
+      // unbuffer it.
+      if (streamLock.tryLock()) {
+        try {
+          this.streamWrapper.unbuffer();
+        } finally {
+          streamLock.unlock();
+        }
+      }
+    }
{code}

Just want to make sure I'm following the rationale correctly here. This won't 
actually take care of unbuffering if the lock is held e.g. for reading. I think 
this is fine, since it implies someone else is still using the stream and 
presumably they will also attempt to unbuffer when they are done.
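
In other words, the pattern is just "try the lock, or skip". A stripped-down illustration of that pattern (generic names, not the actual patch):

{code}
import java.util.concurrent.locks.ReentrantLock;

final class TryLockUnbufferSketch {
  private final ReentrantLock streamLock = new ReentrantLock();

  void unbufferStream(Runnable doUnbuffer) {
    // tryLock() is non-blocking: if a reader currently holds the lock we skip the
    // unbuffer entirely and rely on that reader to unbuffer when it finishes.
    if (streamLock.tryLock()) {
      try {
        doUnbuffer.run(); // e.g. streamWrapper.unbuffer()
      } finally {
        streamLock.unlock();
      }
    }
  }
}
{code}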

> HBase does not close a closed socket, resulting in many CLOSE_WAIT
> --------------------------------------------------------------------
>
>                 Key: HBASE-9393
>                 URL: https://issues.apache.org/jira/browse/HBASE-9393
>             Project: HBase
>          Issue Type: Bug
>    Affects Versions: 0.94.2, 0.98.0, 1.0.1.1, 1.1.2
>         Environment: Centos 6.4 - 7 regionservers/datanodes, 8 TB per node, 
> 7279 regions
>            Reporter: Avi Zrachya
>            Assignee: Ashish Singhi
>            Priority: Critical
>             Fix For: 2.0.0
>
>         Attachments: HBASE-9393.patch, HBASE-9393.v1.patch, 
> HBASE-9393.v10.patch, HBASE-9393.v11.patch, HBASE-9393.v12.patch, 
> HBASE-9393.v13.patch, HBASE-9393.v14.patch, HBASE-9393.v15.patch, 
> HBASE-9393.v15.patch, HBASE-9393.v2.patch, HBASE-9393.v3.patch, 
> HBASE-9393.v4.patch, HBASE-9393.v5.patch, HBASE-9393.v5.patch, 
> HBASE-9393.v5.patch, HBASE-9393.v6.patch, HBASE-9393.v6.patch, 
> HBASE-9393.v6.patch, HBASE-9393.v7.patch, HBASE-9393.v8.patch, 
> HBASE-9393.v9.patch
>
>
> HBase does not close a dead connection with the datanode.
> This results in over 60K sockets in CLOSE_WAIT, and at some point HBase can not 
> connect to the datanode because there are too many mapped sockets from one host 
> to another on the same port.
> The example below shows a low CLOSE_WAIT count because we had to restart HBase 
> to solve the problem; over time it will increase to 60-100K sockets in 
> CLOSE_WAIT.
> [root@hd2-region3 ~]# netstat -nap |grep CLOSE_WAIT |grep 21592 |wc -l
> 13156
> [root@hd2-region3 ~]# ps -ef |grep 21592
> root     17255 17219  0 12:26 pts/0    00:00:00 grep 21592
> hbase    21592     1 17 Aug29 ?        03:29:06 
> /usr/java/jdk1.6.0_26/bin/java -XX:OnOutOfMemoryError=kill -9 %p -Xmx8000m 
> -ea -XX:+UseConcMarkSweepGC -XX:+CMSIncrementalMode 
> -Dhbase.log.dir=/var/log/hbase 
> -Dhbase.log.file=hbase-hbase-regionserver-hd2-region3.swnet.corp.log ...


