[ 
https://issues.apache.org/jira/browse/HBASE-5673?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13241065#comment-13241065
 ] 

xufeng commented on HBASE-5673:
-------------------------------

I found this issue in my cluster.

1. I found that regionserver calls to report to the master were failing with a socket timeout.
{noformat}
[2012-03-26 14:48:09,815] [INFO ] [regionserver20020] 
[org.apache.hadoop.hbase.regionserver.HRegionServer 1469] Attempting connect to 
Master server at DDB03:20000
[2012-03-26 14:49:09,818] [INFO ] [regionserver20020] 
[org.apache.hadoop.ipc.HbaseRPC 360] Problem connecting to server: 
DDB03/192.168.28.53:20000
[2012-03-26 14:49:09,819] [WARN ] [regionserver20020] 
[org.apache.hadoop.hbase.regionserver.HRegionServer 1483] Unable to connect to 
master. Retrying. Error was:
java.net.SocketTimeoutException: Call to DDB03/192.168.28.53:20000 failed on 
socket timeout exception: java.net.SocketTimeoutException: 60000 millis timeout 
while waiting for channel to be ready for read. ch : 
java.nio.channels.SocketChannel[connected local=/192.168.28.53:59520 
remote=DDB03/192.168.28.53:20000]
{noformat}

2. From the master's jstack output, I found that one handler is waiting and the 
others are blocked (in waitForMeta).
{noformat}
......
"IPC Server handler 90 on 20000" daemon prio=10 tid=0x00007f219c540000 
nid=0x4c3f in Object.wait() [0x00007f21963a7000]
   java.lang.Thread.State: WAITING (on object monitor)
        at java.lang.Object.wait(Native Method)
        at java.lang.Object.wait(Object.java:485)
        at org.apache.hadoop.hbase.ipc.HBaseClient.call(HBaseClient.java:757)
        ......

"IPC Server handler 87 on 20000" daemon prio=10 tid=0x00007f219c53a000 
nid=0x4c37 waiting for monitor entry [0x00007f21966aa000]
   java.lang.Thread.State: BLOCKED (on object monitor)
        at 
org.apache.hadoop.hbase.catalog.CatalogTracker.waitForMeta(CatalogTracker.java:397)
        - waiting to lock <0x0000000612486960> (a 
java.util.concurrent.atomic.AtomicBoolean)
        at 
org.apache.hadoop.hbase.catalog.CatalogTracker.waitForMetaServerConnectionDefault(CatalogTracker.java:437)
......
{noformat}

3. I also confirmed that the waiting handler is what blocks the others; the 
waiting handler is itself waiting for its call to complete.
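
The two thread states in the jstack output can be reproduced in a small standalone program. This is only a sketch, not HBase code: `HandlerStatesDemo`, `metaLock`, `call` and the helper methods are hypothetical names, and the AtomicBoolean is used as a monitor only because the stack trace shows one being locked. It assumes the first thread waits on the call object while still holding the shared monitor.

```java
import java.util.concurrent.atomic.AtomicBoolean;

// Hypothetical demo, not HBase code.
class HandlerStatesDemo {

    // Polls until the thread reaches the wanted state, or gives up after 5 seconds.
    static void spinUntil(Thread t, Thread.State wanted) {
        long deadline = System.currentTimeMillis() + 5000;
        while (t.getState() != wanted && System.currentTimeMillis() < deadline) {
            Thread.yield();
        }
    }

    static void join(Thread t) {
        try {
            t.join(5000);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
    }

    // Returns { state of the waiting thread, state of the blocked thread }.
    static Thread.State[] observe() {
        final AtomicBoolean metaLock = new AtomicBoolean(false); // shared monitor
        final Object call = new Object();                        // stands in for the pending call
        final boolean[] done = {false};                          // guarded by call's monitor

        Thread first = new Thread(() -> {
            synchronized (metaLock) {              // holds the shared monitor...
                synchronized (call) {
                    while (!done[0]) {             // ...while waiting for the call to finish
                        try {
                            call.wait();           // releases call's monitor only, not metaLock
                        } catch (InterruptedException e) {
                            return;
                        }
                    }
                }
            }
        });
        Thread second = new Thread(() -> {
            synchronized (metaLock) { }            // cannot enter until first leaves
        });

        first.start();
        spinUntil(first, Thread.State.WAITING);
        second.start();
        spinUntil(second, Thread.State.BLOCKED);
        Thread.State[] states = { first.getState(), second.getState() };

        synchronized (call) {                      // complete the call so both threads can exit
            done[0] = true;
            call.notifyAll();
        }
        join(first);
        join(second);
        return states;
    }

    public static void main(String[] args) {
        Thread.State[] s = observe();
        System.out.println("first=" + s[0] + " second=" + s[1]);
    }
}
```

If the call were never completed, the first thread would stay in Object.wait() and the second would stay blocked on the monitor, which matches the pattern in the dump above.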

4. But when "unable to create new native thread" happens (an OutOfMemoryError, 
which is not an IOException), the catch (IOException e) clause does not catch it.
{noformat}
protected synchronized void setupIOstreams() throws IOException {
......
        start();
      } catch (IOException e) {
        markClosed(e);
        close();

        throw e;
      }
......
{noformat}
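
The gap in step 4 can be shown in isolation. The sketch below is not the HBase class: `ConnectionSketch`, `startReaderThread`, `closedCause` and the two setup methods are hypothetical names, and the OutOfMemoryError is thrown by hand to stand in for what Thread.start() raises when the JVM cannot create a native thread. Wrapping the error in an IOException is just one possible handling, chosen here so the cleanup path is the same for both cases.

```java
import java.io.IOException;

// Hypothetical stand-in for a connection; not the HBase class.
class ConnectionSketch {
    Throwable closedCause;   // non-null once the cleanup has run

    // Simulates Thread.start() failing (assumption made for the demo).
    void startReaderThread() throws IOException {
        throw new OutOfMemoryError("unable to create new native thread");
    }

    void markClosed(Throwable cause) {
        closedCause = cause;
    }

    // Same shape as the quoted code: only IOException triggers the cleanup.
    void setupNarrow() throws IOException {
        try {
            startReaderThread();
        } catch (IOException e) {
            markClosed(e);
            throw e;
        }
    }

    // Broader variant: any Throwable triggers the cleanup before being reported.
    void setupWide() throws IOException {
        try {
            startReaderThread();
        } catch (IOException e) {
            markClosed(e);
            throw e;
        } catch (Throwable t) {
            IOException wrapped = new IOException("could not start connection thread", t);
            markClosed(wrapped);
            throw wrapped;
        }
    }

    // True if the cleanup ran with the narrow catch.
    static boolean cleanupRanNarrow() {
        ConnectionSketch c = new ConnectionSketch();
        try {
            c.setupNarrow();
        } catch (Throwable ignored) {
            // the OutOfMemoryError escapes setupNarrow() and lands here
        }
        return c.closedCause != null;
    }

    // True if the cleanup ran with the wide catch.
    static boolean cleanupRanWide() {
        ConnectionSketch c = new ConnectionSketch();
        try {
            c.setupWide();
        } catch (Throwable ignored) {
            // the wrapped IOException lands here
        }
        return c.closedCause != null;
    }

    public static void main(String[] args) {
        System.out.println("narrow catch ran cleanup: " + cleanupRanNarrow()); // false
        System.out.println("wide catch ran cleanup: " + cleanupRanWide());     // true
    }
}
```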


5. Thus the call is left in the calls queue and never completes.
{noformat}
public Writable call(......)
{
......
    synchronized (call) {
      while (!call.done) {
        try {
          call.wait();                           // wait for the result
        } catch (InterruptedException ignored) {
          // save the fact that we were interrupted
          interrupted = true;
        }
      }
......
}

{noformat}
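
To illustrate why the waiter in step 5 only returns if something marks the call as done, here is a minimal sketch of the same wait loop together with a cleanup that fails every queued call. All names (`CallTable`, `PendingCall`, `failAll`, `demo`) are hypothetical and this is not the HBase implementation or the fix for this issue. It assumes the failure path is able to reach the table of pending calls.

```java
import java.io.IOException;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical sketch of the wait/notify pattern; not HBase code.
class CallTable {

    static class PendingCall {
        private boolean done;
        private IOException error;

        // Marks the call finished and wakes the waiter; a null error means success.
        synchronized void complete(IOException error) {
            this.error = error;
            this.done = true;
            notifyAll();
        }

        // Same loop shape as the quoted call(): returns only once done is set.
        synchronized IOException await() {
            boolean interrupted = false;
            while (!done) {
                try {
                    wait();
                } catch (InterruptedException e) {
                    interrupted = true;   // remember the interrupt, keep waiting
                }
            }
            if (interrupted) {
                Thread.currentThread().interrupt();
            }
            return error;
        }
    }

    final Map<Integer, PendingCall> calls = new ConcurrentHashMap<>();

    // Cleanup step: fail every queued call so that no waiter is left behind.
    void failAll(IOException cause) {
        for (Integer id : calls.keySet()) {
            PendingCall c = calls.remove(id);
            if (c != null) {
                c.complete(cause);
            }
        }
    }

    // Starts one waiter, fails all calls, and returns the message the waiter saw.
    static String demo() {
        CallTable table = new CallTable();
        PendingCall call = new PendingCall();
        table.calls.put(1, call);
        String[] seen = new String[1];
        Thread waiter = new Thread(() -> {
            IOException e = call.await();
            seen[0] = (e == null) ? "ok" : e.getMessage();
        });
        waiter.start();
        table.failAll(new IOException("connection setup failed"));
        try {
            waiter.join(5000);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        return seen[0];
    }

    public static void main(String[] args) {
        System.out.println(demo()); // prints "connection setup failed"
    }
}
```

Without the failAll step, nothing ever sets done for the queued call, so await() would loop forever, which is the behaviour described in step 5.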
                
> The OOM problem of IPC client call  cause all handle block
> ----------------------------------------------------------
>
>                 Key: HBASE-5673
>                 URL: https://issues.apache.org/jira/browse/HBASE-5673
>             Project: HBase
>          Issue Type: Bug
>    Affects Versions: 0.90.6
>         Environment: 0.90.6
>            Reporter: xufeng
>            Assignee: xufeng
>
> If HBaseClient hits an "unable to create new native thread" exception, the call 
> will never complete because it is lost in the calls queue.
