[ https://issues.apache.org/jira/browse/HBASE-5673?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13241065#comment-13241065 ]
xufeng commented on HBASE-5673:
-------------------------------

I found this issue in my cluster.

1. I found that the region servers could not report to the master because of a socket timeout.
{noformat}
[2012-03-26 14:48:09,815] [INFO ] [regionserver20020] [org.apache.hadoop.hbase.regionserver.HRegionServer 1469] Attempting connect to Master server at DDB03:20000
[2012-03-26 14:49:09,818] [INFO ] [regionserver20020] [org.apache.hadoop.ipc.HbaseRPC 360] Problem connecting to server: DDB03/192.168.28.53:20000
[2012-03-26 14:49:09,819] [WARN ] [regionserver20020] [org.apache.hadoop.hbase.regionserver.HRegionServer 1483] Unable to connect to master. Retrying. Error was:
java.net.SocketTimeoutException: Call to DDB03/192.168.28.53:20000 failed on socket timeout exception: java.net.SocketTimeoutException: 60000 millis timeout while waiting for channel to be ready for read. ch : java.nio.channels.SocketChannel[connected local=/192.168.28.53:59520 remote=DDB03/192.168.28.53:20000]
{noformat}

2. From the jstack output of the master, I found that one handler is waiting and the others are blocked (in waitForMeta).
{noformat}
...
"IPC Server handler 90 on 20000" daemon prio=10 tid=0x00007f219c540000 nid=0x4c3f in Object.wait() [0x00007f21963a7000]
   java.lang.Thread.State: WAITING (on object monitor)
        at java.lang.Object.wait(Native Method)
        at java.lang.Object.wait(Object.java:485)
        at org.apache.hadoop.hbase.ipc.HBaseClient.call(HBaseClient.java:757)
...
"IPC Server handler 87 on 20000" daemon prio=10 tid=0x00007f219c53a000 nid=0x4c37 waiting for monitor entry [0x00007f21966aa000]
   java.lang.Thread.State: BLOCKED (on object monitor)
        at org.apache.hadoop.hbase.catalog.CatalogTracker.waitForMeta(CatalogTracker.java:397)
        - waiting to lock <0x0000000612486960> (a java.util.concurrent.atomic.AtomicBoolean)
        at org.apache.hadoop.hbase.catalog.CatalogTracker.waitForMetaServerConnectionDefault(CatalogTracker.java:437)
...
{noformat}

3. I also confirmed that the waiting handler is what blocks the others; the waiting handler is itself waiting for its call to complete.

4. But when "unable to create new native thread" occurs, it is thrown as a java.lang.OutOfMemoryError, which is an Error and not an IOException, so the catch (IOException e) clause does not catch it and markClosed(e) and close() are never run.
{noformat}
protected synchronized void setupIOstreams() throws IOException {
    ...
      start();
    } catch (IOException e) {
      markClosed(e);
      close();
      throw e;
    }
    ...
{noformat}

5. Thus the call is lost in the calls queue and never completes.
{noformat}
public Writable call(......) {
    ......
    synchronized (call) {
      while (!call.done) {
        try {
          call.wait(); // wait for the result
        } catch (InterruptedException ignored) {
          // save the fact that we were interrupted
          interrupted = true;
        }
      }
      ......
    }
{noformat}

> The OOM problem of IPC client call cause all handle block
> ----------------------------------------------------------
>
>                 Key: HBASE-5673
>                 URL: https://issues.apache.org/jira/browse/HBASE-5673
>             Project: HBase
>          Issue Type: Bug
>    Affects Versions: 0.90.6
>         Environment: 0.90.6
>            Reporter: xufeng
>            Assignee: xufeng
>
> If HBaseClient encounters an "unable to create new native thread" exception, the call
> will never complete because it is lost in the calls queue.
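
The failure mode described in steps 4 and 5 can be reproduced in isolation. The following is a minimal sketch, not the HBase code: the class LostCallSketch, the Call type, and all method names are hypothetical, and the OutOfMemoryError is thrown by hand to simulate a failed thread start. It only illustrates that a catch (IOException e) clause does not intercept an Error, so cleanup that would mark the pending call as done is skipped, while catching Throwable lets the waiter wake up. A bounded wait is used so the demo terminates instead of hanging.

```java
import java.io.IOException;

// Hypothetical, simplified model of the problem; not HBaseClient itself.
class LostCallSketch {

    /** Stand-in for a pending RPC call that a handler waits on. */
    static final class Call {
        boolean done;
        Throwable error;

        /** Marks the call finished (here: failed) and wakes any waiter. */
        synchronized void complete(Throwable t) {
            error = t;
            done = true;
            notifyAll();
        }

        /** Waits up to timeoutMs; returns true only if the call completed. */
        synchronized boolean await(long timeoutMs) throws InterruptedException {
            long deadline = System.currentTimeMillis() + timeoutMs;
            while (!done) {
                long left = deadline - System.currentTimeMillis();
                if (left <= 0) {
                    return false; // the real code has no timeout and would wait forever
                }
                wait(left);
            }
            return true;
        }
    }

    /** Simulates Thread.start() failing because no native thread can be created. */
    static void startReaderThread() throws IOException {
        throw new OutOfMemoryError("unable to create new native thread");
    }

    /** Mirrors the shape of the quoted snippet: only IOException triggers cleanup. */
    static void setupCatchingIOException(Call call) throws IOException {
        try {
            startReaderThread();
        } catch (IOException e) {
            call.complete(e); // never reached for an OutOfMemoryError
            throw e;
        }
    }

    /** One possible remedy (an assumption, not the committed fix): clean up on any Throwable. */
    static void setupCatchingThrowable(Call call) throws IOException {
        try {
            startReaderThread();
        } catch (Throwable t) {
            call.complete(t);
            if (t instanceof IOException) {
                throw (IOException) t;
            }
            throw new IOException("connection setup failed", t);
        }
    }

    /** Runs one variant and reports whether the pending call was ever completed. */
    static boolean callCompletes(boolean catchThrowable) throws InterruptedException {
        Call call = new Call();
        try {
            if (catchThrowable) {
                setupCatchingThrowable(call);
            } else {
                setupCatchingIOException(call);
            }
        } catch (IOException | Error e) {
            // The setup failure is visible here, but the call may still be pending.
        }
        return call.await(200);
    }

    public static void main(String[] args) throws InterruptedException {
        System.out.println("catch IOException: completed=" + callCompletes(false));
        System.out.println("catch Throwable:   completed=" + callCompletes(true));
    }
}
```

With the IOException-only handler the call stays pending until the demo's timeout expires; with the Throwable handler the waiter is released immediately and can see the error. Whether the actual fix should catch Throwable, or handle OutOfMemoryError specifically, is a design decision for the patch.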