stack created HBASE-14284:
-----------------------------

             Summary: In TRUNK, AsyncRpcClient does not timeout; hangs TestDistributedLogReplay, etc.
                 Key: HBASE-14284
                 URL: https://issues.apache.org/jira/browse/HBASE-14284
             Project: HBase
          Issue Type: Bug
            Reporter: stack
            Assignee: stack


TestDistributedLogReplay puts up regionservers with *40* priority handlers 
each. This makes for TDLR running with many hundreds of threads. Trying to 
figure out why 40, I see that the test can hang with fewer handlers: all 
client calls get stuck and never time out:

{code}
"RS:2;localhost:58498" prio=5 tid=0x00007fd284d4e800 nid=0x416af in 
Object.wait() [0x000000012952e000]
   java.lang.Thread.State: TIMED_WAITING (on object monitor)
        at java.lang.Object.wait(Native Method)
        at java.lang.Object.wait(Object.java:461)
        at io.netty.util.concurrent.DefaultPromise.await0(DefaultPromise.java:355)
        - locked <0x00000007dff93ea0> (a org.apache.hadoop.hbase.ipc.AsyncCall)
        at io.netty.util.concurrent.DefaultPromise.await(DefaultPromise.java:266)
        at io.netty.util.concurrent.AbstractFuture.get(AbstractFuture.java:42)
        at org.apache.hadoop.hbase.ipc.AsyncRpcClient.call(AsyncRpcClient.java:231)
        at org.apache.hadoop.hbase.ipc.AbstractRpcClient.callBlockingMethod(AbstractRpcClient.java:214)
        at org.apache.hadoop.hbase.ipc.AbstractRpcClient$BlockingRpcChannelImplementation.callBlockingMethod(AbstractRpcClient.java:288)
        at org.apache.hadoop.hbase.protobuf.generated.RegionServerStatusProtos$RegionServerStatusService$BlockingStub.regionServerReport(RegionServerStatusProtos.java:8994)
        at org.apache.hadoop.hbase.regionserver.HRegionServer.tryRegionServerReport(HRegionServer.java:1148)
        at org.apache.hadoop.hbase.regionserver.HRegionServer.run(HRegionServer.java:957)
        at org.apache.hadoop.hbase.MiniHBaseCluster$MiniHBaseClusterRegionServer.runRegionServer(MiniHBaseCluster.java:156)
        at org.apache.hadoop.hbase.MiniHBaseCluster$MiniHBaseClusterRegionServer.access$000(MiniHBaseCluster.java:108)
        at org.apache.hadoop.hbase.MiniHBaseCluster$MiniHBaseClusterRegionServer$1.run(MiniHBaseCluster.java:140)
        at java.security.AccessController.doPrivileged(Native Method)
        at javax.security.auth.Subject.doAs(Subject.java:356)
        at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1594)
        at org.apache.hadoop.hbase.security.User$SecureHadoopUser.runAs(User.java:279)
        at org.apache.hadoop.hbase.MiniHBaseCluster$MiniHBaseClusterRegionServer.run(MiniHBaseCluster.java:138)
        at java.lang.Thread.run(Thread.java:744)

{code}

We never recover.
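
For illustration only, here is a rough sketch of what bounding a blocking wait on a future looks like. It uses plain java.util.concurrent, not the HBase or Netty classes from the trace, and the class and method names (BoundedWaitSketch, completesWithin) are made up for this example. It is not the AsyncRpcClient code and not a proposed fix.

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;

// Hypothetical helper, not part of HBase: shows a bounded wait on a Future.
class BoundedWaitSketch {

    /**
     * Waits at most timeoutMs for the call to finish.
     * Returns true if it completed in time, false if the wait timed out
     * or the waiting thread was interrupted.
     */
    static boolean completesWithin(Future<?> call, long timeoutMs) {
        try {
            // Unlike the no-arg get(), this overload gives up after the timeout.
            call.get(timeoutMs, TimeUnit.MILLISECONDS);
            return true;
        } catch (TimeoutException e) {
            // No response in time: cancel so the caller can retry or fail fast.
            call.cancel(true);
            return false;
        } catch (InterruptedException e) {
            // Restore the interrupt flag for callers further up the stack.
            Thread.currentThread().interrupt();
            return false;
        } catch (ExecutionException e) {
            // The call itself failed; surface the cause unchecked.
            throw new IllegalStateException(e.getCause());
        }
    }

    public static void main(String[] args) {
        // A future nobody ever completes stands in for an RPC with no response.
        CompletableFuture<String> neverAnswered = new CompletableFuture<>();
        System.out.println("unanswered call completed: "
                + completesWithin(neverAnswered, 100));

        // An already-completed future stands in for a normal response.
        CompletableFuture<String> answered = CompletableFuture.completedFuture("ok");
        System.out.println("answered call completed: "
                + completesWithin(answered, 100));
    }
}
```

With a bounded wait like this, a call that never gets a response comes back to the caller after the timeout instead of holding the thread.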



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
