[ 
https://issues.apache.org/jira/browse/HBASE-9939?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13818561#comment-13818561
 ] 

Lars Hofhansl commented on HBASE-9939:
--------------------------------------

Although one would expect the callable itself to time out and throw an exception.
Since you unplugged the network, you are probably mostly seeing the ZK session's 
attempts to reestablish itself.
I've seen this with our clients: if the entire cluster is down, it takes (with 
the 0.94 defaults) up to 20 minutes of retries before the client finally 
times out. Some of this was improved by avoiding nested retry loops (see 
HBASE-6326), but it still takes a long time with the defaults.
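
To illustrate why nested retry loops hurt so much, here is a rough worst-case model. It is only a sketch: the class and method names are made up for this example, the pause between attempts is fixed (the real client backs off), and the numbers in main() are illustrative rather than the 0.94 defaults.

```java
import java.util.concurrent.TimeUnit;

/** Back-of-the-envelope model of how nested retry loops multiply client wait time. */
final class RetryBudget {
    private RetryBudget() {}

    /** Worst case for one retry loop: every attempt times out, with a fixed pause between attempts. */
    static long singleLoopMillis(int attempts, long timeoutMillis, long pauseMillis) {
        if (attempts <= 0) {
            return 0L;
        }
        return attempts * timeoutMillis + (attempts - 1) * pauseMillis;
    }

    /** Worst case when an outer loop re-runs the whole inner loop on each of its own attempts. */
    static long nestedLoopMillis(int outerAttempts, int innerAttempts,
                                 long timeoutMillis, long pauseMillis) {
        if (outerAttempts <= 0) {
            return 0L;
        }
        long inner = singleLoopMillis(innerAttempts, timeoutMillis, pauseMillis);
        return outerAttempts * inner + (outerAttempts - 1) * pauseMillis;
    }

    public static void main(String[] args) {
        // Hypothetical values: 10 attempts per loop, 10 s timeout, 1 s pause.
        long timeout = TimeUnit.SECONDS.toMillis(10);
        long pause = TimeUnit.SECONDS.toMillis(1);
        long flat = singleLoopMillis(10, timeout, pause);
        long nested = nestedLoopMillis(10, 10, timeout, pause);
        System.out.println("single loop (s): " + TimeUnit.MILLISECONDS.toSeconds(flat));
        System.out.println("nested loop (s): " + TimeUnit.MILLISECONDS.toSeconds(nested));
    }
}
```

The point is only that the inner loop's whole budget is paid once per outer attempt, so removing one level of nesting cuts the worst case by roughly the outer retry count.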

In our systems we use different ZK timeouts and retry counts in the server 
(where they are used for server-to-server communication) and in the client 
(where we prefer fast timeouts so that we do not tie up our AppServer threads).
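
As a sketch of what such client-side overrides could look like: the property names below are the ones I believe the 0.94 client reads, so check them against the hbase-default.xml of your version. The values are purely illustrative and are not the settings we actually run with; the class and method names are made up for this example.

```java
import java.util.LinkedHashMap;
import java.util.Map;

/** Hypothetical holder for client-only overrides that favor failing fast. */
final class ClientTimeoutOverrides {
    private ClientTimeoutOverrides() {}

    /** Returns property name to value pairs; all values are illustrative assumptions. */
    static Map<String, String> fastFailClientSettings() {
        Map<String, String> settings = new LinkedHashMap<>();
        settings.put("zookeeper.session.timeout", "30000");  // ms, ZK session timeout requested by the client
        settings.put("zookeeper.recovery.retry", "1");       // retries of a failed ZK operation
        settings.put("hbase.client.retries.number", "3");    // retries per client operation
        settings.put("hbase.client.pause", "500");           // ms, base pause between retries
        settings.put("hbase.rpc.timeout", "10000");          // ms, timeout of a single RPC
        return settings;
    }

    public static void main(String[] args) {
        // Print in hbase-site.xml style order so the overrides are easy to review.
        for (Map.Entry<String, String> e : fastFailClientSettings().entrySet()) {
            System.out.println(e.getKey() + " = " + e.getValue());
        }
    }
}
```

With HBase on the classpath, each entry can be copied into the client's Configuration (obtained from HBaseConfiguration.create()) with conf.set(key, value) before the connection is created, leaving the server-side hbase-site.xml untouched.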

This looks a bit different though:
{code}
"hbase-tablepool-7-thread-4" id=43 idx=0xc0 tid=22572 prio=5 alive, waiting, native_blocked, daemon
    -- Waiting for notification on: org/apache/hadoop/hbase/ipc/HBaseClient$Call@0x00000000058F1F38[fat lock]
    at jrockit/vm/Threads.waitForNotifySignal(JLjava/lang/Object;)Z(Native Method)
    at java/lang/Object.wait(J)V(Native Method)
    at java/lang/Object.wait(Object.java:485)
    at org/apache/hadoop/hbase/ipc/HBaseClient.call(HBaseClient.java:981)
    ^-- Lock released while waiting: org/apache/hadoop/hbase/ipc/HBaseClient$Call@0x00000000058F1F38[fat lock]
    at org/apache/hadoop/hbase/ipc/SecureRpcEngine$Invoker.invoke(SecureRpcEngine.java:104)
    at $Proxy7.multi(Lorg/apache/hadoop/hbase/client/MultiAction;)Lorg/apache/hadoop/hbase/client/MultiResponse;(Unknown Source)
    at org/apache/hadoop/hbase/client/HConnectionManager$HConnectionImplementation$3$1.call(HConnectionManager.java:1398)
    at org/apache/hadoop/hbase/client/HConnectionManager$HConnectionImplementation$3$1.call(HConnectionManager.java:1396)
    at org/apache/hadoop/hbase/client/ServerCallable.withoutRetries(ServerCallable.java:210)
    at org/apache/hadoop/hbase/client/HConnectionManager$HConnectionImplementation$3.call(HConnectionManager.java:1405)
{code}

So we need to look into this.

> All HBase client threads are locked out on network failure
> ----------------------------------------------------------
>
>                 Key: HBASE-9939
>                 URL: https://issues.apache.org/jira/browse/HBASE-9939
>             Project: HBase
>          Issue Type: Bug
>          Components: Client
>    Affects Versions: 0.94.6
>            Reporter: Rohit Joshi
>             Fix For: 0.94.14
>
>
> Under load, when I disabled the network interface, all HBase threads were 
> locked out. I was expecting these threads to be released based on 
> client.operation.timeout and rpc.timeout.
> Here is a link to the thread dump:
> https://www.dropbox.com/s/y1ng3yoywq09x2u/HBaseClient_Threaddump.txt



--
This message was sent by Atlassian JIRA
(v6.1#6144)
