[
https://issues.apache.org/jira/browse/HBASE-2966?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12908903#action_12908903
]
Patrick Hunt commented on HBASE-2966:
-------------------------------------
Hi Kannan, one of the reasons I asked about the log was to verify the scenario
I found. Since it's not available we can only speculate; however, I feel pretty
confident this is the same issue.

Notice in the stack dump that there is no "SendThread" anywhere, although there
are a large number of event threads. This suggests that something was going on,
perhaps session expirations, given that the "event thread not shutting down"
problem is triggered by session expiration (ZOOKEEPER-795, detailed earlier in
this jira).

Notice that "exists" is the hanging operation in the stack dump of this JIRA
(vs. close in 846); however, the hang in both cases has the same underlying
cause: both close and exists queue packets to be sent to the server, and they
can hang if that queue is not cleaned up properly. One discrepancy is that the
"SendThread" should only shut down on a client-issued close (or the zk state
going to closed, which doesn't trigger this bug). If there is no way your code
is calling close then this bug should not be triggered, but without the logs
it's hard for me to speculate. Is it possible that close was called due to some
network issue? (Say, error handling in response to the network instability that
caused the session expirations.)
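For what it's worth, below is a rough, untested sketch of a caller-side guard
(not a fix for the underlying ZooKeeper bug, and not code from either project):
run exists() on a worker thread and bound it with a timeout so that a thread
holding regionLockObject can't wait on a stale packet forever. The class and
method names here are made up for illustration.
{code}
// Illustrative only: the class and helper name (boundedExists) are hypothetical.
// This does not repair the stuck ClientCnxn packet; it only keeps the calling
// thread (and any lock it holds) from blocking indefinitely.
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;

import org.apache.zookeeper.ZooKeeper;
import org.apache.zookeeper.data.Stat;

public class BoundedZk {
  private static final ExecutorService POOL = Executors.newCachedThreadPool();

  /** zk.exists(path, false), but give up after timeoutMs instead of hanging. */
  public static Stat boundedExists(final ZooKeeper zk, final String path,
                                   long timeoutMs) throws Exception {
    Future<Stat> f = POOL.submit(new Callable<Stat>() {
      public Stat call() throws Exception {
        return zk.exists(path, false);   // the call that hangs in the dump above
      }
    });
    try {
      return f.get(timeoutMs, TimeUnit.MILLISECONDS);
    } catch (TimeoutException te) {
      f.cancel(true);                    // interrupt the worker; caller can retry
      throw te;
    }
  }
}
{code}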
> HBase client stuck on
> org.apache.zookeeper.ClientCnxn.submitRequest(ClientCnxn.java:1278) holding
> regionLockObject lock
> -----------------------------------------------------------------------------------------------------------------------
>
> Key: HBASE-2966
> URL: https://issues.apache.org/jira/browse/HBASE-2966
> Project: HBase
> Issue Type: Bug
> Reporter: Kannan Muthukkaruppan
> Attachments: stack.txt
>
>
> We noticed in one case that the HBase client program got stuck on a
> ZooKeeper.exists() call.
>
> One of the threads was stuck on the ZK call while holding an HBase-level
> lock (regionLockObject in locateRegionInMeta()):
> {code}
> "thrift-0-thread-8" prio=10 tid=0x00007f189ca4c000 nid=0x550f in
> Object.wait() [0x0000000044241000]
> java.lang.Thread.State: WAITING (on object monitor)
> at java.lang.Object.wait(Native Method)
> at java.lang.Object.wait(Object.java:485)
> at
> org.apache.zookeeper.ClientCnxn.submitRequest(ClientCnxn.java:1278)
> - locked <0x00007f1903a0c280> (a
> org.apache.zookeeper.ClientCnxn$Packet)
> at org.apache.zookeeper.ZooKeeper.exists(ZooKeeper.java:804)
> at org.apache.zookeeper.ZooKeeper.exists(ZooKeeper.java:837)
> at
> org.apache.hadoop.hbase.zookeeper.ZooKeeperWrapper.getRSDirectoryCount(ZooKeeperWrapper.java:765)
> at
> org.apache.hadoop.hbase.client.HTable.getCurrentNrHRS(HTable.java:173)
> at
> org.apache.hadoop.hbase.client.HTable.<init>(HTable.java:147)
> at
> org.apache.hadoop.hbase.client.MetaScanner.metaScan(MetaScanner.java:124)
> at
> org.apache.hadoop.hbase.client.MetaScanner.metaScan(MetaScanner.java:89)
> at
> org.apache.hadoop.hbase.client.HConnectionManager$TableServers.prefetchRegionCache(HConnectionManager.java:734)
> at
> org.apache.hadoop.hbase.client.HConnectionManager$TableServers.locateRegionInMeta(HConnectionManager.java:785)
> - locked <0x00007f190d868848> (a java.lang.Object)
> at
> org.apache.hadoop.hbase.client.HConnectionManager$TableServers.locateRegion(HConnectionManager.java:679)
> at
> org.apache.hadoop.hbase.client.HConnectionManager$TableServers.locateRegion(HConnectionManager.java:646)
> at
> org.apache.hadoop.hbase.client.HConnectionManager$TableServers.getRegionLocation(HConnectionManager.java:472)
> at
> org.apache.hadoop.hbase.client.ServerCallable.instantiateServer(ServerCallable.java:57)
> at
> org.apache.hadoop.hbase.client.HConnectionManager$TableServers.getRegionServerWithRetries(HConnectionManager.java:1147)
> at org.apache.hadoop.hbase.client.HTable.get(HTable.java:503)
> {code}
> The remaining threads are all waiting on the regionLockObject lock
> (held by the thread above), with stacks like:
>
> {code}
> "thrift-0-thread-7" prio=10 tid=0x00007f189ca4a800 nid=0x550e waiting for monitor entry [0x0000000044141000]
>    java.lang.Thread.State: BLOCKED (on object monitor)
>         at org.apache.hadoop.hbase.client.HConnectionManager$TableServers.locateRegionInMeta(HConnectionManager.java:783)
>         - waiting to lock <0x00007f190d868848> (a java.lang.Object)
>         at org.apache.hadoop.hbase.client.HConnectionManager$TableServers.locateRegion(HConnectionManager.java:679)
>         at org.apache.hadoop.hbase.client.HConnectionManager$TableServers.locateRegion(HConnectionManager.java:646)
>         at org.apache.hadoop.hbase.client.HConnectionManager$TableServers.getRegionLocation(HConnectionManager.java:472)
>         at org.apache.hadoop.hbase.client.ServerCallable.instantiateServer(ServerCallable.java:57)
>         at org.apache.hadoop.hbase.client.HConnectionManager$TableServers.getRegionServerWithRetries(HConnectionManager.java:1147)
>         at org.apache.hadoop.hbase.client.HTable.get(HTable.java:503)
> {code}
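> To make that pattern concrete, here is a stripped-down sketch (the names and
> the simulated hang are placeholders, not the actual HBase or ZooKeeper code):
> one thread parks forever while holding the shared lock, so every other thread
> blocks on monitor entry, exactly as in these dumps.
>
> {code}
> // Placeholder illustration: a thread that never returns while holding a shared
> // lock starves every thread that needs the same lock. The CountDownLatch stands
> // in for the ZooKeeper.exists() call that never completes, so this program
> // deliberately hangs; a jstack of it shows the same WAITING/BLOCKED pair.
> import java.util.concurrent.CountDownLatch;
>
> public class RegionLockHang {
>   private static final Object regionLockObject = new Object();  // stand-in for HBase's lock
>   private static final CountDownLatch neverReleased = new CountDownLatch(1);
>
>   public static void main(String[] args) throws Exception {
>     Thread stuck = new Thread(new Runnable() {
>       public void run() {
>         synchronized (regionLockObject) {
>           try {
>             neverReleased.await();          // like the hung ZooKeeper.exists()
>           } catch (InterruptedException ignored) { }
>         }
>       }
>     }, "thrift-0-thread-8");
>     stuck.start();
>     Thread.sleep(100);                      // let the first thread take the lock
>
>     Thread blocked = new Thread(new Runnable() {
>       public void run() {
>         synchronized (regionLockObject) {   // BLOCKED (on object monitor), like thread-7
>           System.out.println("never reached");
>         }
>       }
>     }, "thrift-0-thread-7");
>     blocked.start();
>   }
> }
> {code}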
> Any ideas?
>
> Meanwhile, I'll look into the ZK logs from the relevant time some more and
> get back if I have more information.
>