Recursive loop on KeeperException in AuthenticationTokenSecretManager/ZKLeaderManager
-------------------------------------------------------------------------------------

                 Key: HBASE-4857
                 URL: https://issues.apache.org/jira/browse/HBASE-4857
             Project: HBase
          Issue Type: Bug
          Components: security
    Affects Versions: 0.92.0, 0.94.0
            Reporter: Gary Helmling
             Fix For: 0.92.0


Looking through stack traces for {{TestMasterFailover}}, I see a case where the 
leader {{AuthenticationTokenSecretManager}} can get into a recursive loop when 
a {{KeeperException}} is encountered:
{noformat}
"Thread-1-EventThread" daemon prio=10 tid=0x00007f9fb47b2800 nid=0x77f6 waiting on condition [0x00007f9fab376000]
   java.lang.Thread.State: TIMED_WAITING (sleeping)
        at java.lang.Thread.sleep(Native Method)
        at java.lang.Thread.sleep(Thread.java:302)
        at java.util.concurrent.TimeUnit.sleep(TimeUnit.java:328)
        at org.apache.hadoop.hbase.util.RetryCounter.sleepUntilNextRetry(RetryCounter.java:55)
        at org.apache.hadoop.hbase.zookeeper.RecoverableZooKeeper.exists(RecoverableZooKeeper.java:206)
        at org.apache.hadoop.hbase.zookeeper.ZKUtil.createAndFailSilent(ZKUtil.java:891)
        at org.apache.hadoop.hbase.zookeeper.ZooKeeperWatcher.createBaseZNodes(ZooKeeperWatcher.java:161)
        at org.apache.hadoop.hbase.zookeeper.ZooKeeperWatcher.<init>(ZooKeeperWatcher.java:154)
        at org.apache.hadoop.hbase.master.HMaster.tryRecoveringExpiredZKSession(HMaster.java:1397)
        at org.apache.hadoop.hbase.master.HMaster.abortNow(HMaster.java:1435)
        at org.apache.hadoop.hbase.master.HMaster.abort(HMaster.java:1374)
        at org.apache.hadoop.hbase.zookeeper.ZooKeeperWatcher.abort(ZooKeeperWatcher.java:450)
        at org.apache.hadoop.hbase.zookeeper.ZKLeaderManager.stepDownAsLeader(ZKLeaderManager.java:166)
        at org.apache.hadoop.hbase.security.token.AuthenticationTokenSecretManager$LeaderElector.stop(AuthenticationTokenSecretManager.java:293)
        at org.apache.hadoop.hbase.zookeeper.ZKLeaderManager.stepDownAsLeader(ZKLeaderManager.java:167)
        at org.apache.hadoop.hbase.security.token.AuthenticationTokenSecretManager$LeaderElector.stop(AuthenticationTokenSecretManager.java:293)
        at org.apache.hadoop.hbase.zookeeper.ZKLeaderManager.stepDownAsLeader(ZKLeaderManager.java:167)
        at org.apache.hadoop.hbase.security.token.AuthenticationTokenSecretManager$LeaderElector.stop(AuthenticationTokenSecretManager.java:293)
        at org.apache.hadoop.hbase.zookeeper.ZKLeaderManager.handleLeaderChange(ZKLeaderManager.java:96)
        at org.apache.hadoop.hbase.zookeeper.ZKLeaderManager.nodeDeleted(ZKLeaderManager.java:78)
        at org.apache.hadoop.hbase.zookeeper.ZooKeeperWatcher.process(ZooKeeperWatcher.java:286)
        at org.apache.zookeeper.ClientCnxn$EventThread.processEvent(ClientCnxn.java:521)
        at org.apache.zookeeper.ClientCnxn$EventThread.run(ClientCnxn.java:497)
{noformat}

The {{KeeperException}} causes {{ZKLeaderManager}} to call 
{{AuthenticationTokenSecretManager$LeaderElector.stop()}}, which calls 
{{ZKLeaderManager.stepDownAsLeader()}}, which encounters another 
{{KeeperException}} and calls {{stop()}} again, and so on.
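
To make the cycle concrete, here is a minimal, self-contained sketch of the call pattern visible in the stack trace, along with one possible re-entrancy guard. Everything in it is a hypothetical stand-in: {{LeaderStepDownSketch}}, {{Leader}}, {{FakeKeeperException}}, {{deleteLeaderNode()}} and the cap of 5 calls are invented for illustration and are not the HBase classes. The guard is only one way such a loop could be broken, not necessarily the fix applied for this issue.

```java
import java.util.concurrent.atomic.AtomicBoolean;

/** Illustrative only: hypothetical stand-ins, not the actual HBase classes. */
final class LeaderStepDownSketch {

    /** Hypothetical stand-in for org.apache.zookeeper.KeeperException. */
    static class FakeKeeperException extends Exception {
        FakeKeeperException(String msg) { super(msg); }
    }

    /** Hypothetical leader whose ZooKeeper calls always fail. */
    static class Leader {
        private final boolean guarded;
        private final AtomicBoolean stopping = new AtomicBoolean(false);
        int stepDownCalls = 0;

        Leader(boolean guarded) { this.guarded = guarded; }

        // Simulates a ZooKeeper operation that fails on every attempt.
        private void deleteLeaderNode() throws FakeKeeperException {
            throw new FakeKeeperException("simulated ZooKeeper failure");
        }

        void stepDownAsLeader() {
            stepDownCalls++;
            try {
                deleteLeaderNode();
            } catch (FakeKeeperException e) {
                // The error path re-enters stop(), which calls stepDownAsLeader() again.
                stop();
            }
        }

        void stop() {
            // Guard (assumption, not the committed fix): only the first stop()
            // proceeds; re-entrant calls return immediately.
            if (guarded && !stopping.compareAndSet(false, true)) {
                return;
            }
            // Cap the unguarded demo so it terminates instead of overflowing the stack.
            if (!guarded && stepDownCalls >= 5) {
                return;
            }
            stepDownAsLeader();
        }
    }

    public static void main(String[] args) {
        Leader unguarded = new Leader(false);
        unguarded.stop();
        System.out.println("unguarded stepDownAsLeader calls: " + unguarded.stepDownCalls);

        Leader guardedLeader = new Leader(true);
        guardedLeader.stop();
        System.out.println("guarded stepDownAsLeader calls: " + guardedLeader.stepDownCalls);
    }
}
```

In this sketch the unguarded leader keeps re-entering {{stepDownAsLeader()}} until the artificial cap stops it, while the guarded leader attempts the step-down once and then returns.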
