Liu Shaohui created HBASE-8675:
----------------------------------

             Summary: Two active Hmaster for AUTH_FAILED in secure hbase cluster
                 Key: HBASE-8675
                 URL: https://issues.apache.org/jira/browse/HBASE-8675
             Project: HBase
          Issue Type: Bug
          Components: master
            Reporter: Liu Shaohui
            Priority: Critical


In our production cluster, a network problem reaching the Kerberos server 
caused the ZooKeeperWatcher in the active HMaster to fail authentication. It 
received an AUTH_FAILED connection event and lost the master lock. However, 
ZooKeeperWatcher ignores this event, so the old active HMaster kept running as 
active. After the network problem was fixed, the backup HMaster acquired the 
master lock and also became active, leaving two active HMasters in the cluster.

2013-05-30 09:44:21,004 ERROR org.apache.zookeeper.client.ZooKeeperSaslClient: 
An error: (java.security.PrivilegedActionException: 
javax.security.sasl.SaslException: GSS initiate failed [Caused by GSSException: 
No valid credentials provided (Mechanism level: krb1.xiaomi.net)]) occurred 
when evaluating Zookeeper Quorum Member's  received SASL token. Zookeeper 
Client will go to AUTH_FAILED state.

2013-05-30 09:54:07,755 WARN org.apache.hadoop.hbase.zookeeper.ZKUtil: 
hconnection-0x3e10d98be405bc Unable to set watcher on znode /hbase/master
org.apache.zookeeper.KeeperException$AuthFailedException: KeeperErrorCode = 
AuthFailed for /hbase/master
        at org.apache.zookeeper.KeeperException.create(KeeperException.java:123)
        at org.apache.zookeeper.KeeperException.create(KeeperException.java:51)
        at org.apache.zookeeper.ZooKeeper.exists(ZooKeeper.java:1036)
        at org.apache.hadoop.hbase.zookeeper.RecoverableZooKeeper.exists(RecoverableZooKeeper.java:166)
        at org.apache.hadoop.hbase.zookeeper.ZKUtil.watchAndCheckExists(ZKUtil.java:231)
        at org.apache.hadoop.hbase.zookeeper.ZooKeeperNodeTracker.start(ZooKeeperNodeTracker.java:76)
        at org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation.ensureZookeeperTrackers(HConnectionManager.java:595)
        at org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation.locateRegion(HConnectionManager.java:850)
        at org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation.locateRegion(HConnectionManager.java:825)
        at org.apache.hadoop.hbase.client.HTable.finishSetup(HTable.java:286)
        at org.apache.hadoop.hbase.client.HTable.<init>(HTable.java:201)
        at org.apache.hadoop.hbase.catalog.MetaReader.getHTable(MetaReader.java:200)
        at org.apache.hadoop.hbase.catalog.MetaReader.getMetaHTable(MetaReader.java:226)
        at org.apache.hadoop.hbase.catalog.MetaReader.fullScan(MetaReader.java:705)
        at org.apache.hadoop.hbase.catalog.MetaReader.fullScan(MetaReader.java:183)
        at org.apache.hadoop.hbase.catalog.MetaReader.fullScan(MetaReader.java:168)
        at org.apache.hadoop.hbase.master.CatalogJanitor.getSplitParents(CatalogJanitor.java:123)
        at org.apache.hadoop.hbase.master.CatalogJanitor.scan(CatalogJanitor.java:134)
        at org.apache.hadoop.hbase.master.CatalogJanitor.chore(CatalogJanitor.java:92)
        at org.apache.hadoop.hbase.Chore.run(Chore.java:67)
        at java.lang.Thread.run(Thread.java:662)


I propose simply aborting the HMaster when ZooKeeperWatcher receives an 
AuthFailed or SaslAuthenticated event. Is there a better way to handle this? 
Since ZooKeeperWatcher is used in many classes, would aborting there cause 
other problems? Is there anything else we need to consider? 
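
To make the proposal concrete, here is a rough, self-contained sketch of the 
idea, not a patch against HBase. The KeeperState enum and the Abortable 
interface below are local stand-ins modelled on 
org.apache.zookeeper.Watcher.Event.KeeperState and 
org.apache.hadoop.hbase.Abortable; SketchWatcher and RecordingMaster are 
hypothetical names used only for illustration. The sketch aborts on AuthFailed 
only, on the assumption that SaslAuthenticated reports a successful 
authentication and so should not trigger an abort.

```java
import java.util.concurrent.atomic.AtomicBoolean;

public class AuthFailedAbortSketch {

    /** Local stand-in for a subset of ZooKeeper's Watcher.Event.KeeperState. */
    enum KeeperState { SyncConnected, Disconnected, Expired, AuthFailed, SaslAuthenticated }

    /** Local stand-in modelled on HBase's Abortable interface. */
    interface Abortable {
        void abort(String why, Throwable cause);
        boolean isAborted();
    }

    /** Hypothetical master that only records whether it was asked to abort. */
    static final class RecordingMaster implements Abortable {
        private final AtomicBoolean aborted = new AtomicBoolean(false);
        private volatile String reason;

        @Override
        public void abort(String why, Throwable cause) {
            reason = why;
            aborted.set(true);
        }

        @Override
        public boolean isAborted() {
            return aborted.get();
        }

        String reason() {
            return reason;
        }
    }

    /** Hypothetical watcher: aborts its owner on AuthFailed instead of ignoring it. */
    static final class SketchWatcher {
        private final Abortable abortable;

        SketchWatcher(Abortable abortable) {
            this.abortable = abortable;
        }

        void connectionEvent(KeeperState state) {
            switch (state) {
                case AuthFailed:
                    // The master can no longer prove it holds the master lock,
                    // so stop rather than keep acting as the active master.
                    if (abortable != null) {
                        abortable.abort("ZooKeeper authentication failed", null);
                    }
                    break;
                default:
                    // Other states are out of scope for this sketch.
                    break;
            }
        }
    }

    public static void main(String[] args) {
        RecordingMaster master = new RecordingMaster();
        SketchWatcher watcher = new SketchWatcher(master);

        watcher.connectionEvent(KeeperState.Disconnected);
        System.out.println("aborted after Disconnected: " + master.isAborted());

        watcher.connectionEvent(KeeperState.AuthFailed);
        System.out.println("aborted after AuthFailed: " + master.isAborted());
    }
}
```

Because the watcher is shared by many classes, the null check on the abortable 
matters: callers that pass no abortable would see no behaviour change under 
this sketch, while the master would stop instead of remaining active.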


--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira
