[ https://issues.apache.org/jira/browse/HBASE-5926?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13506801#comment-13506801 ]
Jean-Daniel Cryans commented on HBASE-5926:
-------------------------------------------

This jira has the odd side-effect of printing out a lot of garbage when running in standalone and killing it with -9, gist of it being:

{noformat}
2012-11-29 13:08:27,227 WARN org.apache.hadoop.hbase.zookeeper.RecoverableZooKeeper: Possibly transient ZooKeeper exception: org.apache.zookeeper.KeeperException$ConnectionLossException: KeeperErrorCode = ConnectionLoss for /hbase/master
2012-11-29 13:08:27,227 ERROR org.apache.hadoop.hbase.zookeeper.RecoverableZooKeeper: ZooKeeper getData failed after 0 retries
2012-11-29 13:08:27,227 WARN org.apache.hadoop.hbase.zookeeper.ZKUtil: clean znode for master Unable to get data of znode /hbase/master
org.apache.zookeeper.KeeperException$ConnectionLossException: KeeperErrorCode = ConnectionLoss for /hbase/master
	at org.apache.zookeeper.KeeperException.create(KeeperException.java:99)
	at org.apache.zookeeper.KeeperException.create(KeeperException.java:51)
	at org.apache.zookeeper.ZooKeeper.getData(ZooKeeper.java:1131)
	at org.apache.hadoop.hbase.zookeeper.RecoverableZooKeeper.getData(RecoverableZooKeeper.java:291)
	at org.apache.hadoop.hbase.zookeeper.ZKUtil.getDataNoWatch(ZKUtil.java:562)
	at org.apache.hadoop.hbase.zookeeper.MasterAddressTracker.deleteIfEquals(MasterAddressTracker.java:168)
	at org.apache.hadoop.hbase.ZNodeClearer.clear(ZNodeClearer.java:150)
	at org.apache.hadoop.hbase.master.HMasterCommandLine.run(HMasterCommandLine.java:110)
	at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:65)
	at org.apache.hadoop.hbase.util.ServerCommandLine.doMain(ServerCommandLine.java:78)
	at org.apache.hadoop.hbase.master.HMaster.main(HMaster.java:2298)
{noformat}

Basically the znode cleaner fails hard because ZK is offline. I was confused to see more logs being printed out after running the kill.

> Delete the master znode after a master crash
> --------------------------------------------
>
>                 Key: HBASE-5926
>                 URL: https://issues.apache.org/jira/browse/HBASE-5926
>             Project: HBase
>          Issue Type: Improvement
>          Components: master, scripts
>    Affects Versions: 0.96.0
>            Reporter: nkeywal
>            Assignee: nkeywal
>            Priority: Minor
>             Fix For: 0.96.0
>
>         Attachments: 5926.v10.patch, 5926.v11.patch, 5926.v13.patch,
> 5926.v14.patch, 5926.v6.patch, 5926.v8.patch, 5926.v9.patch
>
>
> This is the continuation of the work done in HBASE-5844.
> But we can't apply exactly the same strategy: for the region server, there is
> a znode per region server, while for the master & backup master there is a
> single znode for both.
> So if we apply the same strategy as for a regionserver, we may have this
> scenario:
> 1) Master starts
> 2) Backup master starts
> 3) Master dies
> 4) ZK detects it
> 5) Backup master receives the update from ZK
> 6) Backup master creates the new master node and becomes the main master
> 7) Previous master script continues
> 8) Previous master script deletes the master node in ZK
> 9) => issue: we deleted the node just created by the new master
> This should not happen often (usually the znode will be deleted soon enough),
> but it can happen.
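
The nine-step scenario above can be replayed with a small in-memory model. This is only a sketch, not the HBase patch: the class MasterZNodeSketch, its methods, and the server names "master-1" and "master-2" are all hypothetical, and a plain Java field stands in for the single master znode. It contrasts an unconditional delete (step 8 as described) with a delete that only fires when the znode still names the dead master. The name deleteIfEquals is borrowed from the stack trace in the comment, but the body here is an assumption about what such a check might look like.

```java
/**
 * In-memory stand-in for the single master znode shared by the master
 * and the backup master. Hypothetical sketch; not HBase or ZooKeeper code.
 */
public class MasterZNodeSketch {
    // Name of the server that currently owns the znode; null means "no znode".
    private String owner;

    /** Creates the znode if it is absent. Returns true if the caller became master. */
    public synchronized boolean tryBecomeMaster(String serverName) {
        if (owner != null) {
            return false;
        }
        owner = serverName;
        return true;
    }

    /** Models ZK removing the znode once it detects that its owner died. */
    public synchronized void sessionExpired(String serverName) {
        if (serverName.equals(owner)) {
            owner = null;
        }
    }

    /** Blind cleanup: removes the znode no matter who owns it. */
    public synchronized void deleteUnconditionally() {
        owner = null;
    }

    /** Guarded cleanup: removes the znode only if it still names expectedOwner. */
    public synchronized boolean deleteIfEquals(String expectedOwner) {
        if (owner == null || !owner.equals(expectedOwner)) {
            return false;
        }
        owner = null;
        return true;
    }

    public synchronized String currentMaster() {
        return owner;
    }

    /** Replays steps 1-8 and returns whoever owns the znode afterwards. */
    public static String replayScenario(boolean guarded) {
        MasterZNodeSketch znode = new MasterZNodeSketch();
        znode.tryBecomeMaster("master-1");   // 1) master starts
        znode.tryBecomeMaster("master-2");   // 2) backup starts; znode is taken, so it waits
        znode.sessionExpired("master-1");    // 3-4) master dies, ZK detects it
        znode.tryBecomeMaster("master-2");   // 5-6) backup creates the znode, becomes master
        if (guarded) {
            znode.deleteIfEquals("master-1");    // 7-8) old script checks ownership first
        } else {
            znode.deleteUnconditionally();       // 7-8) old script deletes blindly
        }
        return znode.currentMaster();
    }

    public static void main(String[] args) {
        System.out.println("blind cleanup leaves:   " + replayScenario(false)); // null
        System.out.println("guarded cleanup leaves: " + replayScenario(true));  // master-2
    }
}
```

With the blind delete the new master's znode is gone (step 9); with the ownership check it survives. Against a real ZooKeeper ensemble a read followed by a delete is not atomic, so a real implementation would likely also need something like the version argument of ZooKeeper's delete call to close the remaining window. That part is not modelled here.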