[ https://issues.apache.org/jira/browse/HBASE-8519?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13656590#comment-13656590 ]

Jerry He commented on HBASE-8519:
---------------------------------

bq. You mean that we may not stop the master when we should?
Yes.

Much clearer comments than what I had!

I also did some basic testing to cover the cases we described:

1.  Primary master running, backup master waiting: stop-hbase.sh works successfully.
2.  Kill the primary master during normal operation: the backup master becomes active successfully.
3.  Kill the primary master during its initialization: the backup master becomes the active master successfully.
4.  Run stop-hbase.sh to stop HBase, but kill the master before it finishes shutdown: restarting HBase afterwards succeeds.

> Backup master will never come up if primary master dies during initialization
> -----------------------------------------------------------------------------
>
>                 Key: HBASE-8519
>                 URL: https://issues.apache.org/jira/browse/HBASE-8519
>             Project: HBase
>          Issue Type: Bug
>          Components: master
>    Affects Versions: 0.94.7, 0.95.0
>            Reporter: Jerry He
>            Assignee: Jerry He
>            Priority: Minor
>             Fix For: 0.98.0
>
>         Attachments: HBASE-8519-trunk.patch
>
>
> The problem happens if the primary master dies after becoming master but
> before it completes initialization and calls clusterStatusTracker.setClusterUp().
> The backup master will then try to become the master, but will promptly shut
> itself down because it sees that the cluster is not up.
> This is the backup master log:
> 2013-05-09 15:08:05,568 INFO org.apache.hadoop.hbase.master.metrics.MasterMetrics: Initialized
> 2013-05-09 15:08:05,573 DEBUG org.apache.hadoop.hbase.master.HMaster: HMaster started in backup mode.  Stalling until master znode is written.
> 2013-05-09 15:08:05,589 INFO org.apache.hadoop.hbase.zookeeper.RecoverableZooKeeper: Node /hbase/master already exists and this is not a retry
> 2013-05-09 15:08:05,590 INFO org.apache.hadoop.hbase.master.ActiveMasterManager: Adding ZNode for /hbase/backup-masters/xxx.com,60000,1368137285373 in backup master directory
> 2013-05-09 15:08:05,595 INFO org.apache.hadoop.hbase.master.ActiveMasterManager: Another master is the active master, xxx.com,60000,1368137283107; waiting to become the next active master
> 2013-05-09 15:09:45,006 DEBUG org.apache.hadoop.hbase.master.ActiveMasterManager: No master available. Notifying waiting threads
> 2013-05-09 15:09:45,006 INFO org.apache.hadoop.hbase.master.HMaster: Cluster went down before this master became active
> 2013-05-09 15:09:45,006 DEBUG org.apache.hadoop.hbase.master.HMaster: Stopping service threads
> 2013-05-09 15:09:45,006 INFO org.apache.hadoop.ipc.HBaseServer: Stopping server on 60000
>  
> In ActiveMasterManager::blockUntilBecomingActiveMaster():
> {code}
>   ..
>   if (!clusterStatusTracker.isClusterUp()) {
>     this.master.stop(
>       "Cluster went down before this master became active");
>   }
>   ..
> {code}
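
The race described above can be illustrated with a small toy model. This is a hedged sketch, not HBase code: the class `BackupMasterRace`, the nested `ZkState` holder, and the methods `simulate` and `backupAfterPrimaryLoss` are hypothetical names. The two boolean flags merely stand in for the /hbase/master znode and the cluster-up state that ClusterStatusTracker reports, and the decision logic mirrors only the quoted isClusterUp() check, not the full ActiveMasterManager loop.

```java
/**
 * Toy model (hypothetical, not HBase code) of the HBASE-8519 race.
 * Two flags stand in for state that ZooKeeper holds in a real cluster.
 */
public class BackupMasterRace {

    /** Hypothetical stand-in for the shared ZooKeeper state. */
    static final class ZkState {
        boolean masterZnodeExists = false; // active master's ephemeral znode
        boolean clusterUp = false;         // set by the primary via setClusterUp()
    }

    /** What the waiting backup master decides once it re-checks the state. */
    static String backupAfterPrimaryLoss(ZkState zk) {
        if (zk.masterZnodeExists) {
            return "still waiting";
        }
        // Mirrors the quoted check: no cluster-up flag means "stop".
        if (!zk.clusterUp) {
            return "stopped: Cluster went down before this master became active";
        }
        return "became active";
    }

    /**
     * The primary registers as active master, optionally finishes
     * initialization (marking the cluster up), then dies, so its
     * ephemeral znode disappears.
     */
    static String simulate(boolean primaryFinishesInit) {
        ZkState zk = new ZkState();
        zk.masterZnodeExists = true;      // primary becomes the active master
        if (primaryFinishesInit) {
            zk.clusterUp = true;          // stands in for setClusterUp()
        }
        zk.masterZnodeExists = false;     // primary dies; its session expires
        return backupAfterPrimaryLoss(zk);
    }

    public static void main(String[] args) {
        System.out.println("primary dies after init:  " + simulate(true));
        System.out.println("primary dies during init: " + simulate(false));
    }
}
```

In this model, the only difference between the two runs is whether the primary reached the cluster-up step before dying, which matches the log above. The same flag is presumably what lets a waiting backup master exit on a deliberate stop-hbase.sh (test case 1), so the flag alone cannot tell a requested shutdown apart from a primary that died mid-initialization.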
