[jira] [Commented] (HBASE-8519) Backup master will never come up if primary master dies during initialization

Jean-Daniel Cryans (JIRA) Mon, 13 May 2013 15:57:18 -0700

    [ https://issues.apache.org/jira/browse/HBASE-8519?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13656492#comment-13656492 ]


Jean-Daniel Cryans commented on HBASE-8519:
-------------------------------------------

bq. Checking the current cluster status is just to be safe because getting the 
notification is a past event. Thoughts?

We're really talking about a window of a few milliseconds. FWIW, you could
call isClusterUp() and have the cluster go down a millisecond later anyway, so
I think the notification is reasonably recent.
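To illustrate the staleness point: any check of the cluster flag is only a snapshot, and nothing stops the state from changing right after it. The sketch below is a hypothetical, simplified model that uses a plain java.util.concurrent.atomic.AtomicBoolean in place of the ZooKeeper-backed state behind isClusterUp(); it is not HBase code.

```java
import java.util.concurrent.atomic.AtomicBoolean;

/** Hypothetical model of a check-then-act race; not HBase code. */
public class SnapshotStaleness {

    /**
     * Reads the flag (like calling isClusterUp()), then lets another thread
     * flip it before the caller could act on what it read.
     * Returns {value at check time, value a moment later}.
     */
    public static boolean[] checkThenObserve() {
        // Hypothetical stand-in for the ZooKeeper-backed cluster state.
        AtomicBoolean clusterUp = new AtomicBoolean(true);

        // The "check": a snapshot of the state at this instant.
        boolean seenAtCheck = clusterUp.get();

        // The cluster goes down right after the check.
        Thread shutdown = new Thread(() -> clusterUp.set(false));
        shutdown.start();
        try {
            shutdown.join(); // join() makes the other thread's write visible here
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }

        return new boolean[] {seenAtCheck, clusterUp.get()};
    }

    public static void main(String[] args) {
        boolean[] r = checkThenObserve();
        System.out.println("at check: " + r[0] + ", a moment later: " + r[1]);
    }
}
```

Whether the information comes from a fresh check or from a notification a few milliseconds old, the caller is acting on a past state either way.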

bq. What do you think we should use for the name?

I don't have any good idea, so better comments won't hurt.

Here's what I would put in nodeDeleted() instead of before the if statement:

{code}
// We need to keep track of the cluster's shutdown status while
// we wait on the current master. We consider that, if the cluster
// was already in a "shutdown" state when we started, this master
// is part of a new cluster that was started shortly after the old cluster
// shut down, so that state is now irrelevant. This means that the shutdown
// state must be set while we wait on the active master in order
// to shut down this master. See HBASE-8519.
{code}

Makes sense? [~stack], can you tell me if this comment makes sense to you?
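A minimal sketch of the idea behind that comment, assuming a simplified single-process model: the backup master only stops if it actually observes the cluster going down while it is waiting, and ignores whatever state the cluster was in when it started. The class and method names (BackupMasterSketch, onClusterStateDeleted, tryBecomeActive) are hypothetical; in HBase the tracking would hang off ActiveMasterManager's nodeDeleted callback, and this is not the attached patch.

```java
import java.util.concurrent.atomic.AtomicBoolean;

/**
 * Hypothetical, simplified model of the backup-master decision.
 * Names are illustrative assumptions, not HBase's ActiveMasterManager API.
 */
public class BackupMasterSketch {
    // Set only if the cluster-state node is deleted while we are waiting,
    // i.e. a shutdown this backup actually witnessed.
    private final AtomicBoolean clusterShutDown = new AtomicBoolean(false);
    private volatile boolean stopped = false;

    /** Hypothetical hook for a nodeDeleted event on the cluster-state node. */
    public void onClusterStateDeleted() {
        if (!stopped) {
            clusterShutDown.set(true);
        }
    }

    /**
     * Called once the active master is gone and this backup may take over.
     * Returns true to take over, false to stop. The cluster state at the
     * time this master started is deliberately not consulted.
     */
    public boolean tryBecomeActive() {
        if (clusterShutDown.get()) {
            stopped = true; // the cluster went down while we were waiting
            return false;
        }
        // Includes the case where the primary died before setClusterUp().
        return true;
    }

    public static void main(String[] args) {
        // Primary died during initialization: no shutdown was observed.
        BackupMasterSketch a = new BackupMasterSketch();
        System.out.println("primary died during init -> take over: " + a.tryBecomeActive());

        // The cluster was shut down while this backup was waiting.
        BackupMasterSketch b = new BackupMasterSketch();
        b.onClusterStateDeleted();
        System.out.println("shutdown observed while waiting -> take over: " + b.tryBecomeActive());
    }
}
```

Running main prints true for the first case and false for the second. A backup whose wait began after the cluster was already down also takes over in this model, which is the trade-off of the approach: such a master keeps running rather than shutting down when it may be needed to start a cluster.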

bq. We are indeed more restrictive on stopping backup master? Any thoughts if 
we are too restrictive?

You mean that we may not stop the master when we should? I guess if you start a
master at the same time the cluster is going down, it will keep running, but to
me that seems less bad than a master that shuts down when we need it to start a
cluster.
                
> Backup master will never come up if primary master dies during initialization
> -----------------------------------------------------------------------------
>
>                 Key: HBASE-8519
>                 URL: https://issues.apache.org/jira/browse/HBASE-8519
>             Project: HBase
>          Issue Type: Bug
>          Components: master
>    Affects Versions: 0.94.7, 0.95.0
>            Reporter: Jerry He
>            Assignee: Jerry He
>            Priority: Minor
>             Fix For: 0.98.0
>
>         Attachments: HBASE-8519-trunk.patch
>
>
> The problem happens if the primary master dies after becoming master but before
> it completes initialization and calls clusterStatusTracker.setClusterUp().
> The backup master will try to become the master, but will shut itself down
> promptly because it sees 'the cluster is not up'.
> This is the backup master log:
> 2013-05-09 15:08:05,568 INFO org.apache.hadoop.hbase.master.metrics.MasterMetrics: Initialized
> 2013-05-09 15:08:05,573 DEBUG org.apache.hadoop.hbase.master.HMaster: HMaster started in backup mode.  Stalling until master znode is written.
> 2013-05-09 15:08:05,589 INFO org.apache.hadoop.hbase.zookeeper.RecoverableZooKeeper: Node /hbase/master already exists and this is not a retry
> 2013-05-09 15:08:05,590 INFO org.apache.hadoop.hbase.master.ActiveMasterManager: Adding ZNode for /hbase/backup-masters/xxx.com,60000,1368137285373 in backup master directory
> 2013-05-09 15:08:05,595 INFO org.apache.hadoop.hbase.master.ActiveMasterManager: Another master is the active master, xxx.com,60000,1368137283107; waiting to become the next active master
> 2013-05-09 15:09:45,006 DEBUG org.apache.hadoop.hbase.master.ActiveMasterManager: No master available. Notifying waiting threads
> 2013-05-09 15:09:45,006 INFO org.apache.hadoop.hbase.master.HMaster: Cluster went down before this master became active
> 2013-05-09 15:09:45,006 DEBUG org.apache.hadoop.hbase.master.HMaster: Stopping service threads
> 2013-05-09 15:09:45,006 INFO org.apache.hadoop.ipc.HBaseServer: Stopping server on 60000
>
> In ActiveMasterManager.blockUntilBecomingActiveMaster():
> {code}
>   ..
>   if (!clusterStatusTracker.isClusterUp()) {
>     this.master.stop(
>       "Cluster went down before this master became active");
>   }
>   ..
> {code}
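For reference, the failure in the description can be reproduced with a toy model of the original check. This is a hypothetical, simplified sketch (ClusterState and backupTakesOver are illustrative names, not HBase classes): the flag stands in for what clusterStatusTracker.setClusterUp() records, and the backup applies the same !isClusterUp() test as the snippet from blockUntilBecomingActiveMaster().

```java
/** Hypothetical model of the HBASE-8519 failure; not HBase code. */
public class OriginalCheckSketch {

    /** Simplified stand-in for the flag the active master sets after initialization. */
    public static final class ClusterState {
        private boolean up = false;

        public void setClusterUp() { up = true; }

        public boolean isClusterUp() { return up; }
    }

    /** Original backup-master logic: stop if the cluster is not marked up when we take over. */
    public static boolean backupTakesOver(ClusterState state) {
        if (!state.isClusterUp()) {
            // Corresponds to master.stop("Cluster went down before this master became active")
            return false;
        }
        return true;
    }

    public static void main(String[] args) {
        // Scenario 1: the primary finishes initialization, then dies.
        ClusterState s1 = new ClusterState();
        s1.setClusterUp();
        System.out.println("primary died after init -> backup takes over: " + backupTakesOver(s1));

        // Scenario 2: the primary dies before calling setClusterUp().
        ClusterState s2 = new ClusterState();
        System.out.println("primary died during init -> backup takes over: " + backupTakesOver(s2));
    }
}
```

In the second scenario the primary never reached setClusterUp(), so the backup refuses to take over, which matches the "Cluster went down before this master became active" line in the log even though no shutdown was ever requested.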
