[ 
https://issues.apache.org/jira/browse/HBASE-8519?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13657496#comment-13657496
 ] 

Jerry He commented on HBASE-8519:
---------------------------------

While testing the master and backup master, I found some issues with 
stop-hbase.sh:
{code}
# distributed == false means that the HMaster will kill ZK when it exits
# HBASE-6504 - only take the first line of the output in case verbose gc is on
distMode=`$bin/hbase --config "$HBASE_CONF_DIR" org.apache.hadoop.hbase.util.HBaseConfTool hbase.cluster.distributed | head -n 1`
if [ "$distMode" == 'true' ]
then
  # TODO: store backup masters in ZooKeeper and have the primary send them a shutdown message
  # stop any backup masters
  "$bin"/hbase-daemons.sh --config "${HBASE_CONF_DIR}" \
    --hosts "${HBASE_BACKUP_MASTERS}" stop master-backup

  "$bin"/hbase-daemons.sh --config "${HBASE_CONF_DIR}" stop zookeeper
fi
{code}

1. Do we still need the TODO part to stop the backup master?
2. If the primary master dies and the backup master becomes active, running 
stop-hbase.sh no longer works. 
   With the TODO part above, the backup master is killed by force. We cannot 
even go to the backup master and run stop-hbase.sh there.
3. Should we have an 'else' before the last statement, the one that kills 
zookeeper? Should it be executed when 'distributed == false'? 
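
To make point 3 concrete, here is a rough sketch of the branch structure it 
asks about. This is an illustration, not a patch: the real hbase-daemons.sh 
calls are replaced by echo so it can be run standalone, and stop_cluster is a 
hypothetical helper name that does not exist in stop-hbase.sh.

```shell
#!/usr/bin/env bash
# Sketch only. stop_cluster is a hypothetical helper, not part of stop-hbase.sh.
# The echo lines stand in for the real hbase-daemons.sh invocations.
stop_cluster() {
  distMode="$1"
  if [ "$distMode" = 'true' ]
  then
    # distributed: stop any backup masters
    echo "stop master-backup"
  else
    # variant asked about in point 3: stop zookeeper only when distributed == false
    echo "stop zookeeper"
  fi
}

stop_cluster true    # prints "stop master-backup"
stop_cluster false   # prints "stop zookeeper"
```

Note that the comment at the top of the existing block says the HMaster itself 
kills ZK on exit when distributed == false. If that holds, the 'else' branch 
would be redundant and the current placement may be intentional.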

Should I open another JIRA for this, or add it to the current JIRA?


                
> Backup master will never come up if primary master dies during initialization
> -----------------------------------------------------------------------------
>
>                 Key: HBASE-8519
>                 URL: https://issues.apache.org/jira/browse/HBASE-8519
>             Project: HBase
>          Issue Type: Bug
>          Components: master
>    Affects Versions: 0.94.7, 0.95.0
>            Reporter: Jerry He
>            Assignee: Jerry He
>            Priority: Minor
>             Fix For: 0.98.0
>
>         Attachments: HBASE-8519-trunk.patch, HBASE-8519-trunk-v2.patch
>
>
> The problem happens if the primary master dies after becoming master but 
> before it completes initialization and calls 
> clusterStatusTracker.setClusterUp().
> The backup master will try to become the master, but will shut itself down 
> promptly because it sees 'the cluster is not up'.
> This is the backup master log:
> 2013-05-09 15:08:05,568 INFO 
> org.apache.hadoop.hbase.master.metrics.MasterMetrics: Initialized
> 2013-05-09 15:08:05,573 DEBUG org.apache.hadoop.hbase.master.HMaster: HMaster 
> started in backup mode.  Stalling until master znode is written.
> 2013-05-09 15:08:05,589 INFO 
> org.apache.hadoop.hbase.zookeeper.RecoverableZooKeeper: Node /hbase/master 
> already exists and this is not a retry
> 2013-05-09 15:08:05,590 INFO 
> org.apache.hadoop.hbase.master.ActiveMasterManager: Adding ZNode for 
> /hbase/backup-masters/xxx.com,60000,1368137285373 in backup master directory
> 2013-05-09 15:08:05,595 INFO 
> org.apache.hadoop.hbase.master.ActiveMasterManager: Another master is the 
> active master, xxx.com,60000,1368137283107; waiting to become the next active 
> master
> 2013-05-09 15:09:45,006 DEBUG 
> org.apache.hadoop.hbase.master.ActiveMasterManager: No master available. 
> Notifying waiting threads
> 2013-05-09 15:09:45,006 INFO org.apache.hadoop.hbase.master.HMaster: Cluster 
> went down before this master became active
> 2013-05-09 15:09:45,006 DEBUG org.apache.hadoop.hbase.master.HMaster: 
> Stopping service threads
> 2013-05-09 15:09:45,006 INFO org.apache.hadoop.ipc.HBaseServer: Stopping 
> server on 60000
>  
> In ActiveMasterManager::blockUntilBecomingActiveMaster()
> {code}
>   ..
>   if (!clusterStatusTracker.isClusterUp()) {
>     this.master.stop(
>       "Cluster went down before this master became active");
>   }
>   ..
> {code}

