[ https://issues.apache.org/jira/browse/HBASE-2998?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12923970#action_12923970 ]

HBase Review Board commented on HBASE-2998:
-------------------------------------------

Message from: [email protected]

-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
http://review.cloudera.org/r/1057/
-----------------------------------------------------------

(Updated 2010-10-22 11:59:16.222128)


Review request for hbase, Jean-Daniel Cryans and Jonathan Gray.


Changes
-------

Here is an updated patch. I've been testing it on a cluster. It fixes a number
of issues that rolling restart unearths, but there is still work to do.
Meanwhile, this patch is growing beyond the scope of the JIRA, so I would like
to get it in in its current state.

It includes consideration of Jon's last review; the shutdown handler didn't
need amending after all.

It also includes a refactor of the master's run method. A failure during
failover processing made the master exit without actually going down, because
the RPC server was left up.
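The general idea behind that refactor can be shown with a minimal sketch. It assumes hypothetical `MasterSketch` and `RpcServerStub` classes, which are not HBase's actual classes. Wrapping the run logic in try/finally guarantees that the RPC server is stopped even when failover processing throws.

```java
import java.util.concurrent.atomic.AtomicBoolean;

/** Hypothetical stand-in for the master's RPC server (not HBase's API). */
class RpcServerStub {
  private final AtomicBoolean running = new AtomicBoolean(false);
  void start() { running.set(true); }
  void stop() { running.set(false); }
  boolean isRunning() { return running.get(); }
}

/** Hypothetical master whose run() always tears down the RPC server. */
class MasterSketch {
  final RpcServerStub rpcServer = new RpcServerStub();
  private final Runnable failoverWork;

  MasterSketch(Runnable failoverWork) {
    this.failoverWork = failoverWork;
  }

  void run() {
    rpcServer.start();
    try {
      // Failover processing may throw. Without the finally block, the
      // exception would end run() while the RPC server stayed up and kept
      // the process alive.
      failoverWork.run();
      // ... the normal main loop would follow here ...
    } finally {
      // Always release the RPC server so the process can actually exit.
      rpcServer.stop();
    }
  }
}
```

The exception still propagates to the caller, so the failure is not hidden. The only change is that shutdown of the RPC server no longer depends on the happy path.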


Summary
-------

Fix 'hbase zkcli' so it reads the ZooKeeper ensemble location from the HBase
configuration (or zoo.cfg). This fixes rolling restart. The patch also includes
fixes so that rolling restarts work with the new master.
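The following minimal sketch shows how a '-server' argument might be derived. It assumes the HBase configuration has already been flattened into zoo.cfg-style properties (`server.N=host:peerPort:leaderPort` plus `clientPort`). `ServerArgSketch` and `serverArg` are hypothetical names and are not the API of the patch's ZooKeeperMainServerArg.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Properties;
import java.util.TreeMap;

/**
 * Hypothetical sketch: build the "-server host:port[,host:port...]" argument
 * for ZooKeeperMain from zoo.cfg-style properties.
 */
class ServerArgSketch {
  static String serverArg(Properties zkProps) {
    String clientPort = zkProps.getProperty("clientPort");
    // Sort server.N entries by N so the output is deterministic.
    TreeMap<Integer, String> hostsById = new TreeMap<>();
    for (String key : zkProps.stringPropertyNames()) {
      if (!key.startsWith("server.")) {
        continue;
      }
      try {
        int id = Integer.parseInt(key.substring("server.".length()).trim());
        String value = zkProps.getProperty(key).trim();
        // Keep only the host; drop the quorum and leader-election ports.
        hostsById.put(id, value.split(":")[0]);
      } catch (NumberFormatException e) {
        // Ignore malformed keys such as "server.foo".
      }
    }
    if (clientPort == null || hostsById.isEmpty()) {
      return ""; // Nothing usable: the caller passes no -server option.
    }
    List<String> hostPorts = new ArrayList<>();
    for (String host : hostsById.values()) {
      hostPorts.add(host + ":" + clientPort.trim());
    }
    return "-server " + String.join(",", hostPorts);
  }
}
```

Sorting by server id keeps the output stable across runs. Returning an empty string lets a calling script such as bin/hbase fall back to ZooKeeperMain's default instead of passing a broken argument.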

A src/test/java/org/apache/hadoop/hbase/zookeeper/TestZooKeeperMainServerArg.java
  Test for the new ZooKeeperMainServerArg class.
M src/main/java/org/apache/hadoop/hbase/zookeeper/ZKServerTool.java
  Minor javadoc edit.
A src/main/java/org/apache/hadoop/hbase/zookeeper/ZooKeeperMainServerArg.java
  Tool to emit what ZooKeeperMain wants as its server argument.
M src/main/java/org/apache/hadoop/hbase/regionserver/HRegionServer.java
  (isAbort): Added.
M src/main/java/org/apache/hadoop/hbase/regionserver/ShutdownHook.java
  The shutdown hook now needs to start the region shutdowns itself, since the
  new master changed how the shutdown sequence runs.
M src/main/java/org/apache/hadoop/hbase/regionserver/handler/OpenRegionHandler.java
  Don't do opens if the server is stopped.
M src/main/java/org/apache/hadoop/hbase/master/handler/OpenedRegionHandler.java
  Minor formatting.
M bin/hbase
  Run the new ZooKeeperMainServerArg tool to figure out the '-server host:port'
  argument to pass to ZooKeeperMain.
M bin/hbase-daemon.sh
  Make the default wait longer.
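The ShutdownHook change follows a common JVM pattern: the hook starts the server's own orderly shutdown and then waits for the server's main thread to finish. The following is a minimal sketch of that pattern. It assumes a hypothetical `StoppableServer` interface and `ShutdownHookSketch` class rather than HBase's real types, and it is not the patch's actual code.

```java
/** Hypothetical server interface; not HBase's actual API. */
interface StoppableServer {
  void stop(String why); // begin an orderly shutdown (e.g. close regions)
  boolean isStopped();
}

class ShutdownHookSketch {
  /**
   * Builds a hook thread that starts the server's shutdown itself, rather
   * than waiting for someone else to drive it, then waits up to joinMillis
   * for the server's main thread to finish.
   */
  static Thread makeHook(StoppableServer server, Thread mainThread, long joinMillis) {
    return new Thread(() -> {
      if (!server.isStopped()) {
        server.stop("Shutdown hook");
      }
      try {
        mainThread.join(joinMillis); // give the server time to close regions
      } catch (InterruptedException e) {
        Thread.currentThread().interrupt();
      }
    }, "ShutdownHookSketch");
  }

  /** Registers the hook with the JVM so it runs on normal exit or SIGTERM. */
  static void install(StoppableServer server, Thread mainThread, long joinMillis) {
    Runtime.getRuntime().addShutdownHook(makeHook(server, mainThread, joinMillis));
  }
}
```

The bounded join matters because hbase-daemon.sh eventually kills a process that does not exit within its wait window.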


This addresses bug hbase-2998.
    http://issues.apache.org/jira/browse/hbase-2998


Diffs (updated)
-----

  trunk/bin/hbase 1026448
  trunk/bin/hbase-daemon.sh 1026448
  trunk/src/main/java/org/apache/hadoop/hbase/catalog/CatalogTracker.java 1026448
  trunk/src/main/java/org/apache/hadoop/hbase/executor/ExecutorService.java 1026448
  trunk/src/main/java/org/apache/hadoop/hbase/master/AssignmentManager.java 1026448
  trunk/src/main/java/org/apache/hadoop/hbase/master/HMaster.java 1026448
  trunk/src/main/java/org/apache/hadoop/hbase/master/handler/OpenedRegionHandler.java 1026448
  trunk/src/main/java/org/apache/hadoop/hbase/regionserver/handler/OpenRegionHandler.java 1026448
  trunk/src/main/java/org/apache/hadoop/hbase/zookeeper/ZKAssign.java 1026448
  trunk/src/main/java/org/apache/hadoop/hbase/zookeeper/ZKServerTool.java 1026448
  trunk/src/main/java/org/apache/hadoop/hbase/zookeeper/ZKUtil.java 1026448
  trunk/src/main/java/org/apache/hadoop/hbase/zookeeper/ZooKeeperMainServerArg.java PRE-CREATION
  trunk/src/main/java/org/apache/hadoop/hbase/zookeeper/ZooKeeperWatcher.java 1026448
  trunk/src/test/java/org/apache/hadoop/hbase/zookeeper/TestZooKeeperMainServerArg.java PRE-CREATION

Diff: http://review.cloudera.org/r/1057/diff


Testing
-------


Thanks,

stack




> rolling-restart.sh shouldn't rely on zoo.cfg
> --------------------------------------------
>
>                 Key: HBASE-2998
>                 URL: https://issues.apache.org/jira/browse/HBASE-2998
>             Project: HBase
>          Issue Type: Bug
>            Reporter: Jean-Daniel Cryans
>            Assignee: stack
>            Priority: Critical
>             Fix For: 0.90.0
>
>         Attachments: 2998.txt
>
>
> I tried the rolling-restart script on our dev environment, which is 
> configured with zoo.cfg for zookeeper, and it worked pretty well. Then I 
> tried it on our MR cluster, which doesn't have a zoo.cfg, and we suffered 
> some downtime (no biggie tho, nothing critical was running). When the script 
> calls this line:
> {code}
> bin/hbase zkcli stat $zmaster
> {code}
> It directly runs a ZooKeeperMain, which isn't modified to read from the HBase
> configuration files. What happens next, if ZK isn't running on the master node,
> is that it receives a ConnectionRefused, ignores it, proceeds to restart the
> master (which waits on the znode), and then starts restarting the region
> servers. They can't shut down properly within 60 seconds, since they need a
> master, so they get killed. What follows is pretty ugly and pretty much
> requires a whole restart.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
