[ https://issues.apache.org/jira/browse/HBASE-2998?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12923970#action_12923970 ]
HBase Review Board commented on HBASE-2998:
-------------------------------------------
-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
http://review.cloudera.org/r/1057/
-----------------------------------------------------------
(Updated 2010-10-22 11:59:16.222128)
Review request for hbase, Jean-Daniel Cryans and Jonathan Gray.
Changes
-------
Here is an updated patch. I've been testing it on a cluster. It fixes a number
of problems that rolling restart unearths, but there is still work to do.
Meanwhile, this patch is growing beyond the scope of the JIRA, so I would like
to get it committed in its current state.
It takes Jon's last review into account: the shutdown handler did not need
amending after all.
It also refactors the master's run method, because a failure while processing
failover caused the master to exit without fully going down: the RPC server
was left up.
Summary
-------
Fix 'hbase zkcli' so it reads the ZooKeeper ensemble location from the HBase
configuration (or zoo.cfg). This fixes the rolling-restart script. The patch
also includes fixes so that rolling restarts work with the new master.
A src/test/java/org/apache/hadoop/hbase/zookeeper/TestZooKeeperMainServerArg.java
Test for the new ZooKeeperMainServerArg class.
M src/main/java/org/apache/hadoop/hbase/zookeeper/ZKServerTool.java
Minor edit of javadoc.
A src/main/java/org/apache/hadoop/hbase/zookeeper/ZooKeeperMainServerArg.java
Tool that emits the value ZooKeeperMain expects for its '-server' argument.
M src/main/java/org/apache/hadoop/hbase/regionserver/HRegionServer.java
(isAbort): Added.
M src/main/java/org/apache/hadoop/hbase/regionserver/ShutdownHook.java
The shutdown hook now needs to start the region shutdowns itself, since the
new master changed how the shutdown sequence runs.
M src/main/java/org/apache/hadoop/hbase/regionserver/handler/OpenRegionHandler.java
Don't open regions if the server is stopped.
M src/main/java/org/apache/hadoop/hbase/master/handler/OpenedRegionHandler.java
Minor formatting.
M bin/hbase
Run the new ZooKeeperMainServerArg tool to figure out the '-server host:port'
argument to pass to ZooKeeperMain.
M bin/hbase-daemon.sh
Make the default wait longer.
This addresses bug HBASE-2998.
http://issues.apache.org/jira/browse/hbase-2998
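To illustrate what "emit what ZooKeeperMain wants for a server argument" means, here is a minimal sketch. It is not the ZooKeeperMainServerArg class from the patch; the class name ServerArgSketch and method serverArg are hypothetical. It assumes the configuration has already been reduced to zoo.cfg-style properties (clientPort, plus server.N=host:peerPort:electionPort entries) and builds a "host:port[,host:port...]" connect string of the kind ZooKeeperMain accepts after '-server'.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Properties;
import java.util.TreeMap;

/**
 * Hypothetical sketch (not the patch's class): derive the value for
 * ZooKeeperMain's -server argument from zoo.cfg-style properties.
 */
public class ServerArgSketch {

  /** Returns "host:clientPort,..." or null if servers or clientPort are missing. */
  public static String serverArg(Properties zkProps) {
    String clientPort = zkProps.getProperty("clientPort");
    if (clientPort == null) {
      return null;
    }
    // Properties iteration order is unspecified, so sort server.N by N
    // to make the output deterministic.
    TreeMap<Integer, String> hostsById = new TreeMap<>();
    for (String key : zkProps.stringPropertyNames()) {
      if (!key.startsWith("server.")) {
        continue;
      }
      int id;
      try {
        id = Integer.parseInt(key.substring("server.".length()).trim());
      } catch (NumberFormatException e) {
        continue; // skip malformed keys such as "server.x"
      }
      // Value looks like host:peerPort:electionPort; keep only the host.
      String host = zkProps.getProperty(key).trim().split(":")[0];
      hostsById.put(id, host);
    }
    if (hostsById.isEmpty()) {
      return null;
    }
    List<String> parts = new ArrayList<>();
    for (String host : hostsById.values()) {
      parts.add(host + ":" + clientPort.trim());
    }
    return String.join(",", parts);
  }

  public static void main(String[] args) {
    Properties p = new Properties();
    p.setProperty("clientPort", "2181");
    p.setProperty("server.1", "zk1.example.com:2888:3888");
    p.setProperty("server.0", "zk0.example.com:2888:3888");
    System.out.println(serverArg(p));
  }
}
```

Running main prints zk0.example.com:2181,zk1.example.com:2181. A wrapper script could then pass that value as `-server <value>` instead of letting ZooKeeperMain fall back to its default server.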
Diffs (updated)
-----
trunk/bin/hbase 1026448
trunk/bin/hbase-daemon.sh 1026448
trunk/src/main/java/org/apache/hadoop/hbase/catalog/CatalogTracker.java 1026448
trunk/src/main/java/org/apache/hadoop/hbase/executor/ExecutorService.java 1026448
trunk/src/main/java/org/apache/hadoop/hbase/master/AssignmentManager.java 1026448
trunk/src/main/java/org/apache/hadoop/hbase/master/HMaster.java 1026448
trunk/src/main/java/org/apache/hadoop/hbase/master/handler/OpenedRegionHandler.java 1026448
trunk/src/main/java/org/apache/hadoop/hbase/regionserver/handler/OpenRegionHandler.java 1026448
trunk/src/main/java/org/apache/hadoop/hbase/zookeeper/ZKAssign.java 1026448
trunk/src/main/java/org/apache/hadoop/hbase/zookeeper/ZKServerTool.java 1026448
trunk/src/main/java/org/apache/hadoop/hbase/zookeeper/ZKUtil.java 1026448
trunk/src/main/java/org/apache/hadoop/hbase/zookeeper/ZooKeeperMainServerArg.java PRE-CREATION
trunk/src/main/java/org/apache/hadoop/hbase/zookeeper/ZooKeeperWatcher.java 1026448
trunk/src/test/java/org/apache/hadoop/hbase/zookeeper/TestZooKeeperMainServerArg.java PRE-CREATION
Diff: http://review.cloudera.org/r/1057/diff
Testing
-------
Thanks,
stack
> rolling-restart.sh shouldn't rely on zoo.cfg
> --------------------------------------------
>
> Key: HBASE-2998
> URL: https://issues.apache.org/jira/browse/HBASE-2998
> Project: HBase
> Issue Type: Bug
> Reporter: Jean-Daniel Cryans
> Assignee: stack
> Priority: Critical
> Fix For: 0.90.0
>
> Attachments: 2998.txt
>
>
> I tried the rolling-restart script on our dev environment, which is
> configured with zoo.cfg for ZooKeeper, and it worked pretty well. Then I
> tried it on our MR cluster, which doesn't have a zoo.cfg, and we suffered
> some downtime (no biggie though, nothing critical was running). When the
> script calls this line:
> {code}
> bin/hbase zkcli stat $zmaster
> {code}
> It runs ZooKeeperMain directly, and ZooKeeperMain isn't modified to read
> the HBase configuration files. If ZK isn't running on the master node, the
> command receives a ConnectionRefused, which the script ignores. It then
> proceeds to restart the master (which waits on the znode) and starts
> restarting the region servers. They can't shut down properly within 60
> seconds, since they need a master, so they get killed. What follows is
> pretty ugly and pretty much requires a full cluster restart.
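The failure above comes from treating a refused connection as harmless. As a hedged illustration only (this is not part of the patch, and the class ZkReachabilitySketch and method isReachable are hypothetical), a pre-flight check could open a plain TCP connection to the ZooKeeper client port and abort on refusal or timeout instead of carrying on with the restart.

```java
import java.io.IOException;
import java.net.InetSocketAddress;
import java.net.Socket;

/**
 * Hypothetical sketch: treat an unreachable ZooKeeper endpoint as fatal
 * instead of ignoring the error and continuing.
 */
public class ZkReachabilitySketch {

  /** Returns true if a TCP connection to host:port succeeds within timeoutMs. */
  public static boolean isReachable(String host, int port, int timeoutMs) {
    try (Socket s = new Socket()) {
      s.connect(new InetSocketAddress(host, port), timeoutMs);
      return true;
    } catch (IOException e) {
      // Covers ConnectException ("Connection refused"), SocketTimeoutException,
      // and UnknownHostException for unresolvable hosts.
      return false;
    }
  }

  public static void main(String[] args) {
    String host = args.length > 0 ? args[0] : "localhost";
    int port = args.length > 1 ? Integer.parseInt(args[1]) : 2181;
    if (!isReachable(host, port, 5000)) {
      System.err.println("ZooKeeper not reachable at " + host + ":" + port + "; aborting");
      System.exit(1);
    }
    System.out.println("ZooKeeper reachable at " + host + ":" + port);
  }
}
```

A non-zero exit status lets a calling shell script stop before it restarts the master or any region server. Note that a successful TCP connect only shows something is listening on the port, not that the ensemble is healthy.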