[
https://issues.apache.org/jira/browse/HBASE-2998?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12923372#action_12923372
]
HBase Review Board commented on HBASE-2998:
-------------------------------------------
Message from: [email protected]
-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
http://review.cloudera.org/r/1057/
-----------------------------------------------------------
(Updated 2010-10-21 01:54:34.658192)
Review request for hbase, Jean-Daniel Cryans and Jonathan Gray.
Changes
-------
New patch includes faster assign of regions on startup (uses async
create/exists-set-watcher). Getting this working helps w/ rolling restart
tests. Assign and watcher set for 2k regions runs fast now: it used to be 90
seconds for 2k regions over 10 servers; now it's a matter of seconds, with
total bulk assign of all regions in just over a minute.
This patch is not yet ready. I need to test more.
Summary
-------
Fix 'hbase zkcli' so it reads zk ensemble location from hbase config/zoo.cfg.
This fixes rolling restart. Patch also includes fix so rolling restarts work
on new master.
A src/test/java/org/apache/hadoop/hbase/zookeeper/TestZooKeeperMainServerArg.java
Test for new ZooKeeperMainServerArg class.
M src/main/java/org/apache/hadoop/hbase/zookeeper/ZKServerTool.java
Minor edit of javadoc.
A src/main/java/org/apache/hadoop/hbase/zookeeper/ZooKeeperMainServerArg.java
Tool to emit what ZooKeeperMain wants for a server argument.
M src/main/java/org/apache/hadoop/hbase/regionserver/HRegionServer.java
(isAbort): Added.
M src/main/java/org/apache/hadoop/hbase/regionserver/ShutdownHook.java
Shutdown hook now needs to start the region shutdowns since the new
master changed how the shutdown sequence runs.
M src/main/java/org/apache/hadoop/hbase/regionserver/handler/OpenRegionHandler.java
Don't do opens if server is stopped.
M src/main/java/org/apache/hadoop/hbase/master/handler/OpenedRegionHandler.java
Minor formatting.
M bin/hbase
Run new ZooKeeperMainServerArg tool to figure the '-server host:port'
argument to pass to ZooKeeperMain.
M bin/hbase-daemon.sh
Make default wait be longer.
This addresses bug hbase-2998.
http://issues.apache.org/jira/browse/hbase-2998
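To illustrate the kind of value a tool like ZooKeeperMainServerArg has to emit, here is a minimal sketch, not the code from the patch. It assumes the ensemble is known as a comma-separated host list plus a client port (in an HBase config these would typically come from hbase.zookeeper.quorum and hbase.zookeeper.property.clientPort). The class and method names are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch; the class and method names are not from the patch.
final class ServerArgSketch {

    /**
     * Joins each quorum host with the client port into the comma-separated
     * "host:port" list that ZooKeeperMain accepts after its -server option.
     */
    static String toServerArg(String quorum, int clientPort) {
        List<String> parts = new ArrayList<>();
        for (String host : quorum.split(",")) {
            String h = host.trim();
            if (!h.isEmpty()) {
                parts.add(h + ":" + clientPort);
            }
        }
        if (parts.isEmpty()) {
            // Fail loudly rather than let the CLI fall back to a default host.
            throw new IllegalArgumentException(
                "no ZooKeeper hosts in quorum: '" + quorum + "'");
        }
        return String.join(",", parts);
    }

    public static void main(String[] args) {
        // Example hosts are placeholders.
        System.out.println(
            "-server " + toServerArg("zk1.example.com,zk2.example.com", 2181));
    }
}
```

A wrapper script such as bin/hbase could then pass the printed string to ZooKeeperMain. Failing on an empty host list makes a missing ensemble location visible, rather than letting the client try the local machine.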
Diffs (updated)
-----
trunk/bin/hbase 1025815
trunk/bin/hbase-daemon.sh 1025815
trunk/src/main/java/org/apache/hadoop/hbase/catalog/CatalogTracker.java 1025815
trunk/src/main/java/org/apache/hadoop/hbase/catalog/MetaReader.java 1025815
trunk/src/main/java/org/apache/hadoop/hbase/executor/ExecutorService.java 1025815
trunk/src/main/java/org/apache/hadoop/hbase/master/AssignmentManager.java 1025815
trunk/src/main/java/org/apache/hadoop/hbase/master/handler/OpenedRegionHandler.java 1025815
trunk/src/main/java/org/apache/hadoop/hbase/regionserver/handler/OpenRegionHandler.java 1025815
trunk/src/main/java/org/apache/hadoop/hbase/zookeeper/ZKAssign.java 1025815
trunk/src/main/java/org/apache/hadoop/hbase/zookeeper/ZKServerTool.java 1025815
trunk/src/main/java/org/apache/hadoop/hbase/zookeeper/ZKUtil.java 1025815
trunk/src/main/java/org/apache/hadoop/hbase/zookeeper/ZooKeeperWatcher.java 1025815
Diff: http://review.cloudera.org/r/1057/diff
Testing
-------
Thanks,
stack
> rolling-restart.sh shouldn't rely on zoo.cfg
> --------------------------------------------
>
> Key: HBASE-2998
> URL: https://issues.apache.org/jira/browse/HBASE-2998
> Project: HBase
> Issue Type: Bug
> Reporter: Jean-Daniel Cryans
> Assignee: stack
> Priority: Critical
> Fix For: 0.90.0
>
> Attachments: 2998.txt
>
>
> I tried the rolling-restart script on our dev environment, which is
> configured with zoo.cfg for zookeeper, and it worked pretty well. Then I
> tried it on our MR cluster, which doesn't have a zoo.cfg, and we suffered
> some downtime (no biggie tho, nothing critical was running). When the script
> calls this line:
> {code}
> bin/hbase zkcli stat $zmaster
> {code}
> It directly runs a ZooKeeperMain which isn't modified to read from the HBase
> configuration files. What happens next if ZK isn't running on the master node
> is that it receives a ConnectionRefused, ignores it, proceeds to restart the
> master (which waits on the znode), and then starts restarting the region
> servers. They can't shut down properly under 60 seconds, since they need a
> master, so they get killed. What follows is pretty ugly and pretty much
> requires a whole restart.