rolling-restart.sh shouldn't rely on zoo.cfg
--------------------------------------------
Key: HBASE-2998
URL: https://issues.apache.org/jira/browse/HBASE-2998
Project: HBase
Issue Type: Bug
Reporter: Jean-Daniel Cryans
Fix For: 0.90.0
I tried the rolling-restart script on our dev environment, which is configured
with zoo.cfg for zookeeper, and it worked pretty well. Then I tried it on our
MR cluster, which doesn't have a zoo.cfg, and we suffered some downtime (no
biggie tho, nothing critical was running). When the script calls this line:
{code}
bin/hbase zkcli stat $zmaster
{code}
It directly runs a ZooKeeperMain, which isn't modified to read from the HBase
configuration files. What happens next, if ZK isn't running on the master
node, is that it receives a ConnectionRefused, ignores it, proceeds to restart
the master (which waits on the znode), and then starts restarting the region
servers. They can't shut down properly within 60 seconds, since they need a
master, so they get killed. What follows is pretty ugly and pretty much
requires a whole restart.
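
One possible direction, as a rough sketch rather than the actual fix: give the
ZK client an explicit server list instead of letting it fall back to
localhost, and stop the script when the check fails. Both function names below
are hypothetical. The sketch assumes the quorum hosts and client port have
already been read from the HBase configuration (the
hbase.zookeeper.quorum and hbase.zookeeper.property.clientPort properties);
how to read them is left out here. It also assumes the check command exits
non-zero on failure, which would need to be verified for zkcli.

```shell
#!/usr/bin/env bash

# Hypothetical helper: turn a comma-separated quorum host list and a client
# port into the host:port,host:port form accepted by ZooKeeperMain's -server.
build_zk_connect_string() {
  local quorum="$1" port="$2" result="" host
  local IFS=','
  # Unquoted on purpose: split the quorum list on commas.
  for host in $quorum; do
    result="${result:+$result,}${host}:${port}"
  done
  printf '%s\n' "$result"
}

# Hypothetical guard: run a check command and report failure to the caller
# instead of silently carrying on with the restart.
require_success() {
  if ! "$@"; then
    echo "check failed, aborting rolling restart: $*" >&2
    return 1
  fi
}
```

The script could then call something like
require_success bin/hbase zkcli -server "$(build_zk_connect_string "$quorum" "$port")" stat $zmaster
and exit before touching the master or the region servers if that returns
non-zero. Here $quorum and $port are placeholders for the values taken from
the HBase configuration.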