rolling-restart.sh shouldn't rely on zoo.cfg
--------------------------------------------

                 Key: HBASE-2998
                 URL: https://issues.apache.org/jira/browse/HBASE-2998
             Project: HBase
          Issue Type: Bug
            Reporter: Jean-Daniel Cryans
             Fix For: 0.90.0


I tried the rolling-restart script on our dev environment, which is configured 
with zoo.cfg for ZooKeeper, and it worked pretty well. Then I tried it on our 
MR cluster, which doesn't have a zoo.cfg, and we suffered some downtime (no 
big deal though, nothing critical was running). The problem starts when the 
script calls this line:

{code}
bin/hbase zkcli stat $zmaster
{code}

It directly runs a ZooKeeperMain, which isn't modified to read from the HBase 
configuration files. If ZK isn't running on the master node, the script 
receives a ConnectionRefused, ignores it, proceeds to restart the master 
(which waits on the znode), and then starts restarting the region servers. 
They can't shut down properly within 60 seconds, since they need a master, 
so they get killed. What follows is pretty ugly and pretty much requires a 
whole restart.
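
One way the script could avoid ignoring the failed connection is sketched 
below. This is only an illustration, not the fix committed for this issue: 
zk_check is a hypothetical helper, and matching the text "Connection refused" 
is an assumption about what the ZooKeeper client prints when it cannot reach 
a server. The helper runs whatever check command it is given, passes the 
output through, and returns a distinct status when the connection was refused 
so the caller can abort before touching the master.

```shell
#!/usr/bin/env bash

# Hypothetical helper: run a ZooKeeper check command and refuse to treat a
# failed connection as a normal answer.
# Returns 2 if the output mentions a refused connection (assumed wording),
# otherwise the exit status of the wrapped command.
zk_check() {
  local out rc=0
  # Capture stdout and stderr of the wrapped command, remembering its status.
  out=$("$@" 2>&1) || rc=$?
  # Pass the output through so callers can still inspect it.
  printf '%s\n' "$out"
  if printf '%s\n' "$out" | grep -Eqi 'connection ?refused'; then
    echo "cannot reach ZooKeeper, aborting rolling restart" >&2
    return 2
  fi
  return "$rc"
}
```

In the script, the call could then look like 
zk_check bin/hbase zkcli stat "$zmaster", with the restart aborted whenever 
the status is 2. This only stops the script from carrying on blindly; it does 
not make zkcli read the quorum from the HBase configuration.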

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.