[ https://issues.apache.org/jira/browse/HBASE-8716?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13678866#comment-13678866 ]
stack commented on HBASE-8716:
------------------------------
I tried these changes on a cluster and they seem to do the right thing. Here
is the output before the change (the script hits a RuntimeError but still
exits 0):
{code}
[stack@sss-1 ~]$ ./hbase/bin/graceful_stop.sh --config /home/stack/conf-hbase x
2013-06-08T14:22:02 Disabling load balancer
2013-06-08T14:22:09 Previous balancer state was false
2013-06-08T14:22:09 Unloading x region(s)
2013-06-08 14:22:14,867 TRACE [main] zookeeper.ZKConfig: Skipped reading ZK properties file 'zoo.cfg' since 'hbase.config.read.zookeeper.config' was not set to true
2013-06-08 14:22:14,907 INFO [main] zookeeper.ZooKeeper: Client environment:zookeeper.version=3.4.5-1392090, built on 09/30/2012 17:52 GMT
2013-06-08 14:22:14,907 INFO [main] zookeeper.ZooKeeper: Client environment:host.name=sss-1.ent.cloudera.com
2013-06-08 14:22:14,907 INFO [main] zookeeper.ZooKeeper: Client environment:java.version=1.6.0_31
2013-06-08 14:22:14,907 INFO [main] zookeeper.ZooKeeper: Client environment:java.vendor=Sun Microsystems Inc.
2013-06-08 14:22:14,907 INFO [main] zookeeper.ZooKeeper: Client environment:java.home=/usr/java/jdk1.6.0_31/jre
....
2013-06-08 14:22:14,990 INFO [main-SendThread(sss-1.ent.cloudera.com:2181)] zookeeper.ClientCnxn: Socket connection established to sss-1.ent.cloudera.com/10.20.195.21:2181, initiating session
2013-06-08 14:22:15,049 INFO [main-SendThread(sss-1.ent.cloudera.com:2181)] zookeeper.ClientCnxn: Session establishment complete on server sss-1.ent.cloudera.com/10.20.195.21:2181, sessionid = 0x13ef746f91a0054, negotiated timeout = 90000
RuntimeError: Server x not online
stripServer at /home/stack/hbase/bin/region_mover.rb:200
unloadRegions at /home/stack/hbase/bin/region_mover.rb:306
(root) at /home/stack/hbase/bin/region_mover.rb:456
2013-06-08T14:22:16 Unloaded x region(s)
2013-06-08T14:22:16 Stopping regionserver
x: ssh: Could not resolve hostname x: Name or service not known
[stack@sss-1 ~]$ echo $?
0
{code}
Here is the output after the change, passing -e (the script now stops at the
error and exits 1):
{code}
[stack@sss-1 ~]$ ./hbase/bin/graceful_stop.sh --config /home/stack/conf-hbase -e x
2013-06-08T14:24:10 Disabling load balancer
2013-06-08T14:24:17 Previous balancer state was false
2013-06-08T14:24:17 Unloading x region(s)
2013-06-08 14:24:22,883 TRACE [main] zookeeper.ZKConfig: Skipped reading ZK properties file 'zoo.cfg' since 'hbase.config.read.zookeeper.config' was not set to true
2013-06-08 14:24:22,920 INFO [main] zookeeper.ZooKeeper: Client environment:zookeeper.version=3.4.5-1392090, built on 09/30/2012 17:52 GMT
2013-06-08 14:24:22,920 INFO [main] zookeeper.ZooKeeper: Client environment:host.name=sss-1.ent.cloudera.com
2013-06-08 14:24:22,920 INFO [main] zookeeper.ZooKeeper: Client environment:java.version=1.6.0_31
...
2013-06-08 14:24:22,949 INFO [main] zookeeper.RecoverableZooKeeper: Process identifier=hconnection-0x24eff2c connecting to ZooKeeper ensemble=sss-1.ent.cloudera.com:2181
2013-06-08 14:24:22,964 INFO [main-SendThread(sss-1.ent.cloudera.com:2181)] zookeeper.ClientCnxn: Opening socket connection to server sss-1.ent.cloudera.com/10.20.195.21:2181. Will not attempt to authenticate using SASL (Unable to locate a login configuration)
2013-06-08 14:24:22,974 INFO [main-SendThread(sss-1.ent.cloudera.com:2181)] zookeeper.ClientCnxn: Socket connection established to sss-1.ent.cloudera.com/10.20.195.21:2181, initiating session
2013-06-08 14:24:23,020 INFO [main-SendThread(sss-1.ent.cloudera.com:2181)] zookeeper.ClientCnxn: Session establishment complete on server sss-1.ent.cloudera.com/10.20.195.21:2181, sessionid = 0x13ef746f91a0057, negotiated timeout = 90000
RuntimeError: Server x not online
stripServer at /home/stack/hbase/bin/region_mover.rb:200
unloadRegions at /home/stack/hbase/bin/region_mover.rb:306
(root) at /home/stack/hbase/bin/region_mover.rb:456
[stack@sss-1 ~]$ echo $?
1
{code}
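For a tool that calls the script, a rough sketch of how the new exit code might be consumed. This is only an illustration: stop_one is a hypothetical helper, not something in the patch, and it assumes the stop command takes the hostname as its last argument, as graceful_stop.sh does in the runs above.

```shell
# Hypothetical helper (not part of HBase): run a stop command against one
# host and pass its exit status back to the caller.
# Usage: stop_one HOST COMMAND [ARGS...]   (HOST is appended as the last arg)
stop_one() {
  host="$1"
  shift
  if "$@" "$host"; then
    echo "OK $host"
  else
    rc=$?   # exit status of the failed stop command
    echo "FAILED $host (exit $rc)" >&2
    return "$rc"
  fi
}

# Example, using the paths from the runs above; abort on first failure:
#   for h in host1 host2; do
#     stop_one "$h" ./hbase/bin/graceful_stop.sh \
#       --config /home/stack/conf-hbase -e || break
#   done
```

With the pre-change behavior such a loop would carry on regardless, since the script returned 0 even when the unload threw.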
> Fixups/Improvements for graceful_stop.sh/region_mover.rb
> --------------------------------------------------------
>
> Key: HBASE-8716
> URL: https://issues.apache.org/jira/browse/HBASE-8716
> Project: HBase
> Issue Type: Improvement
> Reporter: stack
> Assignee: stack
> Attachments: 8716.txt
>
>
> It has been a while since these scripts were touched. Giving them a spring
> cleaning and seeing if we can make them return error codes on failure (it
> seems the previous style was that the operator would watch the output and
> react to it, but I see cases where tools want to call these scripts and want
> the return code to indicate whether the rolling upgrade worked or not). Also,
> see if we can make the rolling restart faster: going one-by-one, while
> minimally disruptive and 'safe', is slow on clusters of hundreds of nodes.