[
https://issues.apache.org/jira/browse/HBASE-8974?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13721953#comment-13721953
]
Jean-Marc Spaggiari commented on HBASE-8974:
--------------------------------------------
I tailed the RS log on one node during a rolling restart, and it shows only one restart:
{code}
dimanche 28 juillet 2013, 09:17:02 (UTC-0400) Terminating regionserver
2013-07-28 09:17:02,208 INFO org.apache.hadoop.ipc.HBaseServer: Stopping server
on 60020
2013-07-28 09:17:02,208 INFO org.apache.hadoop.ipc.HBaseServer: Stopping IPC
Server listener on 60020
2013-07-28 09:17:02,208 INFO org.apache.hadoop.ipc.HBaseServer: PRI IPC Server
handler 5 on 60020: exiting
2013-07-28 09:17:02,209 INFO org.apache.hadoop.ipc.HBaseServer: Stopping IPC
Server Responder
2013-07-28 09:17:02,209 INFO org.apache.hadoop.ipc.HBaseServer: Stopping IPC
Server Responder
2013-07-28 09:17:02,209 INFO org.apache.hadoop.ipc.HBaseServer: PRI IPC Server
handler 2 on 60020: exiting
2013-07-28 09:17:02,209 INFO org.apache.hadoop.ipc.HBaseServer: PRI IPC Server
handler 0 on 60020: exiting
2013-07-28 09:17:02,209 INFO org.apache.hadoop.ipc.HBaseServer: IPC Server
handler 1 on 60020: exiting
2013-07-28 09:17:02,209 INFO org.apache.hadoop.ipc.HBaseServer: PRI IPC Server
handler 9 on 60020: exiting
2013-07-28 09:17:02,209 INFO org.apache.hadoop.ipc.HBaseServer: IPC Server
handler 9 on 60020: exiting
2013-07-28 09:17:02,209 INFO org.apache.hadoop.ipc.HBaseServer: PRI IPC Server
handler 6 on 60020: exiting
2013-07-28 09:17:02,209 INFO org.apache.hadoop.ipc.HBaseServer: IPC Server
handler 4 on 60020: exiting
2013-07-28 09:17:02,209 INFO org.apache.hadoop.ipc.HBaseServer: IPC Server
handler 0 on 60020: exiting
2013-07-28 09:17:02,209 INFO org.apache.hadoop.ipc.HBaseServer: REPL IPC Server
handler 2 on 60020: exiting
2013-07-28 09:17:02,208 INFO org.apache.hadoop.ipc.HBaseServer: PRI IPC Server
handler 3 on 60020: exiting
2013-07-28 09:17:02,208 INFO org.apache.hadoop.ipc.HBaseServer: REPL IPC Server
handler 0 on 60020: exiting
2013-07-28 09:17:02,208 INFO org.apache.hadoop.ipc.HBaseServer: REPL IPC Server
handler 1 on 60020: exiting
2013-07-28 09:17:02,209 INFO org.apache.hadoop.ipc.HBaseServer: IPC Server
handler 2 on 60020: exiting
2013-07-28 09:17:02,209 INFO org.apache.hadoop.ipc.HBaseServer: IPC Server
handler 8 on 60020: exiting
2013-07-28 09:17:02,209 INFO org.apache.hadoop.ipc.HBaseServer: PRI IPC Server
handler 1 on 60020: exiting
2013-07-28 09:17:02,209 INFO org.apache.hadoop.ipc.HBaseServer: PRI IPC Server
handler 7 on 60020: exiting
2013-07-28 09:17:02,209 INFO org.apache.hadoop.ipc.HBaseServer: IPC Server
handler 6 on 60020: exiting
2013-07-28 09:17:02,209 INFO org.apache.hadoop.ipc.HBaseServer: PRI IPC Server
handler 4 on 60020: exiting
2013-07-28 09:17:02,209 INFO org.apache.hadoop.ipc.HBaseServer: IPC Server
handler 3 on 60020: exiting
2013-07-28 09:17:02,209 INFO org.apache.hadoop.ipc.HBaseServer: IPC Server
handler 7 on 60020: exiting
2013-07-28 09:17:02,209 INFO org.mortbay.log: Stopped
[email protected]:60030
2013-07-28 09:17:02,209 INFO org.apache.hadoop.ipc.HBaseServer: PRI IPC Server
handler 8 on 60020: exiting
2013-07-28 09:17:02,209 INFO org.apache.hadoop.ipc.HBaseServer: IPC Server
handler 5 on 60020: exiting
2013-07-28 09:17:02,312 INFO
org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation:
Closed zookeeper sessionid=0x3400251e47305dc
dimanche 28 juillet 2013, 09:17:03 (UTC-0400) Starting regionserver on node3
core file size (blocks, -c) 0
data seg size (kbytes, -d) unlimited
scheduling priority (-e) 0
file size (blocks, -f) unlimited
pending signals (-i) 93921
max locked memory (kbytes, -l) 64
max memory size (kbytes, -m) unlimited
open files (-n) 32768
pipe size (512 bytes, -p) 8
POSIX message queues (bytes, -q) 819200
real-time priority (-r) 0
stack size (kbytes, -s) 8192
cpu time (seconds, -t) unlimited
max user processes (-u) 93921
virtual memory (kbytes, -v) unlimited
file locks (-x) unlimited
2013-07-28 09:17:03,676 INFO org.apache.hadoop.hbase.util.VersionInfo: HBase
0.94.10
2013-07-28 09:17:03,676 INFO org.apache.hadoop.hbase.util.VersionInfo:
Subversion https://svn.apache.org/repos/asf/hbase/tags/0.94.10RC0 -r 1504995
2013-07-28 09:17:03,676 INFO org.apache.hadoop.hbase.util.VersionInfo: Compiled
by jenkins on Fri Jul 19 20:24:16 UTC 2013
2013-07-28 09:17:03,778 INFO org.apache.hadoop.hbase.util.ServerCommandLine:
vmName=Java HotSpot(TM) 64-Bit Server VM, vmVendor=Oracle Corporation,
vmVersion=23.1-b03
2013-07-28 09:17:03,778 INFO org.apache.hadoop.hbase.util.ServerCommandLine:
vmInputArguments=[-XX:OnOutOfMemoryError=kill -9 %p, -Xmx6196m,
-XX:+UseConcMarkSweepGC, -XX:+UseConcMarkSweepGC,
-Dhbase.log.dir=/home/hbase/hbase-0.94.3/bin/../logs,
-Dhbase.log.file=hbase-hbase-regionserver-node3.log,
-Dhbase.home.dir=/home/hbase/hbase-0.94.3/bin/.., -Dhbase.id.str=hbase,
-Dhbase.root.logger=INFO,DRFA,
-Djava.library.path=/home/hbase/hbase-0.94.3/bin/../lib/native/Linux-amd64-64,
-Dhbase.security.logger=INFO,DRFAS]
2013-07-28 09:17:03,998 INFO org.apache.hadoop.ipc.HBaseServer: Starting
Thread-0
2013-07-28 09:17:03,998 INFO org.apache.hadoop.ipc.HBaseServer: Starting
Thread-0
2013-07-28 09:17:03,999 INFO org.apache.hadoop.ipc.HBaseServer: Starting
Thread-0
2013-07-28 09:17:03,999 INFO org.apache.hadoop.ipc.HBaseServer: Starting
Thread-0
2013-07-28 09:17:04,000 INFO org.apache.hadoop.ipc.HBaseServer: Starting
Thread-0
2013-07-28 09:17:04,000 INFO org.apache.hadoop.ipc.HBaseServer: Starting
Thread-0
2013-07-28 09:17:04,001 INFO org.apache.hadoop.ipc.HBaseServer: Starting
Thread-0
2013-07-28 09:17:04,002 INFO org.apache.hadoop.ipc.HBaseServer: Starting
Thread-0
2013-07-28 09:17:04,002 INFO org.apache.hadoop.ipc.HBaseServer: Starting
Thread-0
2013-07-28 09:17:04,002 INFO org.apache.hadoop.ipc.HBaseServer: Starting
Thread-0
2013-07-28 09:17:04,009 INFO org.apache.hadoop.hbase.ipc.HBaseRpcMetrics:
Initializing RPC Metrics with hostName=HRegionServer, port=60020
2013-07-28 09:17:04,106 INFO org.apache.hadoop.hbase.io.hfile.CacheConfig:
Allocating LruBlockCache with maximum size 2,4g
2013-07-28 09:17:04,316 INFO org.apache.hadoop.hbase.util.FSUtils: FileSystem
doesn't support getDefaultBlockSize
2013-07-28 09:17:04,329 INFO org.apache.hadoop.hbase.util.FSUtils: FileSystem
doesn't support getDefaultReplication
2013-07-28 09:17:04,339 INFO org.apache.hadoop.hbase.util.FSUtils: FileSystem
doesn't support getDefaultReplication
2013-07-28 09:17:04,339 INFO org.apache.hadoop.hbase.util.FSUtils: FileSystem
doesn't support getDefaultBlockSize
2013-07-28 09:17:04,393 INFO org.apache.hadoop.metrics.jvm.JvmMetrics:
Initializing JVM Metrics with processName=RegionServer,
sessionId=regionserver60020
2013-07-28 09:17:04,413 INFO org.apache.hadoop.hbase.metrics: MetricsString
added: revision
2013-07-28 09:17:04,413 INFO org.apache.hadoop.hbase.metrics: MetricsString
added: hdfsUser
2013-07-28 09:17:04,413 INFO org.apache.hadoop.hbase.metrics: MetricsString
added: hdfsDate
2013-07-28 09:17:04,413 INFO org.apache.hadoop.hbase.metrics: MetricsString
added: hdfsUrl
2013-07-28 09:17:04,413 INFO org.apache.hadoop.hbase.metrics: MetricsString
added: date
2013-07-28 09:17:04,413 INFO org.apache.hadoop.hbase.metrics: MetricsString
added: hdfsRevision
2013-07-28 09:17:04,413 INFO org.apache.hadoop.hbase.metrics: MetricsString
added: user
2013-07-28 09:17:04,413 INFO org.apache.hadoop.hbase.metrics: MetricsString
added: hdfsVersion
2013-07-28 09:17:04,413 INFO org.apache.hadoop.hbase.metrics: MetricsString
added: url
2013-07-28 09:17:04,413 INFO org.apache.hadoop.hbase.metrics: MetricsString
added: version
2013-07-28 09:17:04,413 INFO org.apache.hadoop.hbase.metrics: new MBeanInfo
2013-07-28 09:17:04,414 INFO org.apache.hadoop.hbase.metrics: new MBeanInfo
2013-07-28 09:17:04,444 INFO org.mortbay.log: Logging to
org.slf4j.impl.Log4jLoggerAdapter(org.mortbay.log) via org.mortbay.log.Slf4jLog
2013-07-28 09:17:04,476 INFO org.apache.hadoop.http.HttpServer: Added global
filtersafety (class=org.apache.hadoop.http.HttpServer$QuotingInputFilter)
2013-07-28 09:17:04,480 INFO org.apache.hadoop.http.HttpServer: Port returned
by webServer.getConnectors()[0].getLocalPort() before open() is -1. Opening the
listener on 60030
2013-07-28 09:17:04,480 INFO org.apache.hadoop.http.HttpServer:
listener.getLocalPort() returned 60030
webServer.getConnectors()[0].getLocalPort() returned 60030
2013-07-28 09:17:04,480 INFO org.apache.hadoop.http.HttpServer: Jetty bound to
port 60030
2013-07-28 09:17:04,480 INFO org.mortbay.log: jetty-6.1.26
2013-07-28 09:17:04,750 INFO org.mortbay.log: Started
[email protected]:60030
2013-07-28 09:17:04,751 INFO org.apache.hadoop.ipc.HBaseServer: IPC Server
Responder: starting
2013-07-28 09:17:04,754 INFO org.apache.hadoop.ipc.HBaseServer: IPC Server
listener on 60020: starting
2013-07-28 09:17:04,767 INFO org.apache.hadoop.ipc.HBaseServer: IPC Server
handler 0 on 60020: starting
2013-07-28 09:17:04,768 INFO org.apache.hadoop.ipc.HBaseServer: IPC Server
handler 1 on 60020: starting
2013-07-28 09:17:04,768 INFO org.apache.hadoop.ipc.HBaseServer: IPC Server
handler 2 on 60020: starting
2013-07-28 09:17:04,768 INFO org.apache.hadoop.ipc.HBaseServer: IPC Server
handler 3 on 60020: starting
2013-07-28 09:17:04,768 INFO org.apache.hadoop.ipc.HBaseServer: IPC Server
handler 4 on 60020: starting
2013-07-28 09:17:04,768 INFO org.apache.hadoop.ipc.HBaseServer: IPC Server
handler 5 on 60020: starting
2013-07-28 09:17:04,768 INFO org.apache.hadoop.ipc.HBaseServer: IPC Server
handler 6 on 60020: starting
2013-07-28 09:17:04,768 INFO org.apache.hadoop.ipc.HBaseServer: IPC Server
handler 7 on 60020: starting
2013-07-28 09:17:04,768 INFO org.apache.hadoop.ipc.HBaseServer: IPC Server
handler 8 on 60020: starting
2013-07-28 09:17:04,768 INFO org.apache.hadoop.ipc.HBaseServer: IPC Server
handler 9 on 60020: starting
2013-07-28 09:17:04,769 INFO org.apache.hadoop.ipc.HBaseServer: PRI IPC Server
handler 0 on 60020: starting
2013-07-28 09:17:04,769 INFO org.apache.hadoop.ipc.HBaseServer: PRI IPC Server
handler 1 on 60020: starting
2013-07-28 09:17:04,769 INFO org.apache.hadoop.ipc.HBaseServer: PRI IPC Server
handler 2 on 60020: starting
2013-07-28 09:17:04,769 INFO org.apache.hadoop.ipc.HBaseServer: PRI IPC Server
handler 3 on 60020: starting
2013-07-28 09:17:04,769 INFO org.apache.hadoop.ipc.HBaseServer: PRI IPC Server
handler 4 on 60020: starting
2013-07-28 09:17:04,770 INFO org.apache.hadoop.ipc.HBaseServer: PRI IPC Server
handler 5 on 60020: starting
2013-07-28 09:17:04,770 INFO org.apache.hadoop.ipc.HBaseServer: PRI IPC Server
handler 6 on 60020: starting
2013-07-28 09:17:04,770 INFO org.apache.hadoop.ipc.HBaseServer: PRI IPC Server
handler 7 on 60020: starting
2013-07-28 09:17:04,770 INFO org.apache.hadoop.ipc.HBaseServer: PRI IPC Server
handler 8 on 60020: starting
2013-07-28 09:17:04,770 INFO org.apache.hadoop.ipc.HBaseServer: PRI IPC Server
handler 9 on 60020: starting
2013-07-28 09:17:04,770 INFO org.apache.hadoop.ipc.HBaseServer: REPL IPC Server
handler 0 on 60020: starting
2013-07-28 09:17:04,775 INFO org.apache.hadoop.ipc.HBaseServer: REPL IPC Server
handler 1 on 60020: starting
2013-07-28 09:17:04,775 INFO org.apache.hadoop.ipc.HBaseServer: REPL IPC Server
handler 2 on 60020: starting
2013-07-28 09:17:07,197 ERROR
org.apache.hadoop.hbase.regionserver.metrics.SchemaMetrics: Inconsistent
configuration. Previous configuration for using table name in metrics: true,
new configuration: false
2013-07-28 09:17:07,202 INFO org.apache.hadoop.hbase.util.ChecksumType:
Checksum can use java.util.zip.CRC32
2013-07-28 09:17:28,700 WARN org.apache.hadoop.io.compress.snappy.LoadSnappy:
Snappy native library is available
2013-07-28 09:17:28,701 INFO org.apache.hadoop.util.NativeCodeLoader: Loaded
the native-hadoop library
2013-07-28 09:17:28,701 INFO org.apache.hadoop.io.compress.snappy.LoadSnappy:
Snappy native library loaded
2013-07-28 09:17:28,702 INFO org.apache.hadoop.io.compress.CodecPool: Got
brand-new compressor
2013-07-28 09:17:28,715 INFO org.apache.hadoop.io.compress.CodecPool: Got
brand-new decompressor
2013-07-28 09:17:31,776 INFO org.apache.hadoop.io.compress.CodecPool: Got
brand-new decompressor
{code}
That's all I got over the entire rolling restart. So from the RS side, it
seems that it was not restarted more than once.
[~ndimiduk] can you take a look at your RS logs too, to see if they match what
you are seeing?
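To make the same check repeatable on other nodes, here is a minimal Python sketch that counts the stop and start marker lines in a RegionServer log. It assumes each stop writes one "Terminating regionserver" line and each start writes one "Starting regionserver on <host>" line, as in the log above. The function name `count_restarts` and the `sample` list are hypothetical, made up for this illustration; they are not part of HBase.

```python
import re

# Marker lines as they appear in the RS log above. Assumption: every start
# of the RegionServer writes exactly one "Starting regionserver on <host>"
# line, and every stop writes one "Terminating regionserver" line.
START_RE = re.compile(r"Starting regionserver on (\S+)")
STOP_RE = re.compile(r"Terminating regionserver")


def count_restarts(log_lines):
    """Return (number of stops, {host: number of starts}) for the given lines."""
    stops = 0
    starts = {}
    for line in log_lines:
        if STOP_RE.search(line):
            stops += 1
        match = START_RE.search(line)
        if match:
            host = match.group(1)
            starts[host] = starts.get(host, 0) + 1
    return stops, starts


# A few lines copied from the log above, used as sample input.
sample = [
    "dimanche 28 juillet 2013, 09:17:02 (UTC-0400) Terminating regionserver",
    "2013-07-28 09:17:02,208 INFO org.apache.hadoop.ipc.HBaseServer: Stopping server on 60020",
    "dimanche 28 juillet 2013, 09:17:03 (UTC-0400) Starting regionserver on node3",
    "2013-07-28 09:17:03,676 INFO org.apache.hadoop.hbase.util.VersionInfo: HBase 0.94.10",
]

stops, starts = count_restarts(sample)
print(stops, starts)  # 1 {'node3': 1}
```

On a real log, pass the open file object instead of `sample`. If the script really restarted every RS on each loop iteration, a 9-node cluster should show about 9 starts per host rather than 1.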
> bin/rolling-restart.sh restarts all active RS's with each iteration instead
> of one at a time
> --------------------------------------------------------------------------------------------
>
> Key: HBASE-8974
> URL: https://issues.apache.org/jira/browse/HBASE-8974
> Project: HBase
> Issue Type: Bug
> Components: scripts
> Reporter: Nick Dimiduk
>
> I'm exercising the patch over on HBASE-8803 and I've noticed something in the
> logs: it looks like {{rolling-restart.sh}} is restarting all the region
> servers multiple times instead of just the current entry in the loop
> iteration.
> The logic looks like this:
> {noformat}
> for each rs in active region server list:
>   unload $rs                  // move all regions to other RS's
>   restart all Region Servers  // !?! bug?
>   reload $rs                  // pile 'em back on
> {noformat}
> Shouldn't that step 2 be only {{restart $rs}}?
> This is what I see in the logs. My cluster has 9 active RegionServers. Notice
> the bit in the middle where all 9 are stopped and started again after
> unloading the target RS.
> {noformat}
> $ time /usr/lib/hbase/bin/rolling-restart.sh --rs-only --graceful
> --maxthreads 30
>
> Gracefully restarting: hor18n39.gq1.ygridcore.net
> Disabling balancer!
> ...
> Unloading hor18n39.gq1.ygridcore.net region(s)
> ...
> Valid region move targets:
> hor18n37.gq1.ygridcore.net,60020,1374094975268
> hor17n37.gq1.ygridcore.net,60020,1374094975264
> hor18n35.gq1.ygridcore.net,60020,1374094975327
> hor17n39.gq1.ygridcore.net,60020,1374094975281
> hor18n36.gq1.ygridcore.net,60020,1374094975254
> hor17n36.gq1.ygridcore.net,60020,1374094975277
> hor17n34.gq1.ygridcore.net,60020,1374094975291
> hor18n38.gq1.ygridcore.net,60020,1374094975259
> 13/07/17 21:44:38 INFO region_mover: Moving 330 region(s) from
> hor18n39.gq1.ygridcore.net,60020,1374094975326 during this cycle
> 13/07/17 21:44:38 INFO region_mover: Moving region
> b59050cf97aabcef838e3c50e93e6d13 (1 of 330) to
> server=hor18n37.gq1.ygridcore.net,60020,1374094975268
> ...
> 13/07/17 21:54:20 INFO region_mover: Moving region
> d00026d7cc396bb3e6ea91106cc6ab55 (329 of 330) to
> server=hor18n37.gq1.ygridcore.net,60020,1374094975268
> 13/07/17 21:54:20 INFO region_mover: Moving region
> a722179b33e6ece8c9cee3fba3056acd (330 of 330) to
> server=hor17n37.gq1.ygridcore.net,60020,1374094975264
> 13/07/17 21:54:21 INFO region_mover: Wrote list of moved regions to
> /tmp/hor18n39.gq1.ygridcore.net
> Unloaded hor18n39.gq1.ygridcore.net region(s)
> hor18n35.gq1.ygridcore.net: stopping regionserver.
> hor17n39.gq1.ygridcore.net: stopping regionserver.
> hor18n36.gq1.ygridcore.net: stopping regionserver.
> hor17n37.gq1.ygridcore.net: stopping regionserver.
> hor17n34.gq1.ygridcore.net: stopping regionserver.
> hor18n38.gq1.ygridcore.net: stopping regionserver.
> hor18n37.gq1.ygridcore.net: stopping regionserver.
> hor17n36.gq1.ygridcore.net: stopping regionserver.
> hor18n39.gq1.ygridcore.net: stopping regionserver.
> hor18n36.gq1.ygridcore.net: starting regionserver, logging to
> /grid/0/var/log/hbase/hbase-hbase-regionserver-hor18n36.gq1.ygridcore.net.out
> hor17n36.gq1.ygridcore.net: starting regionserver, logging to
> /grid/0/var/log/hbase/hbase-hbase-regionserver-hor17n36.gq1.ygridcore.net.out
> hor17n37.gq1.ygridcore.net: starting regionserver, logging to
> /grid/0/var/log/hbase/hbase-hbase-regionserver-hor17n37.gq1.ygridcore.net.out
> hor18n37.gq1.ygridcore.net: starting regionserver, logging to
> /grid/0/var/log/hbase/hbase-hbase-regionserver-hor18n37.gq1.ygridcore.net.out
> hor18n38.gq1.ygridcore.net: starting regionserver, logging to
> /grid/0/var/log/hbase/hbase-hbase-regionserver-hor18n38.gq1.ygridcore.net.out
> hor17n34.gq1.ygridcore.net: starting regionserver, logging to
> /grid/0/var/log/hbase/hbase-hbase-regionserver-hor17n34.gq1.ygridcore.net.out
> hor18n35.gq1.ygridcore.net: starting regionserver, logging to
> /grid/0/var/log/hbase/hbase-hbase-regionserver-hor18n35.gq1.ygridcore.net.out
> hor18n39.gq1.ygridcore.net: starting regionserver, logging to
> /grid/0/var/log/hbase/hbase-hbase-regionserver-hor18n39.gq1.ygridcore.net.out
> hor17n39.gq1.ygridcore.net: starting regionserver, logging to
> /grid/0/var/log/hbase/hbase-hbase-regionserver-hor17n39.gq1.ygridcore.net.out
> Reloading hor18n39.gq1.ygridcore.net region(s)
> ...
> 13/07/17 21:54:27 INFO region_mover: Moving 330 regions to
> hor18n39.gq1.ygridcore.net,60020,1374098064602
> 13/07/17 21:56:47 INFO region_mover: Moving region
> 7d0a02f452c334a12026b45346a87d36 (1 of 330) to
> server=hor18n39.gq1.ygridcore.net,60020,1374098064602 in thread 0
> 13/07/17 21:56:54 INFO region_mover: Moving region
> af5448c90e78a8f0d935efb0b380502e (2 of 330) to
> server=hor18n39.gq1.ygridcore.net,60020,1374098064602 in thread 1
> ...
> {noformat}