It happens about 10% of the time for me. I'm going to try to grab a jstack next time I see it happen and try to note what is preventing the graceful shutdown.
On Fri, Jan 5, 2018 at 12:18 PM, Ted Yu <[email protected]> wrote: > Mike: > Do you still see the hang from region server ? > > I recently successfully shutdown a 5 node cluster running on top of hadoop > 3 (after completing LoadTestTool) based on this commit: > > HBASE-19667 Get rid of MasterEnvironment#supportGroupCPs > > FYI > > On Wed, Jan 3, 2018 at 3:23 PM, Sergey Soldatov <[email protected]> > wrote: > > > It may be something else that prevents HBase from the proper shutdown. > > Metric system is automatically stopped and started by JmxCacheBuster and > > may be irrelevant to your problem. > > > > Thanks, > > Sergey > > > > On Wed, Jan 3, 2018 at 1:37 PM, Mike Drob <[email protected]> wrote: > > > > > Hi folks, > > > > > > I've been seeing on one of my testbeds intermittently that HBase will > > fail > > > to stop, I suspect that this somehow involves the Hadoop Metrics system > > > having a thread that isn't getting cleaned up. This is on a fork of > > > branch-2 testing against hadoop-3 (although not with the RC > > specifically). > > > Here's a log excerpt from the shutdown sequence: > > > > > > > > > 2017-12-31 01:44:07,308 INFO > > > org.apache.hadoop.hbase.regionserver.HRegionServer: stopping server > > > bwiskf-2,22101,1514712293690; zookeeper connection closed. > > > 2017-12-31 01:44:07,308 INFO > > > org.apache.hadoop.hbase.regionserver.HRegionServer: > > > regionserver/bwiskf-2/x.x.x.x:22101 exiting > > > 2017-12-31 01:44:07,309 INFO org.apache.zookeeper.ClientCnxn: > > EventThread > > > shut down > > > 2017-12-31 01:44:08,521 INFO org.apache.hadoop.hbase. > > regionserver.Leases: > > > regionserver/bwiskf-2/x.x.x.x:22101.leaseChecker closing leases > > > 2017-12-31 01:44:08,521 INFO org.apache.hadoop.hbase. > > regionserver.Leases: > > > regionserver/bwiskf-2/x.x.x.x:22101.leaseChecker closed leases > > > 2017-12-31 01:44:09,290 INFO > > > org.apache.hadoop.metrics2.impl.MetricsSystemImpl: Stopping HBase > > metrics > > > system... > > > 2017-12-31 01:44:09,290 INFO > > > org.apache.hadoop.metrics2.impl.MetricsSystemImpl: HBase metrics > system > > > stopped. > > > 2017-12-31 01:44:09,792 INFO org.apache.hadoop.metrics2. > > > impl.MetricsConfig: > > > loaded properties from hadoop-metrics2.properties > > > 2017-12-31 01:44:09,798 INFO > > > org.apache.hadoop.metrics2.impl.MetricsSystemImpl: Scheduled Metric > > > snapshot period at 10 second(s). > > > 2017-12-31 01:44:09,798 INFO > > > org.apache.hadoop.metrics2.impl.MetricsSystemImpl: HBase metrics > system > > > started > > > 2017-12-31 01:45:21,905 INFO > > > org.apache.hadoop.metrics2.impl.MetricsSystemImpl: Stopping HBase > > metrics > > > system... > > > 2017-12-31 01:45:21,906 INFO > > > org.apache.hadoop.metrics2.impl.MetricsSystemImpl: HBase metrics > system > > > stopped. > > > 2017-12-31 01:45:22,408 INFO org.apache.hadoop.metrics2. > > > impl.MetricsConfig: > > > loaded properties from hadoop-metrics2.properties > > > 2017-12-31 01:45:22,409 INFO > > > org.apache.hadoop.metrics2.impl.MetricsSystemImpl: Scheduled Metric > > > snapshot period at 10 second(s). > > > 2017-12-31 01:45:22,409 INFO > > > org.apache.hadoop.metrics2.impl.MetricsSystemImpl: HBase metrics > system > > > started > > > 2017-12-31 01:47:12,137 INFO > > > org.apache.hadoop.hbase.regionserver.ShutdownHook: Shutdown hook > > starting; > > > hbase.shutdown.hook=true; > > > fsShutdownHook=org.apache.hadoop.fs.FileSystem$Cache$ > > > ClientFinalizer@336880df > > > 2017-12-31 01:47:12,138 INFO > > > org.apache.hadoop.hbase.regionserver.ShutdownHook: Starting fs > shutdown > > > hook thread. > > > 2017-12-31 01:47:12,139 INFO > > > org.apache.hadoop.hbase.regionserver.ShutdownHook: Shutdown hook > > finished. > > > > > > The first shutdown was a graceful shutdown request that started around > > > 1:43, and then 1:47 was a shutdown via process kill request. The > metrics > > > system looks like it starts up again every time it is stopped. I'm > > digging > > > through the internals of HRegionServer trying to figure out where we > > > interface with it but not having much luck finding the cleanup steps. > > > > > > Mike > > > > > >
