[
https://issues.apache.org/jira/browse/HBASE-19838?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16335083#comment-16335083
]
stack commented on HBASE-19838:
-------------------------------
Closing the shared clusterConnection in Master#shutdown seems to do the trick.
This is how it looks with the nice test added here:
{code:java}
2018-01-22 14:41:07,171 INFO [M:0;localhost:52959] regionserver.HRegionServer(1152): M:0;localhost:52959 exiting
Exception in thread "M:0;localhost:52959" java.lang.IllegalStateException: Expected the service ClusterSchemaServiceImpl [FAILED] to be TERMINATED, but the service has FAILED
    at org.apache.hbase.thirdparty.com.google.common.util.concurrent.AbstractService.checkCurrentState(AbstractService.java:345)
    at org.apache.hbase.thirdparty.com.google.common.util.concurrent.AbstractService.awaitTerminated(AbstractService.java:318)
    at org.apache.hadoop.hbase.master.HMaster.run(HMaster.java:576)
    at java.lang.Thread.run(Thread.java:745)
Caused by: org.apache.hadoop.hbase.DoNotRetryIOException: hconnection-0x7d85155 closed
    at org.apache.hadoop.hbase.client.ConnectionImplementation.locateRegion(ConnectionImplementation.java:722)
    at org.apache.hadoop.hbase.client.ConnectionUtils$ShortCircuitingClusterConnection.locateRegion(ConnectionUtils.java:131)
    at org.apache.hadoop.hbase.client.ConnectionImplementation.locateRegion(ConnectionImplementation.java:714)
    at org.apache.hadoop.hbase.client.ConnectionUtils$ShortCircuitingClusterConnection.locateRegion(ConnectionUtils.java:131)
    at org.apache.hadoop.hbase.client.ConnectionImplementation.locateRegion(ConnectionImplementation.java:684)
    at org.apache.hadoop.hbase.client.ConnectionUtils$ShortCircuitingClusterConnection.locateRegion(ConnectionUtils.java:131)
    at org.apache.hadoop.hbase.client.ConnectionImplementation.getRegionLocation(ConnectionImplementation.java:562)
    at org.apache.hadoop.hbase.client.ConnectionUtils$ShortCircuitingClusterConnection.getRegionLocation(ConnectionUtils.java:131)
    at org.apache.hadoop.hbase.client.HRegionLocator.getRegionLocation(HRegionLocator.java:73)
    at org.apache.hadoop.hbase.client.RegionServerCallable.prepare(RegionServerCallable.java:223)
    at org.apache.hadoop.hbase.client.RpcRetryingCallerImpl.callWithRetries(RpcRetryingCallerImpl.java:105)
    at org.apache.hadoop.hbase.client.HTable.get(HTable.java:388)
    at org.apache.hadoop.hbase.client.HTable.get(HTable.java:362)
    at org.apache.hadoop.hbase.MetaTableAccessor.getTableState(MetaTableAccessor.java:1117)
    at org.apache.hadoop.hbase.MetaTableAccessor.tableExists(MetaTableAccessor.java:427)
    at org.apache.hadoop.hbase.master.TableNamespaceManager.start(TableNamespaceManager.java:93)
    at org.apache.hadoop.hbase.master.ClusterSchemaServiceImpl.doStart(ClusterSchemaServiceImpl.java:62)
    at org.apache.hbase.thirdparty.com.google.common.util.concurrent.AbstractService.startAsync(AbstractService.java:226)
    at org.apache.hadoop.hbase.master.HMaster.initClusterSchemaService(HMaster.java:1062)
    at org.apache.hadoop.hbase.master.TestShutdownBackupMaster$MockHMaster.initClusterSchemaService(TestShutdownBackupMaster.java:67)
    at org.apache.hadoop.hbase.master.HMaster.finishActiveMasterInitialization(HMaster.java:924)
    at org.apache.hadoop.hbase.master.HMaster.startActiveMasterManager(HMaster.java:2026)
    at org.apache.hadoop.hbase.master.HMaster.run(HMaster.java:557)
    ... 1 more
{code}
From studying how clusterConnection is used in the regionserver and master,
calling close to kill any ongoing RPCs seems to be what we want.
Shutdown in the 'normal' case does not seem to change; it still runs normally.
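To illustrate the idea only (this is not the patch), here is a minimal plain-Java sketch of why closing a shared connection lets a thread stuck in a retrying lookup exit. FakeConnection, locate and shutdownWhileInitializing are hypothetical names invented for this sketch; none of them are HBase classes or methods.

```java
import java.io.Closeable;
import java.io.IOException;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.atomic.AtomicBoolean;
import java.util.concurrent.atomic.AtomicReference;

public class SharedConnectionShutdownSketch {

  /** Hypothetical stand-in for a shared cluster connection; NOT the HBase class. */
  static final class FakeConnection implements Closeable {
    private final AtomicBoolean closed = new AtomicBoolean(false);

    /** Simulates a lookup that retries forever unless the connection is closed. */
    String locate(String row) throws IOException, InterruptedException {
      while (true) {
        if (closed.get()) {
          // Fail fast once closed instead of retrying again.
          throw new IOException("connection closed");
        }
        // The cluster never answers in this sketch, so back off and retry.
        Thread.sleep(10);
      }
    }

    @Override
    public void close() {
      closed.set(true);
    }
  }

  /**
   * Starts an "init" thread that blocks in locate(), then closes the shared
   * connection as a shutdown would. Returns the failure the init thread saw.
   */
  static Throwable shutdownWhileInitializing(long timeoutMs) throws InterruptedException {
    FakeConnection conn = new FakeConnection();
    AtomicReference<Throwable> failure = new AtomicReference<>();
    CountDownLatch started = new CountDownLatch(1);

    Thread init = new Thread(() -> {
      started.countDown();
      try {
        conn.locate("namespace");
      } catch (Throwable t) {
        failure.set(t);
      }
    }, "init");
    init.start();

    started.await();      // the init thread is running
    conn.close();         // shutdown path: close the shared connection
    init.join(timeoutMs); // the init thread should now finish promptly
    if (init.isAlive()) {
      throw new IllegalStateException("init thread still blocked");
    }
    return failure.get();
  }

  public static void main(String[] args) throws Exception {
    System.out.println("init thread ended with: " + shutdownWhileInitializing(5000));
  }
}
```

Without the close() call the init thread in this sketch would never return. With it, the pending lookup fails with an exception and the thread exits, which is the same shape as the trace above.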
> Can not shutdown backup master cleanly when it has already tried to become
> the active master
> --------------------------------------------------------------------------------------------
>
> Key: HBASE-19838
> URL: https://issues.apache.org/jira/browse/HBASE-19838
> Project: HBase
> Issue Type: Bug
> Components: master
> Reporter: Duo Zhang
> Assignee: Duo Zhang
> Priority: Critical
> Fix For: 2.0.0-beta-2
>
> Attachments: HBASE-19838-UT.patch, HBASE-19838.master.001.patch
>
>
> This is the root cause of the TestZooKeeper hang.
> Opening a new issue to introduce a UT that reproduces the problem reliably,
> so that we can fix TestZooKeeper, which is not designed to test this
> problem.