[ 
https://issues.apache.org/jira/browse/HBASE-19838?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16335083#comment-16335083
 ] 

stack commented on HBASE-19838:
-------------------------------

Closing the shared clusterConnection on Master#shutdown seems to do the trick. 
This is how it looks with the nice test added here:

 
{code:java}

2018-01-22 14:41:07,171 INFO [M:0;localhost:52959] regionserver.HRegionServer(1152): M:0;localhost:52959 exiting
Exception in thread "M:0;localhost:52959" java.lang.IllegalStateException: Expected the service ClusterSchemaServiceImpl [FAILED] to be TERMINATED, but the service has FAILED
at org.apache.hbase.thirdparty.com.google.common.util.concurrent.AbstractService.checkCurrentState(AbstractService.java:345)
at org.apache.hbase.thirdparty.com.google.common.util.concurrent.AbstractService.awaitTerminated(AbstractService.java:318)
at org.apache.hadoop.hbase.master.HMaster.run(HMaster.java:576)
at java.lang.Thread.run(Thread.java:745)
Caused by: org.apache.hadoop.hbase.DoNotRetryIOException: hconnection-0x7d85155 closed
at org.apache.hadoop.hbase.client.ConnectionImplementation.locateRegion(ConnectionImplementation.java:722)
at org.apache.hadoop.hbase.client.ConnectionUtils$ShortCircuitingClusterConnection.locateRegion(ConnectionUtils.java:131)
at org.apache.hadoop.hbase.client.ConnectionImplementation.locateRegion(ConnectionImplementation.java:714)
at org.apache.hadoop.hbase.client.ConnectionUtils$ShortCircuitingClusterConnection.locateRegion(ConnectionUtils.java:131)
at org.apache.hadoop.hbase.client.ConnectionImplementation.locateRegion(ConnectionImplementation.java:684)
at org.apache.hadoop.hbase.client.ConnectionUtils$ShortCircuitingClusterConnection.locateRegion(ConnectionUtils.java:131)
at org.apache.hadoop.hbase.client.ConnectionImplementation.getRegionLocation(ConnectionImplementation.java:562)
at org.apache.hadoop.hbase.client.ConnectionUtils$ShortCircuitingClusterConnection.getRegionLocation(ConnectionUtils.java:131)
at org.apache.hadoop.hbase.client.HRegionLocator.getRegionLocation(HRegionLocator.java:73)
at org.apache.hadoop.hbase.client.RegionServerCallable.prepare(RegionServerCallable.java:223)
at org.apache.hadoop.hbase.client.RpcRetryingCallerImpl.callWithRetries(RpcRetryingCallerImpl.java:105)
at org.apache.hadoop.hbase.client.HTable.get(HTable.java:388)
at org.apache.hadoop.hbase.client.HTable.get(HTable.java:362)
at org.apache.hadoop.hbase.MetaTableAccessor.getTableState(MetaTableAccessor.java:1117)
at org.apache.hadoop.hbase.MetaTableAccessor.tableExists(MetaTableAccessor.java:427)
at org.apache.hadoop.hbase.master.TableNamespaceManager.start(TableNamespaceManager.java:93)
at org.apache.hadoop.hbase.master.ClusterSchemaServiceImpl.doStart(ClusterSchemaServiceImpl.java:62)
at org.apache.hbase.thirdparty.com.google.common.util.concurrent.AbstractService.startAsync(AbstractService.java:226)
at org.apache.hadoop.hbase.master.HMaster.initClusterSchemaService(HMaster.java:1062)
at org.apache.hadoop.hbase.master.TestShutdownBackupMaster$MockHMaster.initClusterSchemaService(TestShutdownBackupMaster.java:67)
at org.apache.hadoop.hbase.master.HMaster.finishActiveMasterInitialization(HMaster.java:924)
at org.apache.hadoop.hbase.master.HMaster.startActiveMasterManager(HMaster.java:2026)
at org.apache.hadoop.hbase.master.HMaster.run(HMaster.java:557)
... 1 more
{code}
 

Having studied how clusterConnection is used in the regionserver and master, 
calling close to kill any ongoing RPCs seems to be what we want.

Shutdown in the 'normal case' doesn't seem to change; it still runs 'normally'.
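
To illustrate why closing the shared connection unblocks shutdown, here is a minimal, self-contained sketch. It is not the HBase code: SketchConnection, SketchMaster, and all of their methods are hypothetical stand-ins, and the retry loop only imitates an RPC that keeps retrying until the connection is closed. The point is that an initialization thread stuck in a lookup fails fast once close is called, so the master thread can exit.

```java
import java.io.IOException;
import java.util.concurrent.atomic.AtomicReference;

/** Hypothetical stand-in for a shared cluster connection; not the HBase class. */
final class SketchConnection implements AutoCloseable {
  private volatile boolean closed;

  /** Imitates a lookup that retries forever and only gives up once closed. */
  String locate(String row) throws IOException {
    while (!closed) {
      try {
        Thread.sleep(10); // pretend the RPC failed and is being retried
      } catch (InterruptedException e) {
        Thread.currentThread().interrupt();
        throw new IOException("interrupted while locating " + row, e);
      }
    }
    throw new IOException("sketch-connection closed");
  }

  @Override
  public void close() {
    closed = true;
  }
}

/** Hypothetical master whose initialization blocks on the shared connection. */
final class SketchMaster {
  private final SketchConnection connection = new SketchConnection();
  private final AtomicReference<Throwable> initFailure = new AtomicReference<>();
  private final Thread initThread = new Thread(this::initialize, "sketch-master-init");

  private void initialize() {
    try {
      connection.locate("example-row"); // blocks until the connection is closed
    } catch (IOException e) {
      initFailure.set(e); // initialization fails instead of hanging
    }
  }

  void start() {
    initThread.start();
  }

  /** Closing the connection is what kills the in-flight lookup. */
  void shutdown() {
    connection.close();
  }

  /** Returns true if the initialization thread exited within the timeout. */
  boolean awaitExit(long timeoutMillis) {
    try {
      initThread.join(timeoutMillis);
    } catch (InterruptedException e) {
      Thread.currentThread().interrupt();
      return false;
    }
    return !initThread.isAlive();
  }

  Throwable initFailure() {
    return initFailure.get();
  }
}

final class ShutdownSketchDemo {
  public static void main(String[] args) {
    SketchMaster master = new SketchMaster();
    master.start();
    master.shutdown();
    System.out.println("exited=" + master.awaitExit(5000)
        + ", failure=" + master.initFailure());
  }
}
```

Without the close call in shutdown, the lookup in this sketch would retry forever and awaitExit would time out, which is roughly the hang described in this issue.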

> Can not shutdown backup master cleanly when it has already tried to become 
> the active master
> --------------------------------------------------------------------------------------------
>
>                 Key: HBASE-19838
>                 URL: https://issues.apache.org/jira/browse/HBASE-19838
>             Project: HBase
>          Issue Type: Bug
>          Components: master
>            Reporter: Duo Zhang
>            Assignee: Duo Zhang
>            Priority: Critical
>             Fix For: 2.0.0-beta-2
>
>         Attachments: HBASE-19838-UT.patch, HBASE-19838.master.001.patch
>
>
> This is the root cause of the TestZooKeeper hang.
> Opening a new issue to introduce a UT that reproduces the problem reliably, 
> so that we can fix TestZooKeeper, which was not designed to test this 
> problem.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
