[
https://issues.apache.org/jira/browse/HBASE-18613?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Josh Elser resolved HBASE-18613.
--------------------------------
Resolution: Invalid
Jumped the gun on this one. That warning wasn't actually fatal. The test had
failed due to some exceptions in the action a bit earlier in the log.
> Race condition between master restart and test code when restoring
> distributed cluster after integration test
> -------------------------------------------------------------------------------------------------------------
>
> Key: HBASE-18613
> URL: https://issues.apache.org/jira/browse/HBASE-18613
> Project: HBase
> Issue Type: Bug
> Components: integration tests
> Reporter: Josh Elser
> Assignee: Josh Elser
> Priority: Minor
> Fix For: 2.0.0, 1.4.0, 1.3.2, 1.2.7, 1.1.13
>
>
> Noticed the following in some internal testing (line numbers are likely
> skewed):
> {noformat}
> 2017-08-16 21:20:25,557| 2017-08-16 21:20:25,553 WARN [main]
> client.ConnectionManager$HConnectionImplementation: Checking master connection
> 2017-08-16 21:20:25,557| com.google.protobuf.ServiceException:
> org.apache.hadoop.hbase.exceptions.ConnectionClosingException: Call to
> master1.domain.com/10.0.2.131:16000 failed on local exception:
> org.apache.hadoop.hbase.exceptions.ConnectionClosingException: Connection to
> master1.domain.com/10.0.2.131:16000 is closing. Call id=581, waitTime=1
> 2017-08-16 21:20:25,557| at
> org.apache.hadoop.hbase.ipc.AbstractRpcClient.callBlockingMethod(AbstractRpcClient.java:223)
> 2017-08-16 21:20:25,558| at
> org.apache.hadoop.hbase.ipc.AbstractRpcClient$BlockingRpcChannelImplementation.callBlockingMethod(AbstractRpcClient.java:287)
> 2017-08-16 21:20:25,560| at
> org.apache.hadoop.hbase.protobuf.generated.MasterProtos$MasterService$BlockingStub.isMasterRunning(MasterProtos.java:62739)
> 2017-08-16 21:20:25,560| at
> org.apache.hadoop.hbase.client.ConnectionManager$HConnectionImplementation$MasterServiceState.isMasterRunning(ConnectionManager.java:1448)
> 2017-08-16 21:20:25,561| at
> org.apache.hadoop.hbase.client.ConnectionManager$HConnectionImplementation.isKeepAliveMasterConnectedAndRunning(ConnectionManager.java:2124)
> 2017-08-16 21:20:25,561| at
> org.apache.hadoop.hbase.client.ConnectionManager$HConnectionImplementation.getKeepAliveMasterService(ConnectionManager.java:1712)
> 2017-08-16 21:20:25,562| at
> org.apache.hadoop.hbase.client.ConnectionManager$HConnectionImplementation.getMaster(ConnectionManager.java:1701)
> 2017-08-16 21:20:25,562| at
> org.apache.hadoop.hbase.DistributedHBaseCluster.getMasterAdminService(DistributedHBaseCluster.java:153)
> 2017-08-16 21:20:25,563| at
> org.apache.hadoop.hbase.DistributedHBaseCluster.waitForActiveAndReadyMaster(DistributedHBaseCluster.java:184)
> 2017-08-16 21:20:25,563| at
> org.apache.hadoop.hbase.HBaseCluster.waitForActiveAndReadyMaster(HBaseCluster.java:204)
> 2017-08-16 21:20:25,563| at
> org.apache.hadoop.hbase.DistributedHBaseCluster.restoreMasters(DistributedHBaseCluster.java:278)
> 2017-08-16 21:20:25,563| at
> org.apache.hadoop.hbase.DistributedHBaseCluster.restoreClusterStatus(DistributedHBaseCluster.java:239)
> 2017-08-16 21:20:25,563| at
> org.apache.hadoop.hbase.HBaseCluster.restoreInitialStatus(HBaseCluster.java:235)
> 2017-08-16 21:20:25,564| at
> org.apache.hadoop.hbase.IntegrationTestingUtility.restoreCluster(IntegrationTestingUtility.java:99)
> 2017-08-16 21:20:25,564| at
> org.apache.hadoop.hbase.IntegrationTestBase.cleanUpCluster(IntegrationTestBase.java:200)
> 2017-08-16 21:20:25,564| at
> org.apache.hadoop.hbase.IntegrationTestDDLMasterFailover.cleanUpCluster(IntegrationTestDDLMasterFailover.java:146)
> 2017-08-16 21:20:25,564| at
> org.apache.hadoop.hbase.IntegrationTestBase.cleanUp(IntegrationTestBase.java:140)
> 2017-08-16 21:20:25,564| at
> org.apache.hadoop.hbase.IntegrationTestBase.doWork(IntegrationTestBase.java:125)
> 2017-08-16 21:20:25,565| at
> org.apache.hadoop.hbase.util.AbstractHBaseTool.run(AbstractHBaseTool.java:112)
> 2017-08-16 21:20:25,565| at
> org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:76)
> 2017-08-16 21:20:25,565| at
> org.apache.hadoop.hbase.IntegrationTestDDLMasterFailover.main(IntegrationTestDDLMasterFailover.java:832)
> 2017-08-16 21:20:25,566| Caused by:
> org.apache.hadoop.hbase.exceptions.ConnectionClosingException: Call to
> master1.domain.com/10.0.2.131:16000 failed on local exception:
> org.apache.hadoop.hbase.exceptions.ConnectionClosingException: Connection to
> master1.domain.com/10.0.2.131:16000 is closing. Call id=581, waitTime=1
> 2017-08-16 21:20:25,566| at
> org.apache.hadoop.hbase.ipc.RpcClientImpl.wrapException(RpcClientImpl.java:1258)
> 2017-08-16 21:20:25,566| at
> org.apache.hadoop.hbase.ipc.RpcClientImpl.call(RpcClientImpl.java:1229)
> 2017-08-16 21:20:25,566| at
> org.apache.hadoop.hbase.ipc.AbstractRpcClient.callBlockingMethod(AbstractRpcClient.java:213)
> 2017-08-16 21:20:25,566| ... 20 more
> 2017-08-16 21:20:25,566| Caused by:
> org.apache.hadoop.hbase.exceptions.ConnectionClosingException: Connection to
> master1.domain.com/10.0.2.131:16000 is closing. Call id=581, waitTime=1
> 2017-08-16 21:20:25,567| at
> org.apache.hadoop.hbase.ipc.RpcClientImpl$Connection.cleanupCalls(RpcClientImpl.java:1047)
> 2017-08-16 21:20:25,567| at
> org.apache.hadoop.hbase.ipc.RpcClientImpl$Connection.close(RpcClientImpl.java:846)
> 2017-08-16 21:20:25,567| at
> org.apache.hadoop.hbase.ipc.RpcClientImpl$Connection.run(RpcClientImpl.java:574)
> {noformat}
> This happens while the IntegrationTest harness is resetting the state of the
> distributed cluster. When dealing with "slow" nodes, the restart of the
> previously active master can be delayed, which causes the test code to see a
> ConnectionClosingException (wrapped in a ServiceException).
> I think we want to just consume this Exception, same as
> MasterNotRunningException and ZooKeeperConnectionException, in
> {{DistributedHBaseCluster#waitForActiveAndReadyMaster(long)}}.
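
For illustration only, here is a rough sketch of what "consuming" the exception in a wait loop could look like. It is not the actual DistributedHBaseCluster code. The exception classes below are hypothetical stand-ins that only borrow the names of the HBase and protobuf types mentioned above, and MasterProbe, isTransient, and the timeout/sleep parameters are assumptions made so the example runs without a cluster. Since this issue was resolved as Invalid, treat it as a picture of the proposal rather than a change that was made.

```java
public class WaitForMasterSketch {
    // Hypothetical stand-ins: they only mirror the names of the real
    // exception types so that this sketch compiles without HBase.
    static class MasterNotRunningException extends Exception {}
    static class ZooKeeperConnectionException extends Exception {}
    static class ConnectionClosingException extends Exception {
        ConnectionClosingException(String msg) { super(msg); }
    }
    static class ServiceException extends Exception {
        ServiceException(Throwable cause) { super(cause); }
    }

    // Hypothetical probe standing in for the isMasterRunning RPC.
    interface MasterProbe {
        boolean isMasterRunning() throws Exception;
    }

    // Walk the cause chain so that a ConnectionClosingException wrapped
    // in a ServiceException is recognized as well.
    static boolean isTransient(Throwable t) {
        for (Throwable c = t; c != null; c = c.getCause()) {
            if (c instanceof MasterNotRunningException
                    || c instanceof ZooKeeperConnectionException
                    || c instanceof ConnectionClosingException) {
                return true;
            }
        }
        return false;
    }

    // Poll until the probe reports a running master or the timeout expires.
    // Transient failures are swallowed and retried; anything else is rethrown.
    static boolean waitForActiveAndReadyMaster(MasterProbe probe, long timeoutMs, long sleepMs)
            throws Exception {
        long deadline = System.currentTimeMillis() + timeoutMs;
        while (System.currentTimeMillis() < deadline) {
            try {
                if (probe.isMasterRunning()) {
                    return true;
                }
            } catch (Exception e) {
                if (!isTransient(e)) {
                    throw e;
                }
                // Master is probably still restarting; fall through and retry.
            }
            Thread.sleep(sleepMs);
        }
        return false;
    }

    public static void main(String[] args) throws Exception {
        int[] calls = {0};
        // Simulate a master whose connection is closing for the first two probes.
        boolean ready = waitForActiveAndReadyMaster(() -> {
            if (calls[0]++ < 2) {
                throw new ServiceException(
                        new ConnectionClosingException("Connection is closing"));
            }
            return true;
        }, 5000, 10);
        System.out.println("master ready: " + ready + ", probes: " + calls[0]);
    }
}
```

Checking the whole cause chain, rather than only the top-level exception, matters here because the trace above shows the ConnectionClosingException arriving wrapped in a ServiceException.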