[ 
https://issues.apache.org/jira/browse/HBASE-18613?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Josh Elser resolved HBASE-18613.
--------------------------------
    Resolution: Invalid

Jumped the gun on this one. That warning wasn't actually fatal. The test had 
failed due to some exceptions in the action a bit earlier in the log.

> Race condition between master restart and test code when restoring 
> distributed cluster after integration test
> -------------------------------------------------------------------------------------------------------------
>
>                 Key: HBASE-18613
>                 URL: https://issues.apache.org/jira/browse/HBASE-18613
>             Project: HBase
>          Issue Type: Bug
>          Components: integration tests
>            Reporter: Josh Elser
>            Assignee: Josh Elser
>            Priority: Minor
>             Fix For: 2.0.0, 1.4.0, 1.3.2, 1.2.7, 1.1.13
>
>
> Noticed the following in some internal testing (line numbers are likely 
> skewed)
> {noformat}
> 2017-08-16 21:20:25,557| 2017-08-16 21:20:25,553 WARN  [main] 
> client.ConnectionManager$HConnectionImplementation: Checking master connection
> 2017-08-16 21:20:25,557| com.google.protobuf.ServiceException: 
> org.apache.hadoop.hbase.exceptions.ConnectionClosingException: Call to 
> master1.domain.com/10.0.2.131:16000 failed on local exception: 
> org.apache.hadoop.hbase.exceptions.ConnectionClosingException: Connection to 
> master1.domain.com/10.0.2.131:16000 is closing. Call id=581, waitTime=1
> 2017-08-16 21:20:25,557| at 
> org.apache.hadoop.hbase.ipc.AbstractRpcClient.callBlockingMethod(AbstractRpcClient.java:223)
> 2017-08-16 21:20:25,558| at 
> org.apache.hadoop.hbase.ipc.AbstractRpcClient$BlockingRpcChannelImplementation.callBlockingMethod(AbstractRpcClient.java:287)
> 2017-08-16 21:20:25,560| at 
> org.apache.hadoop.hbase.protobuf.generated.MasterProtos$MasterService$BlockingStub.isMasterRunning(MasterProtos.java:62739)
> 2017-08-16 21:20:25,560| at 
> org.apache.hadoop.hbase.client.ConnectionManager$HConnectionImplementation$MasterServiceState.isMasterRunning(ConnectionManager.java:1448)
> 2017-08-16 21:20:25,561| at 
> org.apache.hadoop.hbase.client.ConnectionManager$HConnectionImplementation.isKeepAliveMasterConnectedAndRunning(ConnectionManager.java:2124)
> 2017-08-16 21:20:25,561| at 
> org.apache.hadoop.hbase.client.ConnectionManager$HConnectionImplementation.getKeepAliveMasterService(ConnectionManager.java:1712)
> 2017-08-16 21:20:25,562| at 
> org.apache.hadoop.hbase.client.ConnectionManager$HConnectionImplementation.getMaster(ConnectionManager.java:1701)
> 2017-08-16 21:20:25,562| at 
> org.apache.hadoop.hbase.DistributedHBaseCluster.getMasterAdminService(DistributedHBaseCluster.java:153)
> 2017-08-16 21:20:25,563| at 
> org.apache.hadoop.hbase.DistributedHBaseCluster.waitForActiveAndReadyMaster(DistributedHBaseCluster.java:184)
> 2017-08-16 21:20:25,563| at 
> org.apache.hadoop.hbase.HBaseCluster.waitForActiveAndReadyMaster(HBaseCluster.java:204)
> 2017-08-16 21:20:25,563| at 
> org.apache.hadoop.hbase.DistributedHBaseCluster.restoreMasters(DistributedHBaseCluster.java:278)
> 2017-08-16 21:20:25,563| at 
> org.apache.hadoop.hbase.DistributedHBaseCluster.restoreClusterStatus(DistributedHBaseCluster.java:239)
> 2017-08-16 21:20:25,563| at 
> org.apache.hadoop.hbase.HBaseCluster.restoreInitialStatus(HBaseCluster.java:235)
> 2017-08-16 21:20:25,564| at 
> org.apache.hadoop.hbase.IntegrationTestingUtility.restoreCluster(IntegrationTestingUtility.java:99)
> 2017-08-16 21:20:25,564| at 
> org.apache.hadoop.hbase.IntegrationTestBase.cleanUpCluster(IntegrationTestBase.java:200)
> 2017-08-16 21:20:25,564| at 
> org.apache.hadoop.hbase.IntegrationTestDDLMasterFailover.cleanUpCluster(IntegrationTestDDLMasterFailover.java:146)
> 2017-08-16 21:20:25,564| at 
> org.apache.hadoop.hbase.IntegrationTestBase.cleanUp(IntegrationTestBase.java:140)
> 2017-08-16 21:20:25,564| at 
> org.apache.hadoop.hbase.IntegrationTestBase.doWork(IntegrationTestBase.java:125)
> 2017-08-16 21:20:25,565| at 
> org.apache.hadoop.hbase.util.AbstractHBaseTool.run(AbstractHBaseTool.java:112)
> 2017-08-16 21:20:25,565| at 
> org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:76)
> 2017-08-16 21:20:25,565| at 
> org.apache.hadoop.hbase.IntegrationTestDDLMasterFailover.main(IntegrationTestDDLMasterFailover.java:832)
> 2017-08-16 21:20:25,566| Caused by: 
> org.apache.hadoop.hbase.exceptions.ConnectionClosingException: Call to 
> master1.domain.com/10.0.2.131:16000 failed on local exception: 
> org.apache.hadoop.hbase.exceptions.ConnectionClosingException: Connection to 
> master1.domain.com/10.0.2.131:16000 is closing. Call id=581, waitTime=1
> 2017-08-16 21:20:25,566| at 
> org.apache.hadoop.hbase.ipc.RpcClientImpl.wrapException(RpcClientImpl.java:1258)
> 2017-08-16 21:20:25,566| at 
> org.apache.hadoop.hbase.ipc.RpcClientImpl.call(RpcClientImpl.java:1229)
> 2017-08-16 21:20:25,566| at 
> org.apache.hadoop.hbase.ipc.AbstractRpcClient.callBlockingMethod(AbstractRpcClient.java:213)
> 2017-08-16 21:20:25,566| ... 20 more
> 2017-08-16 21:20:25,566| Caused by: 
> org.apache.hadoop.hbase.exceptions.ConnectionClosingException: Connection to 
> master1.domain.com/10.0.2.131:16000 is closing. Call id=581, waitTime=1
> 2017-08-16 21:20:25,567| at 
> org.apache.hadoop.hbase.ipc.RpcClientImpl$Connection.cleanupCalls(RpcClientImpl.java:1047)
> 2017-08-16 21:20:25,567| at 
> org.apache.hadoop.hbase.ipc.RpcClientImpl$Connection.close(RpcClientImpl.java:846)
> 2017-08-16 21:20:25,567| at 
> org.apache.hadoop.hbase.ipc.RpcClientImpl$Connection.run(RpcClientImpl.java:574)
> {noformat}
> This happens while the IntegrationTest harness is resetting the state of the 
> distributed cluster. On "slow" nodes, the restart of the previously active 
> master can be delayed, which causes the test code to see a 
> ConnectionClosingException (wrapped in a ServiceException).
> I think we want to just consume this Exception, same as 
> MasterNotRunningException and ZooKeeperConnectionException, in 
> {{DistributedHBaseCluster#waitForActiveAndReadyMaster(long)}}.
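
For illustration only, the proposal above could be sketched roughly as follows. This is not the HBase source: the exception classes are local stand-ins that only mirror the names in the description, and MasterProbe, isTransient, the class name and the sleep interval are hypothetical. The sketch shows a polling loop that swallows the listed exceptions, including one wrapped in a ServiceException, and rethrows anything else.

```java
import java.io.IOException;
import java.util.concurrent.atomic.AtomicInteger;

// Sketch only: every type here is a local stand-in, not the HBase or protobuf class.
class WaitForMasterSketch {

    // Stand-in for com.google.protobuf.ServiceException (wraps the real cause).
    static class ServiceException extends Exception {
        ServiceException(Throwable cause) { super(cause); }
    }

    // Stand-ins for the HBase exceptions named in the issue description.
    static class ConnectionClosingException extends IOException {
        ConnectionClosingException(String msg) { super(msg); }
    }
    static class MasterNotRunningException extends IOException {
        MasterNotRunningException(String msg) { super(msg); }
    }
    static class ZooKeeperConnectionException extends IOException {
        ZooKeeperConnectionException(String msg) { super(msg); }
    }

    // Hypothetical probe standing in for the "is the master running" RPC.
    interface MasterProbe {
        boolean isMasterRunning() throws ServiceException, IOException;
    }

    // Exceptions treated as "master not back yet" rather than as failures.
    static boolean isTransient(Throwable t) {
        return t instanceof ConnectionClosingException
            || t instanceof MasterNotRunningException
            || t instanceof ZooKeeperConnectionException;
    }

    // Polls until the probe reports a running master or the timeout elapses.
    static boolean waitForActiveAndReadyMaster(MasterProbe probe, long timeoutMs, long sleepMs)
            throws IOException, InterruptedException {
        long deadline = System.currentTimeMillis() + timeoutMs;
        do {
            try {
                if (probe.isMasterRunning()) {
                    return true;
                }
            } catch (ServiceException e) {
                // Unwrap and consume only the transient causes; rethrow the rest.
                if (!isTransient(e.getCause())) {
                    throw new IOException(e);
                }
            } catch (IOException e) {
                if (!isTransient(e)) {
                    throw e;
                }
            }
            Thread.sleep(sleepMs);
        } while (System.currentTimeMillis() < deadline);
        return false;
    }

    public static void main(String[] args) throws Exception {
        // Simulate a slow master restart: the first two probes hit a closing connection.
        AtomicInteger calls = new AtomicInteger();
        MasterProbe probe = () -> {
            if (calls.incrementAndGet() < 3) {
                throw new ServiceException(new ConnectionClosingException("Connection is closing"));
            }
            return true;
        };
        boolean ready = waitForActiveAndReadyMaster(probe, 5000L, 10L);
        System.out.println("ready=" + ready + " after " + calls.get() + " probes");
    }
}
```

Since the issue was resolved as Invalid (the warning was not the cause of the test failure), this only shows what the suggested change would have looked like in spirit.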



