[ 
https://issues.apache.org/jira/browse/HBASE-27169?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Duo Zhang resolved HBASE-27169.
-------------------------------
    Hadoop Flags: Reviewed
      Resolution: Fixed

It is much more stable than before, but there are still some failed runs.

https://ci-hbase.apache.org/job/HBase-Flaky-Tests/job/master/3818/testReport/junit/org.apache.hadoop.hbase.client/TestSeparateClientZKCluster/testMetaMoveDuringClientZkClusterRestart/

Will open separate issues to fix the remaining failures.

> TestSeparateClientZKCluster is flaky
> ------------------------------------
>
>                 Key: HBASE-27169
>                 URL: https://issues.apache.org/jira/browse/HBASE-27169
>             Project: HBase
>          Issue Type: Bug
>          Components: test
>            Reporter: Duo Zhang
>            Assignee: Duo Zhang
>            Priority: Major
>             Fix For: 2.5.0, 3.0.0-alpha-4, 2.4.14
>
>
> https://nightlies.apache.org/hbase/HBase-Flaky-Tests/master/3773/hbase-server/target/surefire-reports/org.apache.hadoop.hbase.client.TestSeparateClientZKCluster-output.txt
> {noformat}
> org.apache.hadoop.hbase.exceptions.MasterStoppedException: null
>       at org.apache.hadoop.hbase.master.HMaster.checkInitialized(HMaster.java:3177) ~[classes/:?]
>       at org.apache.hadoop.hbase.master.HMaster.balance(HMaster.java:1954) ~[classes/:?]
>       at org.apache.hadoop.hbase.master.MasterRpcServices.balance(MasterRpcServices.java:743) ~[classes/:?]
>       at org.apache.hadoop.hbase.shaded.protobuf.generated.MasterProtos$MasterService$2.callBlockingMethod(MasterProtos.java) ~[hbase-protocol-shaded-3.0.0-alpha-4-SNAPSHOT.jar:3.0.0-alpha-4-SNAPSHOT]
>       at org.apache.hadoop.hbase.ipc.RpcServer.call(RpcServer.java:385) ~[classes/:?]
>       at org.apache.hadoop.hbase.ipc.CallRunner.run(CallRunner.java:124) ~[classes/:?]
>       at org.apache.hadoop.hbase.ipc.RpcHandler.run(RpcHandler.java:104) ~[classes/:?]
>       at org.apache.hadoop.hbase.ipc.RpcHandler.run(RpcHandler.java:84) ~[classes/:?]
> {noformat}
> I think the problem is that MasterStoppedException is a subclass of
> DoNotRetryIOException, so when we hit it, the call fails immediately
> instead of being retried.
> The client ZK syncer is asynchronous, so it is possible that when we
> call admin.balance the new master location has not been synced yet; the
> call then throws MasterStoppedException right away and fails the UT.
> Let me see how to fix it.
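
The fail-fast behaviour described above can be illustrated with a small, self-contained sketch. This is not the HBase client code and not the fix that was committed for this issue: the two exception classes are hypothetical stand-ins that only mirror the inheritance relationship named in the description, and `callWithRetries` and `waitUntilMasterReachable` are illustrative helpers invented for this example. The first helper shows why a retry loop gives up after a single attempt; the second shows one way a test could, in principle, poll until the new location becomes visible.

```java
import java.io.IOException;
import java.util.concurrent.Callable;

public class RetrySketch {
  // Hypothetical stand-ins for the HBase exception types, so that the
  // sketch compiles without HBase on the classpath.
  public static class DoNotRetryIOException extends IOException {
    public DoNotRetryIOException(String msg) { super(msg); }
  }

  public static class MasterStoppedException extends DoNotRetryIOException {
    public MasterStoppedException() { super("master stopped"); }
  }

  // Simplified retry loop (an assumption, not the real client logic):
  // plain IOExceptions are retried, DoNotRetryIOException is rethrown at once.
  public static <T> T callWithRetries(Callable<T> call, int maxAttempts) throws Exception {
    if (maxAttempts < 1) {
      throw new IllegalArgumentException("maxAttempts must be >= 1");
    }
    IOException last = null;
    for (int attempt = 1; attempt <= maxAttempts; attempt++) {
      try {
        return call.call();
      } catch (DoNotRetryIOException e) {
        throw e; // no further attempts, the caller sees the failure immediately
      } catch (IOException e) {
        last = e; // retriable, try again
      }
    }
    throw last;
  }

  // Illustrative test-side helper: keep polling while the call still
  // reports a stopped master, up to maxPolls attempts.
  public static <T> T waitUntilMasterReachable(Callable<T> call, int maxPolls, long sleepMs)
      throws Exception {
    if (maxPolls < 1) {
      throw new IllegalArgumentException("maxPolls must be >= 1");
    }
    MasterStoppedException last = null;
    for (int i = 0; i < maxPolls; i++) {
      try {
        return call.call();
      } catch (MasterStoppedException e) {
        last = e;
        Thread.sleep(sleepMs);
      }
    }
    throw last;
  }

  public static void main(String[] args) throws Exception {
    int[] attempts = {0};
    try {
      callWithRetries(() -> {
        attempts[0]++;
        throw new MasterStoppedException();
      }, 5);
    } catch (MasterStoppedException e) {
      System.out.println("failed after " + attempts[0] + " attempt(s)");
    }
  }
}
```

In the sketch, a call that always throws the stand-in `MasterStoppedException` is attempted only once even though five attempts are allowed, which matches the immediate failure described in the report.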



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
