[ https://issues.apache.org/jira/browse/HBASE-27169?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Duo Zhang resolved HBASE-27169.
-------------------------------
Hadoop Flags: Reviewed
Resolution: Fixed
It is much more stable than before, but some runs still fail:
https://ci-hbase.apache.org/job/HBase-Flaky-Tests/job/master/3818/testReport/junit/org.apache.hadoop.hbase.client/TestSeparateClientZKCluster/testMetaMoveDuringClientZkClusterRestart/
Will open separate issues to fix the remaining failures.
> TestSeparateClientZKCluster is flaky
> ------------------------------------
>
> Key: HBASE-27169
> URL: https://issues.apache.org/jira/browse/HBASE-27169
> Project: HBase
> Issue Type: Bug
> Components: test
> Reporter: Duo Zhang
> Assignee: Duo Zhang
> Priority: Major
> Fix For: 2.5.0, 3.0.0-alpha-4, 2.4.14
>
>
> https://nightlies.apache.org/hbase/HBase-Flaky-Tests/master/3773/hbase-server/target/surefire-reports/org.apache.hadoop.hbase.client.TestSeparateClientZKCluster-output.txt
> {noformat}
> org.apache.hadoop.hbase.exceptions.MasterStoppedException: null
>     at org.apache.hadoop.hbase.master.HMaster.checkInitialized(HMaster.java:3177) ~[classes/:?]
>     at org.apache.hadoop.hbase.master.HMaster.balance(HMaster.java:1954) ~[classes/:?]
>     at org.apache.hadoop.hbase.master.MasterRpcServices.balance(MasterRpcServices.java:743) ~[classes/:?]
>     at org.apache.hadoop.hbase.shaded.protobuf.generated.MasterProtos$MasterService$2.callBlockingMethod(MasterProtos.java) ~[hbase-protocol-shaded-3.0.0-alpha-4-SNAPSHOT.jar:3.0.0-alpha-4-SNAPSHOT]
>     at org.apache.hadoop.hbase.ipc.RpcServer.call(RpcServer.java:385) ~[classes/:?]
>     at org.apache.hadoop.hbase.ipc.CallRunner.run(CallRunner.java:124) ~[classes/:?]
>     at org.apache.hadoop.hbase.ipc.RpcHandler.run(RpcHandler.java:104) ~[classes/:?]
>     at org.apache.hadoop.hbase.ipc.RpcHandler.run(RpcHandler.java:84) ~[classes/:?]
> {noformat}
> {noformat}
> I think the problem is that MasterStoppedException is a subclass of
> DoNotRetryIOException, so when we hit it, the call fails immediately
> instead of being retried. The client ZK syncer is asynchronous, so it is
> possible that when we call admin.balance, we haven't synced the new location
> yet; the call then throws MasterStoppedException right away and fails the UT.
> Let me see how to fix it.
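
To illustrate the failure mode described above, here is a minimal sketch of one way a test could tolerate it. This is not the change committed for HBASE-27169. The exception classes are local stand-ins that only mirror the hierarchy named in the description, and `IOCall` and `retryWhileMasterStopped` are hypothetical helpers made up for this example. The idea is that the caller retries for a bounded time only on MasterStoppedException, and any other exception still fails at once.

```java
import java.io.IOException;
import java.util.concurrent.atomic.AtomicInteger;

final class BalanceRetrySketch {

  // Local stand-ins for the HBase classes; only the inheritance relationship matters here.
  static class DoNotRetryIOException extends IOException {
    DoNotRetryIOException(String msg) { super(msg); }
  }

  static class MasterStoppedException extends DoNotRetryIOException {
    MasterStoppedException() { super("master stopped"); }
  }

  // Hypothetical functional interface for a call that may throw IOException.
  @FunctionalInterface
  interface IOCall<T> {
    T call() throws IOException;
  }

  // Retries the call while it throws MasterStoppedException, until timeoutMs has elapsed.
  // Any other exception, including other DoNotRetryIOExceptions, propagates immediately.
  static <T> T retryWhileMasterStopped(IOCall<T> call, long timeoutMs, long sleepMs)
      throws IOException, InterruptedException {
    long deadline = System.nanoTime() + timeoutMs * 1_000_000L;
    while (true) {
      try {
        return call.call();
      } catch (MasterStoppedException e) {
        if (System.nanoTime() - deadline >= 0) {
          throw e; // out of time: surface the original failure
        }
        Thread.sleep(sleepMs); // give the asynchronous syncer time to catch up
      }
    }
  }

  public static void main(String[] args) throws Exception {
    AtomicInteger attempts = new AtomicInteger();
    // Simulated balance call: the first two attempts hit a stopped master.
    Boolean ran = retryWhileMasterStopped(() -> {
      if (attempts.incrementAndGet() < 3) {
        throw new MasterStoppedException();
      }
      return true;
    }, 5_000, 10);
    System.out.println("balance ran=" + ran + " after " + attempts.get() + " attempts");
  }
}
```

In a real test the lambda would wrap the admin.balance call. Whether retrying in the test, waiting for the syncer, or changing the exception handling is the right fix is a design decision this sketch does not settle.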
--
This message was sent by Atlassian Jira
(v8.20.10#820010)