Duo Zhang created HBASE-27169:
---------------------------------
Summary: TestSeparateClientZKCluster is flaky
Key: HBASE-27169
URL: https://issues.apache.org/jira/browse/HBASE-27169
Project: HBase
Issue Type: Bug
Components: test
Reporter: Duo Zhang
https://nightlies.apache.org/hbase/HBase-Flaky-Tests/master/3773/hbase-server/target/surefire-reports/org.apache.hadoop.hbase.client.TestSeparateClientZKCluster-output.txt
{noformat}
org.apache.hadoop.hbase.exceptions.MasterStoppedException: null
    at org.apache.hadoop.hbase.master.HMaster.checkInitialized(HMaster.java:3177) ~[classes/:?]
    at org.apache.hadoop.hbase.master.HMaster.balance(HMaster.java:1954) ~[classes/:?]
    at org.apache.hadoop.hbase.master.MasterRpcServices.balance(MasterRpcServices.java:743) ~[classes/:?]
    at org.apache.hadoop.hbase.shaded.protobuf.generated.MasterProtos$MasterService$2.callBlockingMethod(MasterProtos.java) ~[hbase-protocol-shaded-3.0.0-alpha-4-SNAPSHOT.jar:3.0.0-alpha-4-SNAPSHOT]
    at org.apache.hadoop.hbase.ipc.RpcServer.call(RpcServer.java:385) ~[classes/:?]
    at org.apache.hadoop.hbase.ipc.CallRunner.run(CallRunner.java:124) ~[classes/:?]
    at org.apache.hadoop.hbase.ipc.RpcHandler.run(RpcHandler.java:104) ~[classes/:?]
    at org.apache.hadoop.hbase.ipc.RpcHandler.run(RpcHandler.java:84) ~[classes/:?]
{noformat}
I think the problem is that MasterStoppedException is a subclass of
DoNotRetryIOException, so when we hit it the client fails immediately
instead of retrying. The client ZK syncer is asynchronous, so it is possible
that when we call admin.balance we haven't synced the new location yet. The
call then throws MasterStoppedException right away and fails the UT.
Let me see how to fix it.
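
To illustrate the failure mode, here is a minimal sketch of a test-side
workaround: retry the balance call while it throws MasterStoppedException,
which gives the asynchronous syncer time to publish the new location. This
is not necessarily the fix that will be applied for this issue. The two
exception classes are local stand-ins that only mirror the hierarchy
described above so the sketch compiles without HBase on the classpath.
BalanceCall and balanceWithRetry are hypothetical names, and the lambda in
main simulates admin.balance failing twice before succeeding.

```java
import java.io.IOException;
import java.util.concurrent.atomic.AtomicInteger;

public class BalanceRetrySketch {

  // Stand-in for HBase's DoNotRetryIOException (assumption: only the
  // inheritance relationship matters for this sketch).
  static class DoNotRetryIOException extends IOException {
    DoNotRetryIOException(String msg) {
      super(msg);
    }
  }

  // Stand-in for HBase's MasterStoppedException, a DoNotRetryIOException.
  static class MasterStoppedException extends DoNotRetryIOException {
    MasterStoppedException() {
      super("master stopped");
    }
  }

  // Hypothetical wrapper around a call such as admin.balance().
  interface BalanceCall {
    boolean call() throws IOException;
  }

  // Retries only on MasterStoppedException; any other IOException propagates
  // immediately. Rethrows the last MasterStoppedException once all attempts
  // are used up.
  static boolean balanceWithRetry(BalanceCall call, int maxAttempts, long sleepMillis)
      throws IOException, InterruptedException {
    if (maxAttempts < 1) {
      throw new IllegalArgumentException("maxAttempts must be >= 1");
    }
    MasterStoppedException last = null;
    for (int attempt = 1; attempt <= maxAttempts; attempt++) {
      try {
        return call.call();
      } catch (MasterStoppedException e) {
        // The client may still be using a stale location; wait and try again.
        last = e;
        Thread.sleep(sleepMillis);
      }
    }
    throw last;
  }

  public static void main(String[] args) throws Exception {
    AtomicInteger calls = new AtomicInteger();
    // Simulated call: the first two attempts hit the stopped master.
    boolean ran = balanceWithRetry(() -> {
      if (calls.incrementAndGet() < 3) {
        throw new MasterStoppedException();
      }
      return true;
    }, 5, 10L);
    System.out.println("balance ran=" + ran + " after " + calls.get() + " attempts");
  }
}
```

In the real test the lambda would wrap the actual admin.balance call, and
the attempt count and sleep interval would need tuning to how long the
syncer takes in practice.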
--
This message was sent by Atlassian Jira
(v8.20.10#820010)