Duo Zhang created HBASE-27169:
---------------------------------

             Summary: TestSeparateClientZKCluster is flaky
                 Key: HBASE-27169
                 URL: https://issues.apache.org/jira/browse/HBASE-27169
             Project: HBase
          Issue Type: Bug
          Components: test
            Reporter: Duo Zhang


https://nightlies.apache.org/hbase/HBase-Flaky-Tests/master/3773/hbase-server/target/surefire-reports/org.apache.hadoop.hbase.client.TestSeparateClientZKCluster-output.txt

{noformat}
org.apache.hadoop.hbase.exceptions.MasterStoppedException: null
        at org.apache.hadoop.hbase.master.HMaster.checkInitialized(HMaster.java:3177) ~[classes/:?]
        at org.apache.hadoop.hbase.master.HMaster.balance(HMaster.java:1954) ~[classes/:?]
        at org.apache.hadoop.hbase.master.MasterRpcServices.balance(MasterRpcServices.java:743) ~[classes/:?]
        at org.apache.hadoop.hbase.shaded.protobuf.generated.MasterProtos$MasterService$2.callBlockingMethod(MasterProtos.java) ~[hbase-protocol-shaded-3.0.0-alpha-4-SNAPSHOT.jar:3.0.0-alpha-4-SNAPSHOT]
        at org.apache.hadoop.hbase.ipc.RpcServer.call(RpcServer.java:385) ~[classes/:?]
        at org.apache.hadoop.hbase.ipc.CallRunner.run(CallRunner.java:124) ~[classes/:?]
        at org.apache.hadoop.hbase.ipc.RpcHandler.run(RpcHandler.java:104) ~[classes/:?]
        at org.apache.hadoop.hbase.ipc.RpcHandler.run(RpcHandler.java:84) ~[classes/:?]
{noformat}

I think the problem is that MasterStoppedException is a subclass of 
DoNotRetryIOException, so when we hit it the call fails immediately 
instead of being retried.

The client ZK syncer is also asynchronous, so it is possible that when we call 
admin.balance the new master location has not been synced yet. The call then 
throws MasterStoppedException right away and fails the UT.
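
To illustrate the race, here is a minimal sketch of one possible test-side workaround: keep retrying the call while it still fails with MasterStoppedException, so the asynchronous syncer has time to publish the new master location. This is not the actual fix for this issue. The two exception classes are local stand-ins that only mirror the HBase class hierarchy so the sketch compiles without HBase on the classpath, and the helper name callUntilMasterLocationSynced, its parameters, and the simulated balance call are all hypothetical.

```java
import java.io.IOException;
import java.util.concurrent.Callable;

class BalanceRetrySketch {

  // Stand-in for org.apache.hadoop.hbase.DoNotRetryIOException (assumption:
  // only the inheritance relationship matters for this sketch).
  public static class DoNotRetryIOException extends IOException {
  }

  // Stand-in for org.apache.hadoop.hbase.exceptions.MasterStoppedException.
  public static class MasterStoppedException extends DoNotRetryIOException {
  }

  /**
   * Hypothetical helper: runs the call and retries only on
   * MasterStoppedException, sleeping between attempts. Any other exception
   * propagates immediately. Rethrows the last MasterStoppedException once
   * maxAttempts is exhausted.
   */
  public static <T> T callUntilMasterLocationSynced(Callable<T> call, int maxAttempts,
      long sleepMillis) throws Exception {
    if (maxAttempts < 1) {
      throw new IllegalArgumentException("maxAttempts must be at least 1");
    }
    MasterStoppedException last = null;
    for (int attempt = 1; attempt <= maxAttempts; attempt++) {
      try {
        return call.call();
      } catch (MasterStoppedException e) {
        // The client may still be using the old master location; wait and retry.
        last = e;
        Thread.sleep(sleepMillis);
      }
    }
    throw last;
  }

  public static void main(String[] args) throws Exception {
    // Simulated admin.balance(): fails twice as if the old master location
    // were still in use, then succeeds once the new location is visible.
    int[] attempts = { 0 };
    boolean ran = callUntilMasterLocationSynced(() -> {
      attempts[0]++;
      if (attempts[0] <= 2) {
        throw new MasterStoppedException();
      }
      return true;
    }, 10, 10L);
    System.out.println("balance ran: " + ran + ", attempts: " + attempts[0]);
  }
}
```

In a real test the lambda would wrap the admin.balance call. Whether to retry in the test or to make the test wait for the syncer first is a design choice this sketch does not settle.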

Let me see how to fix it. 



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
