[ https://issues.apache.org/jira/browse/HBASE-27169?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Duo Zhang resolved HBASE-27169.
-------------------------------
    Hadoop Flags: Reviewed
      Resolution: Fixed

It is much more stable than before, but it still has some failing runs.

https://ci-hbase.apache.org/job/HBase-Flaky-Tests/job/master/3818/testReport/junit/org.apache.hadoop.hbase.client/TestSeparateClientZKCluster/testMetaMoveDuringClientZkClusterRestart/

Will open other issues to fix them.

> TestSeparateClientZKCluster is flaky
> ------------------------------------
>
>                 Key: HBASE-27169
>                 URL: https://issues.apache.org/jira/browse/HBASE-27169
>             Project: HBase
>          Issue Type: Bug
>          Components: test
>            Reporter: Duo Zhang
>            Assignee: Duo Zhang
>            Priority: Major
>             Fix For: 2.5.0, 3.0.0-alpha-4, 2.4.14
>
>
> https://nightlies.apache.org/hbase/HBase-Flaky-Tests/master/3773/hbase-server/target/surefire-reports/org.apache.hadoop.hbase.client.TestSeparateClientZKCluster-output.txt
> {noformat}
> org.apache.hadoop.hbase.exceptions.MasterStoppedException: null
>     at org.apache.hadoop.hbase.master.HMaster.checkInitialized(HMaster.java:3177) ~[classes/:?]
>     at org.apache.hadoop.hbase.master.HMaster.balance(HMaster.java:1954) ~[classes/:?]
>     at org.apache.hadoop.hbase.master.MasterRpcServices.balance(MasterRpcServices.java:743) ~[classes/:?]
>     at org.apache.hadoop.hbase.shaded.protobuf.generated.MasterProtos$MasterService$2.callBlockingMethod(MasterProtos.java) ~[hbase-protocol-shaded-3.0.0-alpha-4-SNAPSHOT.jar:3.0.0-alpha-4-SNAPSHOT]
>     at org.apache.hadoop.hbase.ipc.RpcServer.call(RpcServer.java:385) ~[classes/:?]
>     at org.apache.hadoop.hbase.ipc.CallRunner.run(CallRunner.java:124) ~[classes/:?]
>     at org.apache.hadoop.hbase.ipc.RpcHandler.run(RpcHandler.java:104) ~[classes/:?]
>     at org.apache.hadoop.hbase.ipc.RpcHandler.run(RpcHandler.java:84) ~[classes/:?]
> {noformat}
> I think the problem is that MasterStoppedException is a subclass of
> DoNotRetryIOException, so when we hit it, the client fails immediately
> instead of retrying.
> And the client ZK syncer is asynchronous, so when we call admin.balance we
> may not have synced the new location yet. The call then throws
> MasterStoppedException right away and fails the unit test.
>
> Let me see how to fix it.

--
This message was sent by Atlassian Jira
(v8.20.10#820010)
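The race diagnosed above can be shown with a small, self-contained Java model. This is a sketch under stated assumptions, not HBase code.

- `DoNotRetryIOException` and `MasterStoppedException` below are local stand-ins. They mirror only the subclass relationship described in the issue.
- `StaleMasterView`, `clientCall` and `waitForBalance` are hypothetical names.
- The asynchronous client ZK syncer is modeled simply as "the client is stale for the first N calls".
- The wait loop is one possible test-side workaround. It is not the fix that was committed for HBASE-27169.

```java
import java.io.IOException;

/**
 * Toy model of the race described in HBASE-27169. Not HBase code: every
 * class below is a local, hypothetical stand-in.
 */
public class BalanceRetrySketch {

  // Stand-in mirroring org.apache.hadoop.hbase.DoNotRetryIOException.
  static class DoNotRetryIOException extends IOException {
    DoNotRetryIOException(String message) {
      super(message);
    }
  }

  // Stand-in mirroring MasterStoppedException, which (per the issue)
  // is a subclass of DoNotRetryIOException.
  static class MasterStoppedException extends DoNotRetryIOException {
    MasterStoppedException(String message) {
      super(message);
    }
  }

  // Hypothetical client view of the master location. The asynchronous
  // client ZK syncer is modeled as "stale for the first N calls".
  static class StaleMasterView {
    private int callsUntilSynced;
    int attempts = 0; // number of balance() calls made so far

    StaleMasterView(int callsUntilSynced) {
      this.callsUntilSynced = callsUntilSynced;
    }

    // Models admin.balance(): while the client still targets the stopped
    // master, the call throws MasterStoppedException.
    boolean balance() throws IOException {
      attempts++;
      if (callsUntilSynced > 0) {
        callsUntilSynced--; // the syncer makes progress between calls
        throw new MasterStoppedException("null");
      }
      return true;
    }
  }

  // Simplified client retry loop: ordinary IOExceptions are retried, but any
  // DoNotRetryIOException, including MasterStoppedException, is rethrown at once.
  static boolean clientCall(StaleMasterView view, int maxAttempts) throws IOException {
    IOException last = new IOException("maxAttempts must be >= 1");
    for (int attempt = 1; attempt <= maxAttempts; attempt++) {
      try {
        return view.balance();
      } catch (DoNotRetryIOException e) {
        throw e; // fail fast, no further attempts
      } catch (IOException e) {
        last = e; // retriable, try again
      }
    }
    throw last;
  }

  // One possible test-side workaround (an assumption, not the committed fix):
  // treat MasterStoppedException as transient in the test and call again
  // until the client has caught up or the attempt budget runs out.
  static boolean waitForBalance(StaleMasterView view, int maxAttempts) throws IOException {
    for (int attempt = 1; attempt <= maxAttempts; attempt++) {
      try {
        return view.balance();
      } catch (MasterStoppedException e) {
        if (attempt == maxAttempts) {
          throw e; // out of attempts: surface the failure
        }
        // A real test would sleep or wait on a condition here.
      }
    }
    throw new IllegalArgumentException("maxAttempts must be >= 1");
  }

  public static void main(String[] args) throws IOException {
    try {
      clientCall(new StaleMasterView(2), 10);
      System.out.println("client call succeeded");
    } catch (MasterStoppedException e) {
      System.out.println("client call failed fast: " + e.getClass().getSimpleName());
    }
    System.out.println("waitForBalance: " + waitForBalance(new StaleMasterView(2), 5));
  }
}
```

In this model, the client-style loop gives up after a single attempt even though its budget is 10. That mirrors the fail-fast behavior of a DoNotRetryIOException subclass. The wait loop succeeds on its third attempt, once the simulated syncer has caught up. The model does not settle whether such a fix belongs in the test or in the client's retry handling.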