Bharath Vissapragada created HBASE-23764:
--------------------------------------------
Summary: Flaky tests due to ZK client name resolution delays
Key: HBASE-23764
URL: https://issues.apache.org/jira/browse/HBASE-23764
Project: HBase
Issue Type: Bug
Components: test
Affects Versions: 3.0.0
Reporter: Bharath Vissapragada
Assignee: Bharath Vissapragada
[~ndimiduk] and I ran into this issue (separately) and noticed that name
resolution in the ZooKeeper client is slow. Since we use ZK heavily in the
unit tests, this often manifests as the following issues:
1. Test timeouts while starting the mini cluster (Master failed to start....)
2. InterruptedException (because the tests time out)
3. Flaky tests because a subset of the cluster fails to start for whatever
reason (replication tests especially, because they spawn multiple clusters).
I have a strong feeling that this is a possible cause of many of our flaky
tests in Jenkins. Luckily, it looks like the workaround of using an IP address
instead of a hostname makes startup much quicker. There are some related
discussions in the ZK community (ZOOKEEPER-1666 and related jiras).
Until we figure out the actual root cause and upgrade the dependency (if
needed), we should consider switching from hostname to IP address for more
stable builds.
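
A minimal sketch of the hostname-to-IP idea, not the actual patch: resolve the
host once up front and hand the ZK client an IP literal, so it has no hostname
left to look up. The class name ZkConnectString and the method
toIpConnectString are hypothetical, made up for illustration. The sketch
assumes an IPv4 address and a single server.

```java
import java.net.InetAddress;
import java.net.UnknownHostException;

// Hypothetical helper for illustration only; not part of HBase or ZooKeeper.
final class ZkConnectString {
    private ZkConnectString() {}

    /**
     * Resolves the host once and returns an "ip:port" string.
     * If the input is already an IP literal, InetAddress.getByName only
     * validates its format and performs no name-service lookup.
     * Assumption: IPv4; IPv6 literals are not bracketed here.
     */
    static String toIpConnectString(String host, int port) {
        try {
            InetAddress addr = InetAddress.getByName(host);
            return addr.getHostAddress() + ":" + port;
        } catch (UnknownHostException e) {
            throw new IllegalArgumentException("Cannot resolve host: " + host, e);
        }
    }
}
```

A test could, for example, pass the result of
ZkConnectString.toIpConnectString("localhost", 2181) as the connect string
when creating its ZK client. Whether that fully avoids the delay depends on
the root cause, which is still open.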