wenbingshen opened a new pull request, #3418:
URL: https://github.com/apache/bookkeeper/pull/3418
### Motivation
Fix the flaky test
`BookieZKExpireTest.testBookieServerZKSessionExpireBehaviour` by disabling the
retry after the session expires.
Issue #3206 was fixed by PR #3415, but that fix introduced another flaky
test: `BookieZKExpireTest.testBookieServerZKSessionExpireBehaviour`.
Investigation shows the cause:
`ZooKeeperClient` still creates a new `ZooKeeper` instance when the old
client's session expires.
Re-registering the bookie's ephemeral node and re-creating the zk instance run
asynchronously on two threads, so the test sometimes succeeds and sometimes
fails, depending on which one runs first:
1. When the ephemeral node is re-registered before the zk instance is
re-created, the node creation uses the old zk instance and fails with a
session-expired error. The bookie service shuts down, and the test passes.
2. When the zk instance is re-created before the ephemeral node is
re-registered, the node creation uses the new zk instance and succeeds. The
bookie service keeps running normally, and the test fails.
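The two orderings can be reproduced deterministically with a toy model. This is only a sketch: `ReRegistrationRaceSketch`, `FakeZk`, and every method in it are hypothetical names for illustration, not BookKeeper or ZooKeeper APIs, and the two steps are run sequentially in a chosen order instead of on real threads.

```java
import java.util.concurrent.atomic.AtomicReference;

/** Toy model of the race; none of these names exist in BookKeeper. */
final class ReRegistrationRaceSketch {

    /** Stand-in for a ZooKeeper handle: we only track whether its session expired. */
    static final class FakeZk {
        final boolean expired;

        FakeZk(boolean expired) {
            this.expired = expired;
        }
    }

    /** Handle shared by both steps; it starts out with an expired session. */
    private final AtomicReference<FakeZk> zk = new AtomicReference<>(new FakeZk(true));

    /** Reconnect step: swaps in a fresh handle with a live session. */
    void recreateZk() {
        zk.set(new FakeZk(false));
    }

    /** Re-registration step: succeeds only if the handle it sees is not expired. */
    boolean reRegister() {
        return !zk.get().expired;
    }

    /** Runs both steps in the given order and reports whether the bookie keeps running. */
    static boolean bookieSurvives(boolean reconnectFirst) {
        ReRegistrationRaceSketch sketch = new ReRegistrationRaceSketch();
        if (reconnectFirst) {
            sketch.recreateZk();
            return sketch.reRegister();
        }
        boolean registered = sketch.reRegister();
        sketch.recreateZk();
        return registered;
    }

    public static void main(String[] args) {
        // Case 1: registration sees the expired handle, so the bookie shuts down.
        System.out.println("register first:  survives=" + bookieSurvives(false));
        // Case 2: registration sees the fresh handle, so the bookie keeps running.
        System.out.println("reconnect first: survives=" + bookieSurvives(true));
    }
}
```

In the real code the order is decided by thread scheduling, which is why the test result varies from run to run.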
```java
try {
    connectExecutor.submit(clientCreator);
} catch (RejectedExecutionException ree) {
    if (!closed.get()) {
        logger.error("ZooKeeper reconnect task is rejected : ", ree);
    }
} catch (Exception t) {
    logger.error("Failed to submit zookeeper reconnect task due to runtime exception : ", t);
}
```
### Changes
Add a `retryExpired` flag that controls whether `ZooKeeperClient` re-creates
the zk instance after the session expires.
Set this flag to false for `ZKMetadataBookieDriver`.
All other `ZooKeeperClient` users either keep the default value true or set it
to true explicitly, which preserves the original behavior.
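As a rough illustration of how such a flag can gate the reconnect, here is a minimal sketch. Only the flag name `retryExpired` comes from this PR. The class `RetryExpiredSketch`, its constructors, and its methods are assumptions made for the example and do not reflect the actual `ZooKeeperClient` code.

```java
import java.util.concurrent.atomic.AtomicInteger;

/** Hypothetical client wrapper; only the flag name is taken from the PR. */
final class RetryExpiredSketch {

    private final boolean retryExpired;

    /** Counts how many times a reconnect would have been scheduled. */
    private final AtomicInteger reconnects = new AtomicInteger();

    /** Default mirrors the original behavior: reconnect after expiry. */
    RetryExpiredSketch() {
        this(true);
    }

    RetryExpiredSketch(boolean retryExpired) {
        this.retryExpired = retryExpired;
    }

    /** Called when the session expires; schedules a reconnect only if allowed. */
    void onSessionExpired() {
        if (!retryExpired) {
            // The expired handle stays in place, so later operations fail
            // and the owner (for example a bookie) can shut down.
            return;
        }
        reconnects.incrementAndGet();
    }

    int reconnectCount() {
        return reconnects.get();
    }

    public static void main(String[] args) {
        RetryExpiredSketch withRetry = new RetryExpiredSketch();
        withRetry.onSessionExpired();
        RetryExpiredSketch withoutRetry = new RetryExpiredSketch(false);
        withoutRetry.onSessionExpired();
        System.out.println("default:            reconnects=" + withRetry.reconnectCount());
        System.out.println("retryExpired=false: reconnects=" + withoutRetry.reconnectCount());
    }
}
```

With the flag off, there is no second thread swapping the handle, so the re-registration outcome no longer depends on scheduling.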
### Test the behavior of this PR:
**Before this PR:**
Executed the test 10 times, all failed.
<img width="1111" alt="image"
src="https://user-images.githubusercontent.com/35599757/180596967-9f3f4300-6ba6-4989-b35e-a275a955d139.png">
**After this PR:**
Executed the test 10 times, all successful.
<img width="894" alt="image"
src="https://user-images.githubusercontent.com/35599757/180597032-01318ed6-e498-4ba4-8bd7-fb081ed8245a.png">