oneby-wang opened a new pull request, #25957:
URL: https://github.com/apache/pulsar/pull/25957

   ### Motivation
   
   `AuditorLedgerCheckerTest.testDelayedAuditOfLostBookies` is flaky when 
repeated with a high invocation count. The test configures 
`lostBookieRecoveryDelay` to 5 seconds, shuts down a non-auditor bookie, and 
then uses fixed waits that start immediately after the shutdown thread is 
launched.
   
   ```
   audit of lost bookie isn't delayed
   java.lang.AssertionError: audit of lost bookie isn't delayed
        at org.testng.AssertJUnit.fail(AssertJUnit.java:65)
        at org.testng.AssertJUnit.assertTrue(AssertJUnit.java:23)
        at org.apache.bookkeeper.replication.AuditorLedgerCheckerTest.testInnerDelayedAuditOfLostBookies(AuditorLedgerCheckerTest.java:415)
        at org.apache.bookkeeper.replication.AuditorLedgerCheckerTest.testDelayedAuditOfLostBookies(AuditorLedgerCheckerTest.java:436)
        at java.base/jdk.internal.reflect.DirectMethodHandleAccessor.invoke(DirectMethodHandleAccessor.java:104)
        at java.base/java.lang.reflect.Method.invoke(Method.java:565)
        at org.testng.internal.invokers.MethodInvocationHelper.invokeMethod(MethodInvocationHelper.java:141)
        at org.testng.internal.invokers.InvokeMethodRunnable.runOne(InvokeMethodRunnable.java:47)
        at org.testng.internal.invokers.InvokeMethodRunnable.call(InvokeMethodRunnable.java:76)
        at org.testng.internal.invokers.InvokeMethodRunnable.call(InvokeMethodRunnable.java:11)
        at java.base/java.util.concurrent.FutureTask.run(FutureTask.java:328)
        at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1090)
        at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:614)
        at java.base/java.lang.Thread.run(Thread.java:1474)
   ```
   
   The test measured the delay from the moment it started the thread that shuts 
down the bookie. However, the `lostBookieRecoveryDelay` countdown starts only 
after the auditor observes the lost bookie and schedules the delayed audit task.
   
   ### Modifications
   
   - Wait until the auditor has scheduled the delayed `auditTask` before 
starting the delay assertions.
   - Keep the negative assertion anchored to the configured delay window, 
verifying that the ledger is not marked under-replicated before the delay 
expires.
   - Use a short grace period after the delay window for the scheduled audit to 
run and for the under-replication watcher to observe the result.
   - Add a helper to wait for the delayed audit task to be scheduled without 
triggering an audit directly.
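
   The timing pattern described in the list above can be sketched in isolation 
as follows. This is a minimal, self-contained illustration and not the code in 
this PR: `FakeAuditor`, `onBookieLost`, `waitUntil`, and `checkDelayedAudit` are 
hypothetical names, and the 80% safety margin on the negative assertion is an 
assumption chosen for the sketch.

   ```java
   import java.util.concurrent.CountDownLatch;
   import java.util.concurrent.Executors;
   import java.util.concurrent.ScheduledExecutorService;
   import java.util.concurrent.ScheduledFuture;
   import java.util.concurrent.TimeUnit;
   import java.util.concurrent.atomic.AtomicReference;
   import java.util.function.BooleanSupplier;

   public class DelayedAuditSketch {

       /** Hypothetical stand-in for the auditor: runs an "audit" after a fixed delay. */
       static final class FakeAuditor {
           private final ScheduledExecutorService executor =
                   Executors.newSingleThreadScheduledExecutor();
           private final AtomicReference<ScheduledFuture<?>> auditTask = new AtomicReference<>();
           private final CountDownLatch underReplicated = new CountDownLatch(1);
           private final long delayMillis;

           FakeAuditor(long delayMillis) {
               this.delayMillis = delayMillis;
           }

           // Invoked when the auditor notices the lost bookie; the delay starts here,
           // not when the shutdown was requested.
           void onBookieLost() {
               auditTask.set(executor.schedule(
                       underReplicated::countDown, delayMillis, TimeUnit.MILLISECONDS));
           }

           boolean isAuditScheduled() {
               return auditTask.get() != null;
           }

           CountDownLatch underReplicatedLatch() {
               return underReplicated;
           }

           void close() {
               executor.shutdownNow();
           }
       }

       /** Polls a side-effect-free condition until it holds or the timeout elapses. */
       public static boolean waitUntil(BooleanSupplier condition, long timeoutMillis)
               throws InterruptedException {
           long deadline = System.nanoTime() + TimeUnit.MILLISECONDS.toNanos(timeoutMillis);
           while (!condition.getAsBoolean()) {
               if (System.nanoTime() - deadline >= 0) {
                   return false;
               }
               Thread.sleep(10);
           }
           return true;
       }

       /** Returns true if the audit stayed pending during the delay and then ran. */
       public static boolean checkDelayedAudit(long detectionLagMillis, long delayMillis,
                                               long graceMillis) throws InterruptedException {
           FakeAuditor auditor = new FakeAuditor(delayMillis);
           try {
               // Simulated bookie shutdown: the loss is observed only after some lag.
               Thread shutdown = new Thread(() -> {
                   try {
                       Thread.sleep(detectionLagMillis);
                       auditor.onBookieLost();
                   } catch (InterruptedException e) {
                       Thread.currentThread().interrupt();
                   }
               });
               shutdown.start();

               // 1. Anchor the timeline: wait until the delayed audit is scheduled.
               if (!waitUntil(auditor::isAuditScheduled, detectionLagMillis + 5_000)) {
                   return false;
               }
               // 2. Negative assertion: nothing is marked within most of the delay
               //    window (80% here, an assumed margin for polling jitter).
               long safeWindow = delayMillis * 4 / 5;
               if (auditor.underReplicatedLatch().await(safeWindow, TimeUnit.MILLISECONDS)) {
                   return false;
               }
               // 3. Positive assertion: rest of the window plus a short grace period.
               return auditor.underReplicatedLatch()
                       .await(delayMillis - safeWindow + graceMillis, TimeUnit.MILLISECONDS);
           } finally {
               auditor.close();
           }
       }

       public static void main(String[] args) throws InterruptedException {
           System.out.println("delayed as expected: " + checkDelayedAudit(200, 1_000, 2_000));
       }
   }
   ```

   Because step 1 only reads state, the wait cannot trigger an audit itself, and 
the detection lag no longer counts against the delay window checked in steps 2 
and 3.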
   
   ### Verifying this change
   
   - [x] Make sure that the change passes the CI checks.
   
   ### Does this pull request potentially affect one of the following parts:
   
   - [ ] Dependencies (add or upgrade a dependency)
   - [ ] The public API
   - [ ] The schema
   - [ ] The default values of configurations
   - [ ] The threading model
   - [ ] The binary protocol
   - [ ] The REST endpoints
   - [ ] The admin CLI options
   - [ ] The metrics
   - [ ] Anything that affects deployment
   

