oneby-wang opened a new pull request, #25864:
URL: https://github.com/apache/pulsar/pull/25864

   ### Motivation
   
   `ProducerCleanupTest#testAllTimerTaskShouldCanceledAfterProducerClosed` is 
flaky because the test asserted `HashedWheelTimer.pendingTimeouts()` on the 
shared test client immediately after producer close. The producer close path 
cancels the send timeout and producer stats timeout, but Netty's 
`HashedWheelTimer` removes cancelled timeouts asynchronously in the timer 
worker. Under load, `pendingTimeouts()` can still report the cancelled tasks 
for a short period.
   
   Example failure:
   
   ```text
   2026-05-23T08:07:31,286 - INFO  - [BookieDeathWatcher-53904:BookieServer] - 
BookieDeathWatcher exited loop due to uncaught exception from thread 
BookieDeathWatcher-53904 {}
   java.lang.RuntimeException: Bookie is not running any more
        at 
org.apache.bookkeeper.proto.BookieServer$DeathWatcher.run(BookieServer.java:289)
   2026-05-23T08:07:31,286 - INFO  - [BookieDeathWatcher-53904:BookieServer] - 
Shutting down BookieServer {}
   2026-05-23T08:07:31,286 - INFO  - 
[BookieDeathWatcher-53904:BookieNettyServer] - Shutting down BookieNettyServer 
{}
   
   Expected :0
   Actual   :2
   
   java.lang.AssertionError: expected [0] but found [2]
        at org.testng.Assert.fail(Assert.java:111)
        at org.testng.Assert.failNotEquals(Assert.java:1590)
        at org.testng.Assert.assertEqualsImpl(Assert.java:150)
        at org.testng.Assert.assertEquals(Assert.java:132)
        at org.testng.Assert.assertEquals(Assert.java:992)
        at org.testng.Assert.assertEquals(Assert.java:956)
        at org.testng.Assert.assertEquals(Assert.java:1002)
        at 
org.apache.pulsar.client.api.ProducerCleanupTest.testAllTimerTaskShouldCanceledAfterProducerClosed(ProducerCleanupTest.java:40)
   ```
   
   ### Modifications
   
   - Use a dedicated `PulsarClient` in the test so the timer assertion only 
observes tasks created by this test.
   - Set the producer send timeout to 15 seconds. This keeps the timeout task 
in the initial state long enough for `producer.close()` to exercise the 
cancellation path.
   - Replace the fixed `Thread.sleep(2000)` and one-shot assertion with 
Awaitility, waiting up to 10 seconds for Netty's timer worker to clean up 
cancelled timeouts.
   
   ### Verifying this change
   
   - [x] Make sure that the change passes the CI checks.
   
   <!-- DO NOT REMOVE THIS SECTION. CHECK THE PROPER BOX ONLY. -->
   
   *If the box was checked, please highlight the changes*
   
   - [ ] Dependencies (add or upgrade a dependency)
   - [ ] The public API
   - [ ] The schema
   - [ ] The default values of configurations
   - [ ] The threading model
   - [ ] The binary protocol
   - [ ] The REST endpoints
   - [ ] The admin CLI options
   - [ ] The metrics
   - [ ] Anything that affects deployment
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

Reply via email to