He-Pin opened a new pull request, #3017:
URL: https://github.com/apache/pekko/pull/3017

   ## Motivation
   
   The `MixedProtocolClusterSpec` test "be allowed to join a cluster with a node 
using the pekko protocol (udp)" still fails on virtualized JDK 25 runs, even 
after #2997 reordered `shutdownAll` to stop joining nodes first:
   
   ```
   [WARN] CoordinatedShutdown(pekko://MixedProtocolClusterSpec) Coordinated
   shutdown phase [actor-system-terminate] timed out after 30000 milliseconds
   java.lang.RuntimeException: Failed to stop [MixedProtocolClusterSpec] within [1 minute]
    ... StreamSupervisor ... remote-6-0-unnamed ActorGraphInterpreter
   ```
   
   The outer await reported as `within [1 minute]` is a `30s` base dilated by 
`pekko.test.timefactor`, so this lane must be running at **tf=2** (`30s × 2 = 60s`). 
The JDK 25 nightly runs at **tf=4**, which gives `120s`, and passes.
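
The dilation arithmetic above can be sketched in plain Scala. This is a minimal illustration, not the test kit's code: `TimefactorSketch` and `dilate` are hypothetical names, and the time factor is passed in explicitly rather than read from `pekko.test.timefactor`.

```scala
import scala.concurrent.duration._

object TimefactorSketch {
  // Hypothetical helper: scales a base timeout by a time factor, rounding to
  // the nearest nanosecond. The factor is an explicit argument here; the real
  // test kit takes it from configuration.
  def dilate(base: FiniteDuration, timefactor: Double): FiniteDuration =
    Duration.fromNanos(math.round(base.toNanos * timefactor))

  // Base of the outer await before this change, as stated in the description.
  val outerAwaitBase: FiniteDuration = 30.seconds

  val atTf2: FiniteDuration = dilate(outerAwaitBase, 2.0) // the failing lane
  val atTf4: FiniteDuration = dilate(outerAwaitBase, 4.0) // the JDK 25 nightly
}
```

With the same helper, a `60s` base at tf=2 comes out equal to the `120s` that the tf=4 nightly already gets from the `30s` base.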
   
   The `actor-system-terminate` phase only calls `system.finalTerminate()` and 
**recovers** on its own (non-dilated) phase timeout while termination keeps 
draining in the background 
([`CoordinatedShutdown.scala:264-269`](https://github.com/apache/pekko/blob/main/actor/src/main/scala/org/apache/pekko/actor/CoordinatedShutdown.scala#L264-L269)).
 So the inner-phase WARN is **non-binding noise** — 
`ClusterTestUtil.shutdownAll`'s dilated await on `whenTerminated` is the real 
pass/fail deadline. The aeron-udp transport is the slowest to drain (embedded 
media driver + stacked Aeron liveness timeouts), so `60s` was simply too tight 
at tf=2.
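
The difference between the two deadlines can be modelled with plain Scala futures. This is a simplified sketch under stated assumptions, not Pekko's implementation: `ShutdownDeadlineSketch`, `innerPhase` and `outerAwait` are hypothetical stand-ins for the phase timeout and for the await on `whenTerminated`.

```scala
import java.util.concurrent.TimeoutException
import scala.concurrent.{ Await, Future }
import scala.concurrent.duration._
import scala.util.{ Failure, Success, Try }

object ShutdownDeadlineSketch {
  // Stand-in for the inner phase: waits up to `phaseTimeout` and, on timeout,
  // only returns a warning message. It never fails the caller, and the
  // underlying future is left running.
  def innerPhase(terminated: Future[Unit], phaseTimeout: FiniteDuration): Option[String] =
    Try(Await.ready(terminated, phaseTimeout)) match {
      case Failure(_: TimeoutException) =>
        Some(s"phase timed out after ${phaseTimeout.toMillis} milliseconds")
      case _ => None
    }

  // Stand-in for the outer await: a timeout here is what fails the test.
  def outerAwait(terminated: Future[Unit], deadline: FiniteDuration): Either[String, Unit] =
    Try(Await.ready(terminated, deadline)) match {
      case Success(_) => Right(())
      case Failure(e) => Left(s"Failed to stop within [$deadline]: ${e.getClass.getSimpleName}")
    }
}
```

In this model a slow termination produces a warning from `innerPhase` but only `outerAwait` can turn it into a failure, which is why widening the outer deadline is the change that matters.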
   
   ## Modification
   
   - **`ClusterTestUtil.shutdownAll`**: raise the outer await base `30s → 60s` 
(the binding, timefactor-dilated deadline), so a tf=2 lane gets ~`120s` — the 
same headroom the tf=4 nightly already passes with. Added a comment explaining 
why this await, not the inner phase, governs pass/fail.
   - **`MixedProtocolClusterSpec` baseConfig**: raise the (non-dilated, 
non-binding) `actor-system-terminate` phase timeout `30s → 60s` to suppress the 
spurious WARN on the slow path and align it with the new await base.
   
   This is a follow-up to #2997 — that PR pulled the shutdown-*ordering* lever; 
this one fixes the binding *outer-await* deadline.
   
   ## Result
   
   aeron-udp cluster systems get enough wall-clock time to terminate cleanly on 
lower-timefactor virtualized lanes instead of failing the outer `Failed to stop` 
await. Healthy shutdowns still complete in well under a second, so local and 
normal CI runs are unaffected. **Test-only change**: no production behaviour or 
binary-compatibility impact.
   
   ## Tests
   
   - `sbt "cluster/Test/compile"` — success (cluster test-classes compiled)
   - `scalafmt 3.10.7` on both changed files — no reformatting needed
   - `git diff --check` — clean
   - aeron-udp shutdown timing depends on the timefactor and the environment and 
does not reproduce locally (shutdown completes in <1s), so this timeout widening 
is verified only by the compile and format checks above.
   
   ## References
   
   - Follow-up to #2997 (reverse-order cluster shutdown)
   - `nightly-builds.yml` `MixedProtocolClusterSpec` (udp) shutdown timeout
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

