He-Pin opened a new pull request, #3007:
URL: https://github.com/apache/pekko/pull/3007
### Motivation
JDK 25 nightly stream tests hang for the full test timeout (the recurring
failures behind #2573 / #2870). A local reproduction — the full `stream-tests`
run with the nightly `virtualize=on` + `timefactor=4` options on JDK 25 — pins
it down:
- one `...-pekko.test.stream-dispatcher-CarrierThread-N` consumes **~97%
CPU** (cpu time ≈ elapsed time) stuck in `AbstractNodeQueue.pollNode`,
- every other carrier is idle in `ForkJoinPool.awaitWork`,
- a full virtual-thread dump shows **no producer thread anywhere**.
The spinning consumer is a virtual thread, so the unbounded CPU spin pins
its carrier permanently; the stream never progresses and the test's
`futureValue` never completes. Every affected test passes in isolation (~100ms)
even with the full nightly JVM options — the hang only appears under sustained
load, because it is a JIT-state-dependent data race.
### Root cause
PR #1990 (*avoid sun.misc.Unsafe by using VarHandles*) mapped the **producer
writes** correctly (`Unsafe.putOrderedObject` → `VarHandle.setRelease`) but
**downgraded every consumer read** from `Unsafe.getObjectVolatile` (a
volatile/acquire load) to `VarHandle.get` — which has **plain** memory
semantics even when the field is declared `volatile`.
A plain read is not ordered against the producer's release store, so it
establishes no happens-before with the published node. Inside the busy-spin
loops in `peekNode`/`pollNode`:
```java
do { next = tail.next(); } while (next == null);
```
the JIT may hoist the plain load out of the loop, producing an unbounded
spin that never observes the linked next node. JDK 25's C2 makes this manifest
reliably, and virtual-thread carriers turn the transient spin into a permanent
100% CPU pin.
Pairing a producer-side release store with a consumer-side acquire load is
the same memory ordering that lock-free MPSC queues such as JCTools use
(consumer-side `lvNext` / load-acquire on the next pointer); the original
Unsafe code matched it, and #1990 inadvertently broke it.
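To make the access-mode difference concrete, here is a minimal sketch. `DemoNode` and its method names are hypothetical stand-ins, not the actual Pekko classes. It shows that the access mode is chosen by the `VarHandle` method being called, not by the `volatile` modifier on the field:

```java
import java.lang.invoke.MethodHandles;
import java.lang.invoke.VarHandle;

// Hypothetical stand-in for a queue node; not the actual Pekko class.
final class DemoNode {
    // Declared volatile, yet VarHandle.get() below is still a plain read.
    private volatile DemoNode next;

    private static final VarHandle NEXT;
    static {
        try {
            NEXT = MethodHandles.lookup()
                    .findVarHandle(DemoNode.class, "next", DemoNode.class);
        } catch (ReflectiveOperationException e) {
            throw new ExceptionInInitializerError(e);
        }
    }

    // Producer side: release store, the VarHandle counterpart of
    // Unsafe.putOrderedObject.
    void publishNext(DemoNode n) {
        NEXT.setRelease(this, n);
    }

    // Plain read: no ordering against the release store, and the JIT is
    // allowed to hoist it out of a spin loop.
    DemoNode nextPlain() {
        return (DemoNode) NEXT.get(this);
    }

    // Acquire read: pairs with setRelease and must be re-executed on each
    // loop iteration.
    DemoNode nextAcquire() {
        return (DemoNode) NEXT.getAcquire(this);
    }
}
```

Single-threaded, both readers return the same value; the difference only shows up across threads, which is why the affected tests pass in isolation.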
### Modification
- `Node.next()` and the four `tailHandle` reads (`peekNode`, `pollNode`,
`isEmpty`, `count`) now use `getAcquire`, restoring the acquire ordering that
the pre-#1990 volatile reads provided and pairing with the existing `setRelease` writes.
- Added `Thread.onSpinWait()` to both busy-spin loops as standard spin-wait
hygiene.
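
The shape of the change can be sketched on a simplified queue. `SketchQueue` below is a hypothetical, stripped-down MPSC linked queue written for illustration; its names and layout are assumptions and it is not the `AbstractNodeQueue` source. It only shows the consumer re-reading `next` with `getAcquire` and calling `Thread.onSpinWait()` inside the spin loop:

```java
import java.lang.invoke.MethodHandles;
import java.lang.invoke.VarHandle;
import java.util.concurrent.atomic.AtomicReference;

// Hypothetical, simplified multi-producer/single-consumer queue.
final class SketchQueue<T> {
    static final class Node<T> {
        T value;
        volatile Node<T> next;
        Node(T value) { this.value = value; }
    }

    private static final VarHandle NEXT;
    static {
        try {
            NEXT = MethodHandles.lookup()
                    .findVarHandle(Node.class, "next", Node.class);
        } catch (ReflectiveOperationException e) {
            throw new ExceptionInInitializerError(e);
        }
    }

    // Producers swap themselves in here; the consumer owns `tail`.
    private final AtomicReference<Node<T>> head;
    private Node<T> tail;

    SketchQueue() {
        Node<T> stub = new Node<>(null);
        head = new AtomicReference<>(stub);
        tail = stub;
    }

    // Producer: claim the head slot, then link with a release store.
    void add(T value) {
        Node<T> n = new Node<>(value);
        Node<T> prev = head.getAndSet(n);
        NEXT.setRelease(prev, n);
    }

    // Consumer: acquire reads pair with the producer's release store.
    @SuppressWarnings("unchecked")
    T poll() {
        Node<T> t = tail;
        Node<T> next = (Node<T>) NEXT.getAcquire(t);
        if (next == null) {
            if (head.get() == t) return null; // genuinely empty
            // A producer has swapped head but not linked yet: spin until
            // the link becomes visible.
            do {
                Thread.onSpinWait();
                next = (Node<T>) NEXT.getAcquire(t);
            } while (next == null);
        }
        T v = next.value;
        next.value = null;
        tail = next;
        return v;
    }
}
```

With a plain `NEXT.get(t)` in the loop, nothing would oblige the compiled code to reload the field on each iteration, which is the failure mode described under Root cause.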
### Performance
This **restores** the pre-#1990 memory semantics rather than adding new cost:
| Arch | plain `get` (since #1990) | `getAcquire` (this PR / original Unsafe `getObjectVolatile`) |
|------|---------------------------|-------------------------------------------------------------|
| x86-64 | `MOV` | `MOV` (all x86 loads already have acquire semantics); **zero difference** |
| AArch64 | `LDR` | `LDAR` (single instruction); **same as pre-#1990** |
Net effect versus the original Unsafe-based design is zero on x86-64 and
negligible on AArch64; it only removes the broken plain-read micro-optimization
the VarHandle migration introduced. Method signatures are unchanged → no
binary-compatibility impact.
### Result
With the fix, the previously hanging `HubSpec "work with long streams if one
of the producers is slower"` completes in ~2.7s (was stuck for the full
timeout), and the full `stream-tests` run proceeds past the point where it
previously hung (1800+ tests, no hang) under the same nightly `virtualize=on` +
`timefactor=4` JVM options on JDK 25.
### Tests
- `sbt actor/compile` succeeds.
- Local full `stream-tests` run with nightly JVM options (`virtualize=on`,
`minimum-runnable=8`, `timefactor=4`) on JDK 25.0.1 no longer hangs at HubSpec
and proceeds normally; the specific previously-hanging test now passes in ~2.7s.
- Signatures unchanged, so MiMa is unaffected.
### References
- https://github.com/apache/pekko/issues/2870
- https://github.com/apache/pekko/issues/2573
- Regression introduced in #1990