This is an automated email from the ASF dual-hosted git repository.

He-Pin pushed a commit to branch fix/jdk25-nodequeue-acquire-spin
in repository https://gitbox.apache.org/repos/asf/pekko.git

commit 358f53ad427ac9b8cbf92b9caba1ecfd67276595
Author: He-Pin <[email protected]>
AuthorDate: Fri May 29 17:52:18 2026 +0800

    fix: pair AbstractNodeQueue next read with acquire semantics

    Motivation:
    JDK 25 nightly stream tests hang for the full test timeout. A local
    reproduction (full stream-tests with the nightly virtualize=on +
    timefactor=4 options) shows one
    `...-pekko.test.stream-dispatcher-CarrierThread-N` consuming ~97% CPU
    (cpu time approximately equal to elapsed time) stuck in
    `AbstractNodeQueue.pollNode`, while every other carrier is idle in
    `ForkJoinPool.awaitWork` and a full virtual-thread dump shows no
    producer thread anywhere. The spinning consumer is a virtual thread,
    so the unbounded CPU spin pins its carrier permanently; the stream
    never progresses and the test's `futureValue` never completes.

    Root cause:
    The MPSC queue publishes the linked node via `Node.setNext` =
    `nextHandle.setRelease(...)` (release store), but the consumer reads
    `Node.next()` = `nextHandle.get(...)`, which is a plain load:
    `VarHandle.get` has plain memory semantics even though the field is
    declared `volatile`. A plain read is not ordered against the
    producer's release store, so it establishes no happens-before with
    the published node, and inside the busy-spin loops in
    `peekNode`/`pollNode`
    (`do { next = tail.next(); } while (next == null);`) the JIT may
    hoist the plain load out of the loop. The result is an unbounded spin
    that never observes the linked next node. JDK 25's C2 makes this
    manifest reliably, and virtual-thread carriers turn the transient
    spin into a permanent 100% CPU pin.

    Modification:
    - `Node.next()` now uses `nextHandle.getAcquire(this)`, pairing with
      the `setRelease` in `setNext`. This establishes the missing
      happens-before and prevents the JIT from hoisting the load out of
      the spin loops.
    - Added `Thread.onSpinWait()` to both busy-spin loops (`peekNode`,
      `pollNode`) as standard spin-wait hygiene.

    Method signatures are unchanged, so there is no binary-compatibility
    impact.

    Result:
    With the fix, the previously hanging
    `HubSpec "work with long streams if one of the producers is slower"`
    completes in ~2.7s (was stuck for the full timeout) and the full
    stream-tests run proceeds past the point where it previously hung,
    under the same nightly virtualize=on + timefactor=4 JVM options on
    JDK 25.

    References:
    https://github.com/apache/pekko/issues/2870
---
 .../java/org/apache/pekko/dispatch/AbstractNodeQueue.java | 11 ++++++++++-
 1 file changed, 10 insertions(+), 1 deletion(-)

diff --git a/actor/src/main/java/org/apache/pekko/dispatch/AbstractNodeQueue.java b/actor/src/main/java/org/apache/pekko/dispatch/AbstractNodeQueue.java
index ac8921e155..b339d1e1f3 100644
--- a/actor/src/main/java/org/apache/pekko/dispatch/AbstractNodeQueue.java
+++ b/actor/src/main/java/org/apache/pekko/dispatch/AbstractNodeQueue.java
@@ -60,6 +60,7 @@ public abstract class AbstractNodeQueue<T> extends AtomicReference<AbstractNodeQ
       // if tail != head this is not going to change until producer makes progress
       // we can avoid reading the head and just spin on next until it shows up
       do {
+        Thread.onSpinWait();
         next = tail.next();
       } while (next == null);
     }
@@ -168,6 +169,7 @@ public abstract class AbstractNodeQueue<T> extends AtomicReference<AbstractNodeQ
       // if tail != head this is not going to change until producer makes progress
       // we can avoid reading the head and just spin on next until it shows up
       do {
+        Thread.onSpinWait();
         next = tail.next();
       } while (next == null);
     }
@@ -208,7 +210,14 @@ public abstract class AbstractNodeQueue<T> extends AtomicReference<AbstractNodeQ
     }
 
     public final Node<T> next() {
-      return (Node<T>) nextHandle.get(this);
+      // Acquire load to pair with the release store in setNext. A plain read here
+      // (VarHandle.get has plain semantics even though the field is volatile) is not
+      // ordered against the producer's setRelease, so it establishes no happens-before
+      // with the published node and, inside the busy-spin loops in peekNode/pollNode,
+      // can be hoisted out of the loop by the JIT, producing an unbounded spin that
+      // never observes the linked next node. This was observed on JDK 25 where such a
+      // spin pinned a virtual-thread carrier at 100% CPU and stalled the stream.
+      return (Node<T>) nextHandle.getAcquire(this);
     }
 
     protected final void setNext(final Node<T> newNext) {
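
For readers who want to try the pairing in isolation, the following is a
minimal standalone sketch of the release-store / acquire-load pattern
described in the commit message. It is not the Pekko source: the names
AcquireReleaseSketch, Node, awaitNext and runOnce are hypothetical and exist
only for illustration. Only the VarHandle access modes (setRelease,
getAcquire) and Thread.onSpinWait() are taken from the change itself.

```java
import java.lang.invoke.MethodHandles;
import java.lang.invoke.VarHandle;

// Hypothetical illustration only; not org.apache.pekko.dispatch.AbstractNodeQueue.
class AcquireReleaseSketch {

    /** Simplified stand-in for a linked queue node. */
    static final class Node<T> {
        private static final VarHandle NEXT;

        static {
            try {
                // The lookup class is Node itself, so the private field is accessible.
                NEXT = MethodHandles.lookup().findVarHandle(Node.class, "next", Node.class);
            } catch (ReflectiveOperationException e) {
                throw new ExceptionInInitializerError(e);
            }
        }

        final T value;

        @SuppressWarnings("unused") // read and written only through NEXT
        private volatile Node<T> next;

        Node(T value) {
            this.value = value;
        }

        /** Acquire load: pairs with the release store in setNext. */
        @SuppressWarnings("unchecked")
        Node<T> next() {
            return (Node<T>) NEXT.getAcquire(this);
        }

        /** Release store: publishes newNext and everything written before it. */
        void setNext(Node<T> newNext) {
            NEXT.setRelease(this, newNext);
        }
    }

    /** Consumer-side spin, shaped like the loops touched by the diff. */
    static <T> Node<T> awaitNext(Node<T> tail) {
        Node<T> next;
        do {
            Thread.onSpinWait(); // hint to the runtime that this is a busy-wait
            next = tail.next();  // acquire load, re-read on every iteration
        } while (next == null);
        return next;
    }

    /** Links a second node from another thread and returns the value the spinner sees. */
    static String runOnce() {
        Node<String> tail = new Node<>("first");
        Thread producer = new Thread(() -> tail.setNext(new Node<>("second")));
        producer.start();
        return awaitNext(tail).value;
    }

    public static void main(String[] args) {
        System.out.println(runOnce());
    }
}
```

Replacing getAcquire with get in this sketch is not guaranteed to reproduce
the hang; whether the load is hoisted depends on the JIT and the JDK build,
which matches the commit message's observation that JDK 25's C2 makes the
problem manifest reliably.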
