Hi Ryan,
Adding the contract for
ForkJoinWorkerThreadFactory.newThread(ForkJoinPool) to the conversation
(emphasis mine):
«Returns a new worker thread operating in the given pool. *Returning
null or throwing an exception may result in tasks never being executed.*
If this method throws an exception, it is relayed to the caller of the
method (for example execute) causing attempted thread creation. If this
method returns null or throws an exception, it is not retried until the
next attempted creation (for example another call to execute).» -
https://docs.oracle.com/en/java/javase/25/docs/api/java.base/java/util/concurrent/ForkJoinPool.ForkJoinWorkerThreadFactory.html#newThread(java.util.concurrent.ForkJoinPool)
On 2026-03-12 00:07, Ryan Ernst wrote:
Hi,
I'd like to discuss a (semi-fixed) bug in ForkJoinPool where tasks are silently
lost when the ForkJoinWorkerThreadFactory throws an exception.
The first call to ForkJoinPool.execute() correctly propagates a factory
exception. However, subsequent execute() calls silently queue their tasks: no
exception is thrown and those tasks never run, so they are effectively lost.
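A minimal sketch of the symptom, assuming a factory that always throws. The
class FactoryFailureDemo and its Outcome holder are hypothetical names, and
this is not the standalone reproduction case referred to in this message. On
the versions reported here (17.0.13, 21.0.4) the second execute() is expected
to return normally with the factory invoked only once; on JDK 23+ it is
expected to attempt worker creation again and relay the factory's exception.

```java
import java.util.concurrent.ForkJoinPool;
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical sketch of the reported symptom; not the reporter's reproducer.
final class FactoryFailureDemo {

    // Result of one execute() call.
    static final class Outcome {
        final boolean threw;         // did execute() throw?
        final int factoryCallsSoFar; // factory invocations seen when it returned

        Outcome(boolean threw, int factoryCallsSoFar) {
            this.threw = threw;
            this.factoryCallsSoFar = factoryCallsSoFar;
        }
    }

    static Outcome[] run() {
        AtomicInteger factoryCalls = new AtomicInteger();
        // Factory that always fails, so no worker thread can ever start
        // and a task that ends up queued can never be executed.
        ForkJoinPool.ForkJoinWorkerThreadFactory failing = p -> {
            factoryCalls.incrementAndGet();
            throw new IllegalStateException("simulated thread-creation failure");
        };
        ForkJoinPool pool = new ForkJoinPool(1, failing, null, false);
        Outcome[] outcomes = new Outcome[2];
        try {
            for (int i = 0; i < outcomes.length; i++) {
                boolean threw = false;
                try {
                    pool.execute(() -> System.out.println("task ran"));
                } catch (RuntimeException e) {
                    // Expected to be the factory's exception, relayed to the caller.
                    threw = true;
                }
                outcomes[i] = new Outcome(threw, factoryCalls.get());
            }
        } finally {
            pool.shutdownNow(); // no worker threads exist; this just shuts the pool down
        }
        return outcomes;
    }

    public static void main(String[] args) {
        Outcome[] outcomes = run();
        for (int i = 0; i < outcomes.length; i++) {
            System.out.printf("execute #%d: threw=%b, factory calls so far=%d%n",
                    i + 1, outcomes[i].threw, outcomes[i].factoryCallsSoFar);
        }
    }
}
```

Run it with `java FactoryFailureDemo.java`. "task ran" should never be
printed, since no worker thread can ever be created; what differs between
JDK versions is whether the second call throws.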
The root cause is in WorkQueue.push(): it only calls signalWork() when the
previous queue slot is null (i.e. the queue appeared empty). After the first
failed execute(), the unconsumed task remains in slot 0. The next push() sees a
non-null slot, skips signaling, and never attempts to create a worker.
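The signaling decision can be modelled with a toy ring buffer. This is a
hypothetical simplification (PushSignalModel is not a JDK class): it keeps
only the "previous slot is null" test described above and omits locking,
resizing and consumption, since in this scenario no worker ever exists to
take tasks.

```java
// Hypothetical toy model of the signaling check in the JDK 17/21
// WorkQueue.push(); NOT the real implementation, just the same slot test.
final class PushSignalModel {
    private final Object[] slots = new Object[8]; // capacity is a power of two
    private int top;                              // next slot to fill (masked on use)

    /**
     * Stores the task and returns true if this push would call signalWork(),
     * i.e. if the slot just before the new task was empty.
     */
    boolean push(Object task) {
        int m = slots.length - 1;
        int s = top++;
        slots[m & s] = task;
        return slots[m & (s - 1)] == null;
    }

    public static void main(String[] args) {
        PushSignalModel q = new PushSignalModel();
        // First push: the previous slot is empty, so it signals. In the bug
        // scenario the worker factory then throws and task-1 stays in slot 0.
        System.out.println("push 1 signals: " + q.push("task-1"));
        // Second push: slot 0 is still occupied, so there is no signal and no
        // new attempt to create a worker; nothing will ever run task-2.
        System.out.println("push 2 signals: " + q.push("task-2"));
    }
}
```

Running main prints `push 1 signals: true` followed by
`push 2 signals: false`.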
The bug was inadvertently fixed in JDK 23 by the large ForkJoinPool rewrite in
JDK-8322732, and I've confirmed it does not reproduce on JDK 23+. As far as I
can tell, the fix has not been backported to 17u or 21u (it reproduces on JDK
17.0.13 and 21.0.4).
The fix is a one-line change in WorkQueue.push(), checking one slot further
back when deciding whether to signal. This would (roughly) match the logic in
JDK 23+.
In JDK 21:
-    if ((resize || (a[m & (s - 1)] == null && signalIfEmpty)) &&
+    if ((resize || (a[m & (s - 2)] == null && signalIfEmpty)) &&
         pool != null)
         pool.signalWork();
I have a standalone reproduction case I'm happy to share, but I wanted to check
if this is a change that would be worthwhile and/or accepted as a bugfix in
those LTS versions.
Thanks
Ryan
--
Cheers,
√
Viktor Klang
Software Architect, Java Platform Group
Oracle