Hi,

I'd like to discuss a (semi-fixed) bug in ForkJoinPool where tasks are silently 
lost when the ForkJoinWorkerThreadFactory throws an exception.

The first call to ForkJoinPool.execute() correctly propagates a factory 
exception. However, subsequent execute() calls silently queue the task. No 
exception is thrown and the task never runs; it's effectively lost forever.
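
For illustration, here is a rough sketch of how the scenario can be set up. 
It is not the full standalone reproduction, and the class name 
FactoryFailureSketch is made up for this example. It assumes a factory that 
always throws and records what each execute() call does. The outcome depends 
on the JDK version, so no expected output is shown.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ForkJoinPool;

public class FactoryFailureSketch {

    /** Calls execute() twice on a pool whose factory always throws. */
    static List<String> run() {
        // Hypothetical failing factory: no worker thread can ever be created.
        ForkJoinPool.ForkJoinWorkerThreadFactory failing = p -> {
            throw new IllegalStateException("factory failure");
        };
        ForkJoinPool pool = new ForkJoinPool(1, failing, null, false);
        List<String> outcomes = new ArrayList<>();
        try {
            for (int i = 1; i <= 2; i++) {
                try {
                    pool.execute(() -> { });
                    // Reaching here means the task was queued without an exception.
                    outcomes.add("execute #" + i + ": returned normally");
                } catch (RuntimeException e) {
                    outcomes.add("execute #" + i + ": threw " + e);
                }
            }
        } finally {
            pool.shutdownNow();
        }
        return outcomes;
    }

    public static void main(String[] args) {
        run().forEach(System.out::println);
    }
}
```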

The root cause is in WorkQueue.push(): it only calls signalWork() when the 
previous queue slot is null (i.e. the queue appeared empty). After the first 
failed execute(), the unconsumed task remains in slot 0. The next push() sees a 
non-null slot, skips signaling, and never attempts to create a worker.
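
The slot check can be shown with a toy model. This is only a sketch, not the 
real WorkQueue code: ToyQueue, lookBack and simulate are made-up names, and 
the model ignores resizing, workers consuming tasks, and the signalIfEmpty 
flag. It assumes the new task is written to its slot before the check, and 
that nothing ever removes a task (as when no worker can be created).

```java
import java.util.ArrayList;
import java.util.List;

public class PushSignalModel {

    /** Toy stand-in for a work queue; NOT the real ForkJoinPool.WorkQueue. */
    static final class ToyQueue {
        final Object[] slots = new Object[8]; // power-of-two capacity
        final int lookBack;                   // 1 = current check, 2 = proposed check
        int top = 0;

        ToyQueue(int lookBack) {
            this.lookBack = lookBack;
        }

        /** Stores the task and reports whether this push would signal. */
        boolean push(Object task) {
            int s = top++;
            int m = slots.length - 1;
            slots[m & s] = task;
            // Signal only if the slot lookBack positions behind is empty.
            return slots[m & (s - lookBack)] == null;
        }
    }

    /** Pushes tasks that are never consumed and records each signal decision. */
    static List<Boolean> simulate(int lookBack, int pushes) {
        ToyQueue q = new ToyQueue(lookBack);
        List<Boolean> signalled = new ArrayList<>();
        for (int i = 0; i < pushes; i++) {
            signalled.add(q.push("task-" + i));
        }
        return signalled;
    }

    public static void main(String[] args) {
        System.out.println("s - 1 check: " + simulate(1, 2)); // [true, false]
        System.out.println("s - 2 check: " + simulate(2, 2)); // [true, true]
    }
}
```

With the s - 1 check, the second push finds the first task still in slot 0 and 
does not signal. With the s - 2 check, the second push looks one slot further 
back, finds it empty, and signals again.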

The bug was inadvertently fixed in JDK 23 by the large ForkJoinPool rewrite in 
JDK-8322732, and I've confirmed it does not reproduce on JDK 23+. As far as I 
can tell, the fix has not been backported to 17u or 21u (it reproduces on JDK 
17.0.13 and 21.0.4).

The fix I'm proposing for 17u and 21u is a one-line change in WorkQueue.push(): 
check one slot further back when deciding whether to signal. This would 
(roughly) match the logic in JDK 23+.

In JDK 21:
  -  if ((resize || (a[m & (s - 1)] == null && signalIfEmpty)) &&
  +  if ((resize || a[m & (s - 2)] == null && signalIfEmpty) &&
          pool != null)
          pool.signalWork();

I have a standalone reproduction case I'm happy to share, but I wanted to check 
if this is a change that would be worthwhile and/or accepted as a bugfix in 
those LTS versions.

Thanks
Ryan
