Excerpts from Rainer Orth's message of September 3, 2025 10:20 am:
>>> 
>>> I regularly (but not always) see timeouts on Solaris, both on sparc and
>>> x86:
>>> 
>>> WARNING: libphobos.gc/forkgc2.d execution test program timed out.
>>> FAIL: libphobos.gc/forkgc2.d execution test
>>> WARNING: libphobos.gc/startbackgc.d execution test program timed out.
>>> FAIL: libphobos.gc/startbackgc.d execution test
> 
> I haven't tried investigating what's wrong on Solaris with those two,
> but they sure are annoying, especially since they are so unreliable:
> sometimes both PASS, sometimes one or the other, sometimes both.
> 
> I'd thought about skipping them on Solaris, too, just to avoid the noise
> and the timeouts, but haven't gotten around to that.
> 
> However, fixing this at the root would certainly be best.
> 

I currently have a gdb session open on cfarm where the forkgc2 process
has hung, and I'm just looking at the backtrace.

* There are 11 threads in total (main + 10 new'd Threads)
* All threads are suspended (in sigsuspend) except for two
* The first of those threads is the one that requested all threads to
  suspend using pthread_kill(SIGRTMIN), and is stuck inside sem_wait(),
  waiting for one more call to sem_post().
* The second is stuck in a SpinLock.lock loop, called from
  _prefork_handler() inside forkx() inside fork() - my guess would be
  that the handler being called is _d_gcx_atfork_prepare().
* Specific to Solaris, I've clocked this line in the forkx 
  implementation:

https://github.com/illumos/illumos-gate/blob/a21856a054bd854f39d1d55a6b0d547cb0d2039f/usr/src/lib/libc/port/threads/scalls.c#L177

I think what's going on is that the thread that wants to do a GC
collection has issued a signal to all threads, but Solaris has called
sigoff() in the last thread, the one inside fork(), so the signal never
reaches it.
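
To make that hypothesis concrete, here is a minimal sketch in plain C with
POSIX threads. It is not the druntime or Solaris libc code. It assumes
SIGUSR1 in place of SIGRTMIN, uses pthread_sigmask() as a rough stand-in for
whatever sigoff() does, and all function names are made up for illustration.
The suspender signals every worker and counts acknowledgements on a
semaphore. It uses sem_timedwait() so the sketch returns instead of hanging
the way a plain sem_wait() would. The SpinLock side of the hang is not
modelled.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <errno.h>
#include <pthread.h>
#include <semaphore.h>
#include <signal.h>
#include <stdatomic.h>
#include <time.h>

#define MAX_THREADS 16

static sem_t ack_sem;    /* posted by each thread that receives the signal */
static sem_t ready_sem;  /* posted by each thread once its mask is set up */
static atomic_int done;  /* tells the workers to exit */

/* Hypothetical suspend handler: it only acknowledges the request.
   sem_post() is async-signal-safe. */
static void on_suspend(int sig)
{
    (void)sig;
    sem_post(&ack_sem);
}

static void *worker(void *arg)
{
    int block_signal = *(int *)arg;
    if (block_signal) {
        /* Assumption: approximates sigoff() by blocking the signal. */
        sigset_t set;
        sigemptyset(&set);
        sigaddset(&set, SIGUSR1);
        pthread_sigmask(SIG_BLOCK, &set, NULL);
    }
    sem_post(&ready_sem);
    struct timespec nap = { 0, 1000000 };  /* 1 ms */
    while (!atomic_load(&done))
        nanosleep(&nap, NULL);
    return NULL;
}

/* Signals nthreads workers and returns how many acknowledged within about
   one second. A negative blocked_index means no worker blocks the signal. */
int count_suspend_acks(int nthreads, int blocked_index)
{
    pthread_t tid[MAX_THREADS];
    int block[MAX_THREADS];
    struct sigaction sa = { 0 };
    struct timespec deadline;
    int acks = 0;

    assert(nthreads > 0 && nthreads <= MAX_THREADS);
    atomic_store(&done, 0);
    sem_init(&ack_sem, 0, 0);
    sem_init(&ready_sem, 0, 0);

    sa.sa_handler = on_suspend;
    sigemptyset(&sa.sa_mask);
    sigaction(SIGUSR1, &sa, NULL);

    for (int i = 0; i < nthreads; i++) {
        block[i] = (i == blocked_index);
        int rc = pthread_create(&tid[i], NULL, worker, &block[i]);
        assert(rc == 0);
    }
    /* Wait until every worker has finished setting its signal mask. */
    for (int i = 0; i < nthreads; i++)
        while (sem_wait(&ready_sem) == -1 && errno == EINTR)
            ;

    /* "Suspend the world": one signal per thread, one ack expected each. */
    for (int i = 0; i < nthreads; i++)
        pthread_kill(tid[i], SIGUSR1);

    clock_gettime(CLOCK_REALTIME, &deadline);
    deadline.tv_sec += 1;
    while (acks < nthreads) {
        if (sem_timedwait(&ack_sem, &deadline) == 0)
            acks++;
        else if (errno != EINTR)
            break;  /* timed out: a plain sem_wait() would block forever */
    }

    atomic_store(&done, 1);
    for (int i = 0; i < nthreads; i++)
        pthread_join(tid[i], NULL);
    sem_destroy(&ack_sem);
    sem_destroy(&ready_sem);
    return acks;
}
```

Built with -pthread, count_suspend_acks(4, -1) collects all four
acknowledgements, while count_suspend_acks(4, 2) comes back one short after
the timeout, which matches the "one more call to sem_post()" state seen in
the backtrace.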

This behaviour does not change when COLLECT_FORK is disabled, so Solaris 
would still be affected.

Iain.
