Excerpts from Rainer Orth's message of September 3, 2025 10:20 am:
>>>
>>> I regularly (but not always) see timeouts on Solaris, both on sparc and
>>> x86:
>>>
>>> WARNING: libphobos.gc/forkgc2.d execution test program timed out.
>>> FAIL: libphobos.gc/forkgc2.d execution test
>>> WARNING: libphobos.gc/startbackgc.d execution test program timed out.
>>> FAIL: libphobos.gc/startbackgc.d execution test
>
> I haven't tried investigating what's wrong on Solaris with those two,
> but they sure are annoying, especially since they are so unreliable:
> sometimes both PASS, sometimes one or the other, sometimes both.
>
> I'd thought about skipping them on Solaris, too, just to avoid the noise
> and the timeouts, but haven't gotten around to that.
>
> However, fixing this at the root would certainly be best.
>

I currently have a gdb session open on cfarm with a forkgc2 process that
has hung, and I'm just looking at the backtrace.

* There are 11 threads in total (main + 10 new'd Threads)
* All threads are suspended (in sigsuspend) except for two
* The first of those two is the thread that requested all threads to
  suspend using pthread_kill(SIGRTMIN); it is stuck inside a sem_wait,
  waiting for one more call to sem_post().
* The second is stuck in a SpinLock.lock loop, called from
  _prefork_handler() inside forkx() inside fork() - my guess would be
  that the handler being called is _d_gcx_atfork_prepare().
* Specific to Solaris, I've clocked this line in the forkx implementation:

https://github.com/illumos/illumos-gate/blob/a21856a054bd854f39d1d55a6b0d547cb0d2039f/usr/src/lib/libc/port/threads/scalls.c#L177

I think what's going on is that the thread that wants to do a GC
collection has issued a signal to all threads, but Solaris has called
sigoff() in the last thread, the one in the middle of fork(), so the
signal never reaches it.

This behaviour does not change when COLLECT_FORK is disabled, so Solaris
would still be affected.

Iain.
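
For illustration, a minimal C sketch of the suspected handshake failure. This is not the druntime or illumos code: the names (suspend_acknowledged, on_suspend, target) are hypothetical, SIGUSR1 stands in for SIGRTMIN, and blocking the signal with pthread_sigmask() is only an assumed approximation of what sigoff() does inside forkx(). It uses sem_timedwait() where the real collector uses sem_wait(), so it reports the missing acknowledgement instead of hanging.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <errno.h>
#include <pthread.h>
#include <semaphore.h>
#include <signal.h>
#include <string.h>
#include <time.h>

static sem_t suspend_ack;  /* posted by the target's signal handler */
static sem_t target_ready; /* posted once the target has set its signal mask */
static sem_t target_exit;  /* posted to let the target thread finish */

/* sem_wait() that retries when interrupted by a signal. */
static void wait_sem(sem_t *s)
{
    while (sem_wait(s) == -1 && errno == EINTR) { }
}

/* Stand-in for the suspend handler: it only acknowledges the request.
   sem_post() is async-signal-safe. */
static void on_suspend(int sig)
{
    (void)sig;
    sem_post(&suspend_ack);
}

static void *target(void *arg)
{
    if (*(int *)arg) {
        /* Assumption: a blocked signal approximates the deferral that
           sigoff() causes in the thread calling fork(). */
        sigset_t set;
        sigemptyset(&set);
        sigaddset(&set, SIGUSR1);
        pthread_sigmask(SIG_BLOCK, &set, NULL);
    }
    sem_post(&target_ready);
    wait_sem(&target_exit);
    return NULL;
}

/* Returns 1 if the target acknowledged the suspend request within one
   second, 0 if the wait timed out. */
int suspend_acknowledged(int defer_signals)
{
    struct sigaction sa;
    struct timespec deadline;
    pthread_t tid;
    int rc;

    memset(&sa, 0, sizeof sa);
    sa.sa_handler = on_suspend;
    sigemptyset(&sa.sa_mask);
    sigaction(SIGUSR1, &sa, NULL);

    sem_init(&suspend_ack, 0, 0);
    sem_init(&target_ready, 0, 0);
    sem_init(&target_exit, 0, 0);

    rc = pthread_create(&tid, NULL, target, &defer_signals);
    assert(rc == 0);
    wait_sem(&target_ready);

    /* The "collector" side: request suspension, then wait for the ack. */
    pthread_kill(tid, SIGUSR1);
    clock_gettime(CLOCK_REALTIME, &deadline);
    deadline.tv_sec += 1;
    do {
        rc = sem_timedwait(&suspend_ack, &deadline);
    } while (rc == -1 && errno == EINTR);

    sem_post(&target_exit);
    pthread_join(tid, NULL);
    sem_destroy(&suspend_ack);
    sem_destroy(&target_ready);
    sem_destroy(&target_exit);
    return rc == 0;
}
```

Called from a main() and built with something like cc -pthread, suspend_acknowledged(0) should return 1, while suspend_acknowledged(1) should return 0 after the one-second timeout. With a plain sem_wait() the second case would simply never return, which matches what the backtrace shows for the thread stuck waiting on sem_post().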