Hi Neil, > NB I have an orthogonal concern here (i.e. possibly yet another issue > with the current code!): If a thread that was running in Guile mode > called scm_leave_guile() to do non-guile stuff for a while, and is > still outside Guile mode, shouldn't it have been removed from the > all_threads list when it called scm_leave_guile()? > > (But please feel free to ignore this one for now. We already have > enough loose ends in the air!)
Ignoring... (but is that what all_threads is for? My understanding was that it was for ALL threads created by / initialized to use Guile -- i.e., all threads that needed to be GC'd.) > I think you need to check !wake_up_flag at the start of the loop. I > think it is possible that when the loop starts, the GC thread has > already set wake_up_flag to 1 and signalled wake_up_cond. I don't think this is possible -- the GC thread could never have gotten to that point unless it had locked the non-GC thread's heap_mutex. By the time it sets wake_up_flag to 1, it must also be holding the wake_up_mutex, which means that the non-GC thread had already relinquished it via the cond_wait. > Do the locks of t->heap_mutex and wake_up_mutex really need to overlap > like this? I think this could lead to a deadlock: at [A] above, the > non-GC thread holds wake_up_mutex and tries to lock t->heap_mutex, > whereas at [B] above, the GC thread holds t->heap_mutex (for every > thread) and tries to lock wake_up_mutex. > > If there isn't a hard reason for the overlapping, the two lock/unlock > pairs just above can be swapped, and that eliminates the deadlock > possibility. I think that they do (need to overlap). And I'm having a hard time seeing the potential for deadlock here (maybe I'm just sluggish from the heat in my cubicle). I think the order of locking is critical to preventing deadlock, in fact, via a race on wake_up_flag. In the situation you describe, the non-GC thread will only be able to seize wake_up_mutex once the wake_up_flag has been set and the GC thread has permanently relinquished wake_up_mutex for that round of collection, so there's no deadlock. Am I missing something? Regards. Julian