2008/5/25 Julian Graham <[EMAIL PROTECTED]>:

> Hi everyone,
>
> While I was testing and debugging some of the SRFI-18 code that Neil
> and I were working on, I noticed a deadlock that happens in
> scm_join_thread_timed.  I'm pretty sure it affects the 1.8 codebase as
> well, although it's probably more common when doing timed joins.
>
> Thread joining in Guile (1.9 or 1.8) works as follows:
>
> 1. If the target thread has exited, return.
> 2. Block on the target thread's join queue.
> 3. When woken (because of a pthread_cond_signal, a spurious pthreads
> wakeup, or, in 1.9, a timeout expiration), check the target thread's
> exit status -- if it has exited, return.
> 4. Otherwise, SCM_TICK.
> 5. Go to step 2.
>
> The deadlock can happen if the thread exits during the tick, because
> there's no check of the exit status before block_self is called again.
> I'm pretty sure that moving step 1 into the beginning of the loop
> would fix this -- I can submit a patch against 1.8, 1.9, or both.
> Let me know what you guys would like.
>

Hi Julian,

Based on the synopsis above, I agree that moving step 1 inside the loop
should fix this.  In addition, though, I think it would be very good if we
could add a minimal test that currently reproduces the deadlock, so that it
guards against future regressions here.  Do you have such a test?
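
To make the reordering concrete, here is a rough sketch in plain pthreads C.
It is not the Guile source: struct target, tick, join_fixed and exit_target
are hypothetical stand-ins for the thread record, SCM_TICK, the join loop and
the thread-exit path, and the sketch assumes the lock is dropped for the
duration of the tick, which is what opens the window described above.

```c
#include <pthread.h>
#include <sched.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for the target thread's record (not scm_i_thread). */
struct target {
    pthread_mutex_t lock;
    pthread_cond_t  join_queue;
    bool            exited;
};

/* Hypothetical stand-in for SCM_TICK.  ASSUMPTION: the lock is released
   while ticking, so the target may exit (and signal) during this call. */
static void tick(struct target *t)
{
    pthread_mutex_unlock(&t->lock);
    sched_yield();
    pthread_mutex_lock(&t->lock);
}

/* Exit path: publish the exit status under the lock, then wake joiners. */
static void *exit_target(void *arg)
{
    struct target *t = arg;
    pthread_mutex_lock(&t->lock);
    t->exited = true;
    pthread_cond_broadcast(&t->join_queue);
    pthread_mutex_unlock(&t->lock);
    return NULL;
}

/* Join loop with the exit check moved to the top of the loop.
   Returns how many times it blocked (only to make the sketch testable). */
static unsigned join_fixed(struct target *t)
{
    unsigned waits = 0;

    pthread_mutex_lock(&t->lock);
    for (;;) {
        if (t->exited)          /* step 1, now re-run on every iteration */
            break;
        waits++;
        pthread_cond_wait(&t->join_queue, &t->lock);    /* step 2 */
        if (t->exited)          /* step 3 */
            break;
        tick(t);                /* step 4; an exit here is seen at the top */
    }
    pthread_mutex_unlock(&t->lock);
    return waits;
}
```

If exit_target runs on another thread (for example via pthread_create), an
exit that lands during the tick is caught by the check at the top of the next
iteration, so join_fixed does not block on a wakeup that has already been
delivered.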

No need for a patch against both 1.8 and 1.9; just one will do, and git
cherry-pick will handle the other for us (unless the fix is significantly
different in the two branches).

Regards,
       Neil



>
> Regards,
> Julian
>
>
