On Wed, Jan 16, 2008 at 09:15:50AM -0500, Chris Shoemaker wrote:
> Hi,
>
> I was occasionally having signals caught by libev not trigger any
> watchers. This turns out to be another race condition in libev, but
> I'm not sure if the simple fix is correct.
>
> The race involves the static sig_atomic_t volatile gotsig.
>
> Here is the sequence of events that causes the signal to be lost:
>
> Assume there are two signal watchers started, and gotsig is 0.
>
> The first signal is received, triggering sighandler().
>
>   if (!gotsig) {
>     int old_errno = errno;
>     gotsig = 1;
>     write (sigpipe [1], &signum, 1);
>     errno = old_errno;
>   }
>
> The condition is true, and gotsig becomes 1. The signal handler returns.
>
> Normally, we will eventually detect the write to sigpipe, wake the sigev,
> calling sigcb, which will clear gotsig after reading from the pipe:
>
>   static void sigcb (EV_P_ ev_io *iow, int revents) {
>     ...
>     read (sigpipe [0], &revents, 1);
>     gotsig = 0;
>     ...
>
> However, as soon as sighandler() returns, the full signal mask that
> was in effect during the handler is lifted, so a new signal may be
> received at any time. If a signal is received before sigcb() clears
> gotsig, the sighandler will not write to sigpipe for it, because
> (!gotsig) will still be false.
>
> This stupid patch closes the race, and improves the reliability of
> signal delivery in my tests.
>
> @@ -792,7 +792,7 @@ sighandler (int signum)
>
> signals [signum - 1].gotsig = 1;
>
> - if (!gotsig)
> + if (1)
> {
> int old_errno = errno;
> gotsig = 1;
>
>
> But I don't understand the motivation for the flag in the first place,
> so this may be breaking something else that I don't appreciate. Is
> there any problem with removing the variable altogether?
Does anybody have any insight into this? Things seem to work much better
for me when I remove the gotsig variable altogether. That obviously fixes
the race, but is there a downside?
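
To make the sequence concrete, here is a minimal single-threaded sketch
that calls a handler of the same shape twice in a row, with no sigcb in
between, and counts how many wakeup bytes reach the pipe. It is not libev
code: the sim_* names and the use_flag parameter are hypothetical, the
non-blocking pipe is an assumption, and the per-signal signals[].gotsig
flags are not modeled at all.

```c
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <signal.h>
#include <unistd.h>

static volatile sig_atomic_t gotsig;   /* global "wakeup already written" flag */
static int sigpipe[2] = { -1, -1 };

/* Hypothetical helper: (re)create the pipe with both ends non-blocking
   (an assumption of this sketch) and reset the flag. */
static int sim_init(void)
{
    if (sigpipe[0] >= 0) {
        close(sigpipe[0]);
        close(sigpipe[1]);
    }
    if (pipe(sigpipe) < 0)
        return -1;
    for (int i = 0; i < 2; i++) {
        int fl = fcntl(sigpipe[i], F_GETFL);
        if (fl < 0 || fcntl(sigpipe[i], F_SETFL, fl | O_NONBLOCK) < 0)
            return -1;
    }
    gotsig = 0;
    return 0;
}

/* Same shape as the handler quoted above. use_flag = 1 keeps the
   if (!gotsig) guard; use_flag = 0 corresponds to the if (1) patch. */
static void sim_sighandler(int signum, int use_flag)
{
    if (!use_flag || !gotsig) {
        int old_errno = errno;
        unsigned char byte = (unsigned char)signum;
        gotsig = 1;
        if (write(sigpipe[1], &byte, 1) < 0) {
            /* pipe full: earlier wakeup bytes are still queued */
        }
        errno = old_errno;
    }
}

/* Drain the pipe and return how many wakeup bytes were queued. */
static int sim_pending_wakeups(void)
{
    unsigned char buf[64];
    int total = 0;
    ssize_t n;

    assert(sigpipe[0] >= 0);   /* sim_init() must have succeeded */
    while ((n = read(sigpipe[0], buf, sizeof buf)) > 0)
        total += (int)n;
    return total;
}

/* Deliver two signals back to back, before any sigcb has run, and
   report how many wakeups reached the pipe (-1 on setup failure). */
static int sim_two_signals(int use_flag)
{
    if (sim_init() < 0)
        return -1;
    sim_sighandler(SIGUSR1, use_flag);
    sim_sighandler(SIGUSR2, use_flag);
    return sim_pending_wakeups();
}
```

With the guard, sim_two_signals(1) leaves one byte in the pipe, because
the second call skips the write. Without it, sim_two_signals(0) leaves
two, so a sigcb that reads one byte per wakeup would simply be woken
once per signal.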
-chris