I ran into the following problem on my setup (LRP 2.9.4 running 
diald-99-1 and kernel 2.2.11):

diald creates a ppp link, but fails to set up a default route to the 
interface.  

I believe that a race condition is preventing diald from setting up 
routing to the PPP interface.  It appears that run_shell is waiting for 
ifconfig--which has already died--to terminate.  In effect, diald 
enters a state of suspended animation until the next child process 
terminates (e.g., when pppd terminates after a long period of link 
inactivity).

diald logs show it doing nothing for about half an hour until pppd 
terminates:

Oct  7 02:51:35 myrouter2 diald[511]: running '/sbin/ifconfig ppp0 209.152.199.80 pointopoint 209.152.199.2 broadcast 0.0.0.0 netmask 255.255.255.255 metric 0 mtu 1500 up'
Oct  7 02:51:36 myrouter2 diald[511]: start ppp0: SIOCSIFMETRIC: Operation not supported
Oct  7 03:24:14 myrouter2 pppd[536]: Hangup (SIGHUP)
Oct  7 03:24:14 myrouter2 pppd[536]: Modem hangup
Oct  7 03:24:14 myrouter2 pppd[536]: Connection terminated.
Oct  7 03:24:15 myrouter2 pppd[536]: Exit.

diald handles the death of its children at 3:24:15, a good half hour 
after ifconfig actually terminated:

Oct  7 03:24:15 myrouter2 diald[511]: SIGCHLD[5]: pid 536 link, status 256
Oct  7 03:24:15 myrouter2 diald[511]: SIGCHLD[6]: pid 539 system, status 256

Tracing diald suggests that diald is having problems managing its 
children:

strace shows diald receiving the signal at 2:51:36:

511 <diald>   02:51:36 sigprocmask(SIG_UNBLOCK, [HUP INT USR1 USR2 PIPE TERM CHLD], NULL) = 0
511 <diald>   02:51:36 --- SIGCHLD (Child exited) ---
511 <diald>   02:51:36 wait4(-1,  <unfinished ...>

...but the signal is not handled by diald until 3:24:15:

511 <diald>   03:24:15 <... wait4 resumed> [WIFEXITED(s) && WEXITSTATUS(s) == 1], WNOHANG, NULL) = 536

Now, I'm new to signals, but I wonder if the following might be 
happening in run_shell:  

1. diald blocks all signals
2. a shell is forked off to run ifconfig
3. ifconfig terminates; diald does not receive SIGCHLD because 
signals are blocked
4. diald unblocks all signals and immediately receives SIGCHLD
5. diald calls pause() and hangs waiting for a signal which it has 
already received

Here is the relevant section from the source code (file shell.c):

    if (mode & SHELL_WAIT) {
        running_pid = pid;

        if (p[0] >= 0 && (fd = fdopen(p[0], "r"))) {
            char buf[1024];

            while (fgets(buf, sizeof(buf)-1, fd)) {
                buf[sizeof(buf)-1] = '\0';
                mon_syslog(LOG_INFO, "%s: %s", name, buf);
            }

            fclose(fd);
        }

        unblock_signals();

        while (running_pid)
            pause();
        return running_status;
    }


