I ran into the following problem on my setup (LRP 2.9.4 running
diald-99-1 and kernel 2.2.11):
diald creates a PPP link but fails to set up a default route to the
interface.
I believe that a race condition is preventing diald from setting up
routing to the PPP interface. It appears that run_shell is waiting for
ifconfig, which has already died, to terminate. In effect, diald
enters a state of suspended animation until the next child process
terminates (e.g., when pppd terminates after a long period of link
inactivity).
diald logs show it doing nothing for about half an hour until pppd
terminates:
    Oct 7 02:51:35 myrouter2 diald[511]: running '/sbin/ifconfig ppp0
        209.152.199.80 pointopoint 209.152.199.2 broadcast 0.0.0.0
        netmask 255.255.255.255 metric 0 mtu 1500 up'
    Oct 7 02:51:36 myrouter2 diald[511]: start ppp0:
        SIOCSIFMETRIC: Operation not supported
    Oct 7 03:24:14 myrouter2 pppd[536]: Hangup (SIGHUP)
    Oct 7 03:24:14 myrouter2 pppd[536]: Modem hangup
    Oct 7 03:24:14 myrouter2 pppd[536]: Connection terminated.
    Oct 7 03:24:15 myrouter2 pppd[536]: Exit.
diald handles the death of its children at 03:24:15, a good half hour
after ifconfig actually terminated:
    Oct 7 03:24:15 myrouter2 diald[511]: SIGCHLD[5]: pid 536 link, status 256
    Oct 7 03:24:15 myrouter2 diald[511]: SIGCHLD[6]: pid 539 system, status 256
Tracing diald suggests that diald is having problems managing its
children:
strace shows diald receiving the signal at 02:51:36:
    511 <diald> 02:51:36 sigprocmask(SIG_UNBLOCK, [HUP INT
        USR1 USR2 PIPE TERM CHLD], NULL) = 0
    511 <diald> 02:51:36 --- SIGCHLD (Child exited) ---
    511 <diald> 02:51:36 wait4(-1, <unfinished ...>
...but the signal is not handled by diald until 3:24:15.
    511 <diald> 03:24:15 <... wait4 resumed> [WIFEXITED(s) &&
        WEXITSTATUS(s) == 1], WNOHANG, NULL) = 536
Now, I'm new to signals, but I wonder if the following might be
happening in run_shell:
1. diald blocks all signals
2. a shell is forked off to run ifconfig
3. ifconfig terminates; diald does not receive SIGCHLD because
signals are blocked
4. diald unblocks all signals and immediately receives SIGCHLD
5. diald calls pause() and hangs waiting for a signal which it has
already received
Here is the relevant section from the source code (file shell.c):
    if (mode & SHELL_WAIT) {
        running_pid = pid;
        if (p[0] >= 0 && (fd = fdopen(p[0], "r"))) {
            char buf[1024];
            while (fgets(buf, sizeof(buf)-1, fd)) {
                buf[sizeof(buf)-1] = '\0';
                mon_syslog(LOG_INFO, "%s: %s", name, buf);
            }
            fclose(fd);
        }
        unblock_signals();
        while (running_pid)
            pause();
        return running_status;
    }
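
If the sequence I describe above is what is really happening, the usual
remedy for this kind of "unblock, then pause()" pattern is, as far as I
understand it, to keep SIGCHLD blocked and wait with sigsuspend(), which
unblocks the signal and goes to sleep in one atomic step. Here is a
minimal stand-alone sketch of that idea. It is not a patch against
diald: on_sigchld, install_sigchld_handler and run_and_wait are names I
made up, and running_pid and running_status are only stand-ins for the
diald variables of the same name.

```c
#define _POSIX_C_SOURCE 200809L
#include <errno.h>
#include <signal.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical stand-ins for diald's running_pid / running_status. */
static volatile pid_t running_pid;
static volatile int running_status;

/* SIGCHLD handler: reap every exited child without blocking. */
static void on_sigchld(int sig)
{
    int saved_errno = errno;
    int status;
    pid_t pid;

    (void)sig;
    while ((pid = waitpid(-1, &status, WNOHANG)) > 0) {
        if (pid == running_pid) {
            running_status = status;
            running_pid = 0;
        }
    }
    errno = saved_errno;
}

/* Install the handler; returns 0 on success, -1 on error. */
static int install_sigchld_handler(void)
{
    struct sigaction sa;

    sa.sa_handler = on_sigchld;
    sigemptyset(&sa.sa_mask);
    sa.sa_flags = SA_RESTART;
    return sigaction(SIGCHLD, &sa, NULL);
}

/* Run cmd through /bin/sh and wait for it.
 * Returns the wait status of the child, or -1 on error. */
static int run_and_wait(const char *cmd)
{
    sigset_t block, old, waitmask;
    pid_t pid;

    /* Block SIGCHLD before forking so it cannot arrive too early. */
    sigemptyset(&block);
    sigaddset(&block, SIGCHLD);
    if (sigprocmask(SIG_BLOCK, &block, &old) < 0)
        return -1;

    pid = fork();
    if (pid < 0) {
        sigprocmask(SIG_SETMASK, &old, NULL);
        return -1;
    }
    if (pid == 0) {
        /* Child: restore the original mask, then run the command. */
        sigprocmask(SIG_SETMASK, &old, NULL);
        execl("/bin/sh", "sh", "-c", cmd, (char *)NULL);
        _exit(127);
    }

    running_pid = pid;

    /* Wait with SIGCHLD unblocked only while asleep in sigsuspend(),
     * so there is no gap between testing running_pid and sleeping. */
    waitmask = old;
    sigdelset(&waitmask, SIGCHLD);
    while (running_pid)
        sigsuspend(&waitmask);

    sigprocmask(SIG_SETMASK, &old, NULL);
    return running_status;
}
```

With the handler installed once at startup, a call such as
run_and_wait("exit 3") should return even when the child dies before the
parent reaches the wait loop, because the pending SIGCHLD is only
delivered inside sigsuspend(). I have not tried this against diald
itself.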