Denys Vlasenko wrote:
> On Monday 15 October 2007 15:15, Alexander Kriegisch wrote:
>
>> BTW, the output "ps" output diff looks like this on i386:
>>
>> $ diff -U0 ps1.txt ps2.txt
>> --- ps1.txt 2007-10-15 16:13:06.000000000 +0200
>> +++ ps2.txt 2007-10-15 16:13:25.000000000 +0200
>> @@ -130 +129,0 @@
>> -root 31704 31643 0 16:10 ? 00:00:00 [login] <defunct>
>>
>
> Aha. You have a login which does not exec shell, it spawns it
> as a child. When shell exits, login exits too.
>
> What looks strange to me is that you see a _zombie_ login.
> It means that it exited, but is not waited for yet.
>
> How come telnetd doesn't see EOF from login's fd? It *exited*,
> and that implicitly closes all fds! I'm puzzled.
>

telnetd doesn't see EOF on the pty because the client side of the pty is still held open by a child of the shell, not by the shell itself.
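To illustrate the EOF point, here is a minimal C sketch. It assumes a pipe as a stand-in for the pty, since the rule is the same: the reader only sees EOF once every copy of the write end is closed. The function name seconds_until_eof is hypothetical and is not telnetd code. The "shell" child exits at once, but a grandchild (playing the role of "sleep 10 &") inherits the write end and keeps it open.

```c
#define _XOPEN_SOURCE 700
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>
#include <time.h>
#include <unistd.h>
#include <sys/types.h>
#include <sys/wait.h>

/* Hypothetical demo: how long does the parent wait for EOF on the read end
 * after its direct child ("shell") has exited, while a grandchild
 * (a background job such as "sleep 10 &") still holds the write end? */
static double seconds_until_eof(unsigned job_seconds)
{
    int fds[2];
    if (pipe(fds) != 0) {
        perror("pipe");
        exit(1);
    }

    pid_t shell = fork();
    if (shell < 0) {
        perror("fork");
        exit(1);
    }
    if (shell == 0) {
        close(fds[0]);
        if (fork() == 0) {
            /* background job: inherits fds[1] and keeps it open */
            sleep(job_seconds);
            _exit(0);
        }
        _exit(0);                  /* the "shell" exits right away */
    }

    close(fds[1]);                 /* parent keeps only the read end */
    waitpid(shell, NULL, 0);       /* the shell is gone and reaped ... */

    struct timespec t0, t1;
    clock_gettime(CLOCK_MONOTONIC, &t0);
    char buf[64];
    while (read(fds[0], buf, sizeof buf) > 0)
        ;                          /* ... but EOF only arrives when the job exits */
    clock_gettime(CLOCK_MONOTONIC, &t1);
    close(fds[0]);
    return (double)(t1.tv_sec - t0.tv_sec) + (t1.tv_nsec - t0.tv_nsec) / 1e9;
}
```

With job_seconds set to 0 the read loop ends almost immediately. With 1 it blocks for roughly a second, even though the shell has already exited and been reaped.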
The process is a zombie because telnetd only waits for the child after sending it SIGKILL, and it only sends SIGKILL after either the network connection or the client side of the pty has been closed. In this case, the client side of the pty is never closed.

> (1) can you check PPID of zombie login? Is it 1 or <telnetd's PID>?
>

Therefore the PPID should be telnetd. Also, if the PPID were 1, that would mean init is not working, because init reaps every zombie it inherits. That is unlikely.

> (2) is it possible that you start telnetd so that it inherits
> "ignore SIGCHLD" from the parent? Try adding this line
> after signal(SIGPIPE, SIG_IGN):
>
>  signal(SIGPIPE, SIG_IGN);
> + signal(SIGCHLD, SIG_DFL);
>

SIGCHLD should already be SIG_DFL. Maybe you meant SIG_IGN, but ignoring SIGCHLD should change nothing except that no zombie processes would be left behind.

> Unrelated note: I looked at telnetd source and tightened up
> some loose ends. Can you test this patch? (I don't think
> it will help with this particular problem, though...)
>

The only change in your patch that seems relevant to this problem is the removal of the SIGKILL before the wait(). I think this is dangerous in a single-threaded server. The child shell will probably clean up and exit eventually, but if that takes several seconds, all other connections will hang for that time.

As I wrote earlier, the problem is that the shell exits but the client side of the pty remains open. You can reproduce this as follows:

  telnet server
  # login ...
  # on the server:
  sleep 10 &
  exit

The telnet client will only exit after sleep terminates. At the moment, SIGCHLD is ignored only in the SIG_DFL sense. The signal is discarded, but zombies are still kept. As a result, the session is only terminated when the client side of the pty or the network connection is closed.

Now the question is whether this should be considered a bug or a feature. If it is considered a bug, the solution is to remove the SIGKILL and the wait(), install a signal handler for SIGCHLD, and, inside the signal handler, find the tsession for the pid and close that session.
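A rough C sketch of that approach follows. It is a variation, not a literal transcription. Every name in it is hypothetical and not taken from busybox's telnetd: struct tsession's fields, sessions[], start_session and close_dead_sessions. The handler reaps exited children with waitpid(-1, ..., WNOHANG) and only marks the matching session. The actual teardown happens in the main loop. The reason is that close() is async-signal-safe, but freeing session memory or unlinking it from a list is not. SIGCHLD is blocked while a pid is being recorded, so the handler cannot miss a child that exits immediately.

```c
#define _XOPEN_SOURCE 700
#include <assert.h>
#include <errno.h>
#include <signal.h>
#include <string.h>
#include <unistd.h>
#include <sys/types.h>
#include <sys/wait.h>

/* Hypothetical, heavily simplified per-connection state. */
struct tsession {
    pid_t shell_pid;               /* 0 = slot unused */
    volatile sig_atomic_t dead;    /* set by the SIGCHLD handler */
};

#define MAX_SESSIONS 8
static struct tsession sessions[MAX_SESSIONS];

static void handle_sigchld(int sig)
{
    int saved_errno = errno;
    pid_t pid;
    (void)sig;
    /* Reap every exited child without blocking, so no zombie is left. */
    while ((pid = waitpid(-1, NULL, WNOHANG)) > 0) {
        for (int i = 0; i < MAX_SESSIONS; i++)
            if (sessions[i].shell_pid == pid)
                sessions[i].dead = 1;   /* only mark it here */
    }
    errno = saved_errno;
}

static void install_sigchld_handler(void)
{
    struct sigaction sa;
    memset(&sa, 0, sizeof sa);
    sa.sa_handler = handle_sigchld;
    sigemptyset(&sa.sa_mask);
    sa.sa_flags = SA_RESTART | SA_NOCLDSTOP;
    sigaction(SIGCHLD, &sa, NULL);
}

static void block_sigchld(sigset_t *old)
{
    sigset_t set;
    sigemptyset(&set);
    sigaddset(&set, SIGCHLD);
    sigprocmask(SIG_BLOCK, &set, old);
}

/* Fork a stand-in for the login shell that runs for run_seconds. */
static pid_t start_session(int slot, unsigned run_seconds)
{
    sigset_t old;
    block_sigchld(&old);           /* handler must not run before pid is stored */
    pid_t pid = fork();
    if (pid == 0) {
        sigprocmask(SIG_SETMASK, &old, NULL);
        sleep(run_seconds);
        _exit(0);
    }
    if (pid > 0) {
        sessions[slot].dead = 0;
        sessions[slot].shell_pid = pid;
    }
    sigprocmask(SIG_SETMASK, &old, NULL);
    return pid;
}

/* Called from the main loop: tear down sessions whose shell has exited.
 * A real server would close the pty master and the socket here. */
static int close_dead_sessions(void)
{
    sigset_t old;
    int closed = 0;
    block_sigchld(&old);
    for (int i = 0; i < MAX_SESSIONS; i++) {
        if (sessions[i].shell_pid != 0 && sessions[i].dead) {
            sessions[i].shell_pid = 0;
            sessions[i].dead = 0;
            closed++;
        }
    }
    sigprocmask(SIG_SETMASK, &old, NULL);
    return closed;
}
```

In a select()-based server, close_dead_sessions() would run on every loop iteration. A SIGCHLD that arrives just before the process blocks in select() is only acted on at the next wakeup; pselect() or a self-pipe would close that gap.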
Regards
Ralf Friedl

_______________________________________________
busybox mailing list
http://busybox.net/cgi-bin/mailman/listinfo/busybox
