Date: Wed, 30 Sep 2015 18:29:20 +0800 (PHT)
From: Paul Goyette <p...@vps1.whooppee.com>
Message-ID: <pine.neb.4.64.1509301818410.22...@vps1.whooppee.com>

| Well, a quick read through sbin/init.c shows that sometimes it waits
| with WNOHANG and sometimes it doesn't.

It is more that init reaps lots of zombie processes; missing just one of
them, occasionally, seems unlikely at best, whatever flags it gives wait().

Far more likely (IMO) is that the process in question is special somehow,
and the most likely kind of special that would cause wait() to fail to see
it is that the process isn't on init's child process list. There might be
other possibilities, if the kernel wait code sometimes ignores zombie
processes for some other reason (some other resource still owned, or
whatever).

| Well, for the previous occurrence, I waited many hours, and the zombie
| was still there. (It might even have been as much as a couple of days.)

Of course, it won't be time based, where your shutdown just happened to
occur at the magic interval ... rather, shutdown will be causing some
other condition to occur (or be removed) which then allows the zombie
process to complete its transition into full zombiehood (???) and for
init to then clean it up.

| If I get really brave, I might even use gdb to attach to init(8) and see
| which of the several waitpid() calls is active.

I think I'd start with the proc structure of the zombie itself, and see
if there's anything unusual about it, see if all the process's resources
(like its kernel stack) have truly been freed already, and if not, just
where that process is sitting. Since the zombie sits there essentially
forever (it seems) it ought to be fairly easy to check this just using
gdb on /dev/kmem without interrupting normal operations at all (i.e.,
risk free).

On the other hand, checking init's child queue that way would be hard,
as it is in a constant state of churn.

kre
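
To illustrate the point about flags versus the child list, here is a
minimal userland sketch. It is not taken from sbin/init.c; the function
names are hypothetical, and it only assumes standard POSIX fork() and
waitpid() behaviour. It shows that both the WNOHANG and the blocking
form of waitpid() collect a zombie that is on the caller's child list,
and that a process which is not on that list yields ECHILD whatever
flags are given.

```c
#include <errno.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical helper: fork a child that exits at once.
 * Returns the child's pid in the parent, or -1 if fork() failed. */
pid_t spawn_short_lived_child(void)
{
    pid_t pid = fork();
    if (pid == 0)
        _exit(0);           /* child: exit, becoming a zombie until reaped */
    return pid;
}

/* Reap without blocking: collect every child that has already exited.
 * Returns the number of zombies collected (possibly 0). */
int reap_nonblocking(void)
{
    int reaped = 0;
    int status;

    while (waitpid(-1, &status, WNOHANG) > 0)
        reaped++;
    return reaped;
}

/* Reap with blocking waits until no children remain (ECHILD).
 * Returns the number of children collected. */
int reap_blocking(void)
{
    int reaped = 0;
    int status;

    for (;;) {
        pid_t pid = waitpid(-1, &status, 0);
        if (pid > 0) {
            reaped++;
            continue;
        }
        if (pid == -1 && errno == EINTR)
            continue;       /* interrupted by a signal: retry */
        break;              /* ECHILD: nothing left on our child list */
    }
    return reaped;
}

/* Returns 1 if waitpid() refuses to see `pid` because it is not one of
 * the caller's children (ECHILD), 0 otherwise. */
int is_invisible_to_wait(pid_t pid)
{
    int status;

    errno = 0;
    return waitpid(pid, &status, WNOHANG) == -1 && errno == ECHILD;
}
```

In a process with no other children, calling spawn_short_lived_child()
followed by reap_blocking() collects exactly that one child, and
is_invisible_to_wait(1) reports that pid 1 cannot be waited for, since
it is not the caller's child. Whether something similar is happening to
the zombie under discussion is still only a guess.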