Hi all. I've just finished tracking down a weird problem on our system.
I can fix the "top-level" issue, but I'm trying to determine whether
there are any real bugs here (besides my error) or not.

We're using an Intel-based embedded system running from a ramdisk with a
2.6.32.13 kernel, glibc, and busybox 1.15.3. On this system we're using
busybox syslogd and klogd. We noticed that we were getting corrupted
kernel messages in the syslog file output. dmesg showed correct values
but the kernel messages kept in /var/log/messages were all messed up.

Tracking this down, it appears the cause of the problem is that there
are two klogd processes running on my system. So, my first question: is
it expected that two klogd processes running at once would "step on"
each other's access to the kernel log ring buffer, or something, and
generate corrupted output? I would have naively expected each process
to maintain its own pointer, so I might get duplicated messages, but not
corrupted messages. If I run two "dmesg" commands at the same time I
don't see this kind of corruption. Or is this a bug in busybox klogd?
Or is it a bug in Linux 2.6.32.13?

Trying to determine why I have two klogd processes, I see this: the
bring-up of the system is managed by busybox init. To start the system
it runs:

    ::sysinit:/etc/init.d/rcS start
    ::once:/etc/rootfs start

The "rootfs" script is where klogd is started; it then does some things
such as changing the system time, timezone, hostname, etc. The way
things work, we restart the logging system multiple times as these
changes are implemented, so that the log messages have the right
timezone and hostname.

The first time I restart, I kill klogd, but init (apparently) never
reaps the child, so it lingers as a zombie. Then I start a new klogd.
Then more booting happens and I want to restart it again, but when I use
pidof to find the pid of klogd I get back TWO pids: the zombie and the
real klogd. Due to an error in my script, which doesn't expect to get
back more than one PID, the kill fails and I start another klogd. Now I
have one zombie and two real klogd processes, and that's where this
corruption comes from. At some later point (once my "rootfs" script
ends?), init will reap the zombie klogd and I'll be left with just the
two klogd processes.
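
For illustration, here is a minimal sketch of how the restart step could
cope with pidof returning more than one PID, including a zombie. It is
not the actual "rootfs" script: the names PROC_ROOT, is_zombie,
live_pids and stop_all are hypothetical, and it assumes a Linux /proc
whose per-process status file has a "State:" line.

```shell
#!/bin/sh
# Sketch only: PROC_ROOT, is_zombie, live_pids and stop_all are
# hypothetical names, not part of busybox or of the original script.
PROC_ROOT="${PROC_ROOT:-/proc}"

# Succeed if the given pid is a zombie according to <PROC_ROOT>/<pid>/status.
is_zombie() {
    state=$(sed -n 's/^State:[[:space:]]*\([A-Za-z]\).*/\1/p' \
        "$PROC_ROOT/$1/status" 2>/dev/null)
    [ "$state" = "Z" ]
}

# Print, one per line, those pids from the arguments that exist and are
# not zombies.
live_pids() {
    for pid in "$@"; do
        [ -d "$PROC_ROOT/$pid" ] || continue
        is_zombie "$pid" || echo "$pid"
    done
}

# Send SIGTERM to every live process with the given name.
stop_all() {
    # pidof prints a space-separated list, so word splitting is intended.
    for pid in $(live_pids $(pidof "$1")); do
        kill "$pid" 2>/dev/null
    done
}
```

With these functions sourced, the restart would be "stop_all klogd"
followed by starting klogd again. If the killall applet is built into
the busybox binary, "killall klogd" may be a simpler way to get the same
effect.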

I can fix my scripting, of course, but that brings me to the next
question: is it expected/desirable that init doesn't reap children while
it's waiting for a command like this to complete? This seems like
sub-optimal behavior.
_______________________________________________
busybox mailing list
http://lists.busybox.net/mailman/listinfo/busybox