Package: klibc-utils
Version: 1.5.9-2
Severity: normal
Tags: patch

--- Please enter the report below this line. ---

Every now and then, we come across a machine which is unable to mount
the root filesystem for whatever reason and gets stuck in the busybox
initrd environment, from which we can run dmesg to diagnose what went
wrong.

To our dismay, in recent months (or years?), the dmesg output has come
out like this, with lots of missing digits:

[    0.000] Linux version 2.6.2-2-66 (Debian 2.6.2-3) ([EMAIL PROTECTED]) (g
cc version 4.1.3 2002 (prerelease) (Debian 4.1.2-2)) #1 SMP Wed May 1 1:
4:0 UTC 20
[    0.000] BIOS-provided physical RAM map:
[    0.000]  BIOS-e80: 00000000 - 000000e00 (usable)
[    0.000]  BIOS-e80: 000000e00 - 000000a00 (reserved)

But it is supposed to look like this:

[    0.000000] Linux version 2.6.25-2-686 (Debian 2.6.25-3)
([EMAIL PROTECTED]) (g
cc version 4.1.3 20080420 (prerelease) (Debian 4.1.2-22)) #1 SMP Wed May
14 16:4
2:03 UTC 2008
[    0.000000] BIOS-provided physical RAM map:
[    0.000000]  BIOS-e820: 0000000000000000 - 000000000009e000 (usable)
[    0.000000]  BIOS-e820: 000000000009e000 - 00000000000a0000 (reserved)

This has caused quite a bit of trouble when trying to diagnose kernel
oopses or panics, since the addresses are all wrong.

Initially, we thought it had something to do with memory corruption from
the kernel oops.  But later, we noticed that this happens even without a
kernel oops, for example when root=/dev/sda7 is simply written wrong.

So we decided to investigate, and eventually realized that the dmesg in
initrd.img in Debian (and Ubuntu) nowadays comes not from busybox but
from klibc-utils, and that running /usr/lib/klibc/bin/dmesg on a fully
booted system exhibits the same bug.

Checking the source code, we found that the code used to strip out the
<[0-7]> that prefixes every kernel message (see klogd(8)) is incorrect:
its switch cases fall through, so digits elsewhere in a line get dropped
as well.  With a bit of hacking, we got that fixed.  :-)  A patch is
attached.  Just drop it in
debian/patches/20_dmesg_dropped-digits.patch.  :-)

We have verified that the output of the fixed dmesg is identical to that
of the util-linux dmesg.
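
For illustration, here is why the old loop loses digits.  None of its
switch cases ends in a break, so a digit falls through into the '>'
case, which skips the current character whenever the previous one is a
digit.  Tracing "0.000000]" through it by hand gives "0.000]", which
matches the broken output above.  Below is a minimal standalone sketch
of the intended behaviour.  It is not the attached patch: the function
name strip_loglevel and the output-buffer interface are made up for this
example, and unlike the patch it skips nothing unless the whole "<N>"
prefix matches.

```c
#include <ctype.h>
#include <stddef.h>

/*
 * Hypothetical helper (not part of klibc): copy buf into out, dropping
 * a "<N>" log-level prefix (N a single digit) found at the start of a
 * line.  Everything else, including digits, is copied unchanged.
 * out must have room for len + 1 bytes.  Returns the bytes written,
 * not counting the terminating NUL.
 */
static size_t strip_loglevel(const char *buf, size_t len, char *out)
{
	size_t i = 0, n = 0;

	while (i < len && buf[i]) {
		int at_line_start = (i == 0 || buf[i - 1] == '\n');

		/* Skip only when all three prefix characters are present. */
		if (at_line_start && i + 2 < len &&
		    buf[i] == '<' &&
		    isdigit((unsigned char)buf[i + 1]) &&
		    buf[i + 2] == '>') {
			i += 3;
			continue;
		}
		out[n++] = buf[i++];
	}
	out[n] = '\0';
	return n;
}
```

Fed "<6>[    0.000000] BIOS-e820: ...", this yields the line without the
"<6>" and with all six zeros intact.  A "<3>" in the middle of a line is
left alone, since it is not a log-level prefix.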

Further thoughts:

We checked out klibc source using:
     git clone git://git.kernel.org/pub/scm/libs/klibc/klibc.git

And noticed that it is an upstream bug, present ever since dmesg.c was
first added in commit 9c5a7acda064daa7482148b5a45ee3b7ed39356c
(Mon Aug 20 19:57:50 2007 +0200).

As to why this bug wasn't discovered sooner... I don't know.  Perhaps
very few people use the tiny dmesg in klibc-utils for diagnostic
purposes?  And before that, Debian used the dmesg applet in busybox,
which exhibits no such bug?

Cheers,

Anthony Fok <anthony dot fok at thizgroup dot com>
ThizLinux Software Co., Ltd.
A member of Thiz Technology Group

--- System information. ---
Architecture: i386
Kernel:       Linux 2.6.25-2-686

Debian Release: lenny/sid
  500 unstable        debian.cn99.com
    1 experimental    debian.cn99.com

--- Package information. ---
Depends        (Version) | Installed
========================-+-============
libklibc     (= 1.5.9-2) | 1.5.9-2

diff -Nur -x '*.orig' -x '*~' klibc-1.5.9/usr/utils/dmesg.c klibc-1.5.9.new/usr/utils/dmesg.c
--- klibc-1.5.9/usr/utils/dmesg.c	2008-03-29 04:25:36.000000000 +0800
+++ klibc-1.5.9.new/usr/utils/dmesg.c	2008-05-27 07:07:36.000000000 +0800
@@ -50,20 +50,14 @@
 		exit(1);
 	}
 
-	while (buf[i] && i < len)
-		switch (buf[i]) {
-		case '<':
-			if (i == 0 || buf[i-1] == '\n')
-				i++;
-		case '0' ... '9':
-			if (i > 0 && buf[i-1] == '<')
-				i++;
-		case '>':
-			if (i > 0 && isdigit(buf[i-1]))
-				i++;
-		default:
-			putchar(buf[i++]);
-		}
+	while (buf[i] && i < len) {
+		if (i == 0 || buf[i-1] == '\n')
+			if (buf[i] == '<')
+				if (isdigit(buf[++i]))
+					if (buf[++i] == '>')
+						i++;
+		putchar(buf[i++]);
+	}
 
 	if (buf[i-1] != '\n')
 		putchar('\n');
