This is pretty nasty.
This is using the stock kernel from Mandrake 8.2. I expect that similar
problems exist in the cooker version.
What happens is that after some indeterminate period, the system
stops letting you start new processes. Existing processes keep
running, but nothing new will start and nothing can be restarted.
Shutting down cannot happen because you can't start the shutdown
script!
After looking through the logs, I think I have found the cause of the
problem. It appears that devfs is dying. It kills enough of the kernel
that it stops working correctly, but not enough for it to choke
altogether. (Enough to be frustrating.) It looks like the lethal
combination is a remountable ide-scsi device, but that is only a guess
at this point.
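One detail in the attached oops may be worth checking. The faulting
instruction is "mov 0x44(%ebx),%ax", and the fault address 204f2f8d is
exactly ebx (204f2f49) plus 0x44 in all three oopses. Read as
little-endian bytes, that ebx value is printable ASCII, which would be
consistent with a pointer having been overwritten by string data, though
it could also be coincidence. A minimal Python sketch of that arithmetic
follows; the constant and function names are made up for illustration,
and only the register values come from the log.

```python
# Values copied from the oops in the attached log.
EBX = 0x204F2F49         # ebx at the time of the fault
FAULT_ADDR = 0x204F2F8D  # address from "Unable to handle kernel paging request"
DISPLACEMENT = 0x44      # from the decoded instruction: mov 0x44(%ebx),%ax


def as_ascii(value: int) -> str:
    """Return a 32-bit value's little-endian bytes as text ('.' if unprintable)."""
    raw = value.to_bytes(4, "little")
    return "".join(chr(b) if 0x20 <= b < 0x7F else "." for b in raw)


if __name__ == "__main__":
    # Does ebx + displacement reproduce the faulting address?
    print(hex(EBX + DISPLACEMENT), EBX + DISPLACEMENT == FAULT_ADDR)
    # What do the bytes of the bad pointer look like as characters?
    print(repr(as_ascii(EBX)))
```

With the values above, the sum matches the fault address and the bytes
decode to "I/O " (with a trailing space).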
I believe the cause is one of the patches added to the kernel. (Probably
grsecurity.)
I rebuilt the kernel using the stock 2.4.18 source from ftp.kernel.org,
using the same configuration options needed to keep Mandrake happy.
(Devfs and ide-scsi mostly.) That kernel has worked flawlessly. (The
other kernel would not last more than a day.)
There is definitely a problem here. Finding the solution will take more
research.
ksymoops log is attached. Please Cc me on all mail, as I do not read the
cooker list very often. (Far too many other lists to keep up with...)
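To compare the oopses in the log below without reading each register
dump, the "Process" lines can be pulled out with a few lines of Python.
This is only a sketch: the regular expression and the function name are
assumptions based on the syslog format seen in this log, and the sample
lines are copied from it.

```python
import re

# Matches syslog lines such as:
#   Apr 7 14:31:39 kludge kernel: Process msec_find (pid: 2952, stackpage=c8955000)
PROCESS_LINE = re.compile(
    r"^(\w{3}\s+\d+\s+[\d:]+)\s+\S+\s+kernel:\s+Process\s+(\S+)\s+\(pid:\s+(\d+),"
)

# Three "Process" lines and one unrelated line, copied from the log.
SAMPLE_LOG = """\
Apr 7 14:31:39 kludge kernel: Process msec_find (pid: 2952, stackpage=c8955000)
Apr 8 04:05:56 kludge kernel: Process msec_find (pid: 9575, stackpage=c8671000)
Apr 8 14:54:09 kludge kernel: Process find (pid: 11002, stackpage=c248b000)
Apr 8 16:52:11 kludge kernel: 8139too Fast Ethernet driver 0.9.24
"""


def crashed_processes(log_text):
    """Return (timestamp, process name, pid) for every oops 'Process' line."""
    found = []
    for line in log_text.splitlines():
        match = PROCESS_LINE.match(line)
        if match:
            timestamp, name, pid = match.groups()
            found.append((timestamp, name, int(pid)))
    return found


if __name__ == "__main__":
    for timestamp, name, pid in crashed_processes(SAMPLE_LOG):
        print(timestamp, name, pid)
```

For this log that gives msec_find twice and find once, and the call
traces show all three going through devfs_readdir.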
ksymoops 2.4.3 on i686 2.4.18-6mdk. Options used
-V (default)
-k /proc/ksyms (default)
-l /proc/modules (default)
-o /lib/modules/2.4.18-6mdk/ (default)
-m /boot/System.map-2.4.18-6mdk (default)
Warning: You did not tell me where to find symbol information. I will
assume that the log matches the kernel and modules that are running
right now and I'll use the default options above for symbol resolution.
If the current kernel and/or modules do not match the log, you can get
more accurate output by telling me the kernel version and where to find
map, modules, ksyms etc. ksymoops -h explains the options.
Warning (compare_ksyms_lsmod): module ext3 is in lsmod but not in ksyms, probably no
symbols exported
Warning (compare_maps): mismatch on symbol partition_name , ksyms_base says c01ce310,
System.map says c0157de0. Ignoring ksyms_base entry
Apr 7 14:31:39 kludge kernel: Unable to handle kernel paging request at virtual
address 204f2f8d
Apr 7 14:31:39 kludge kernel: c0160783
Apr 7 14:31:39 kludge kernel: *pde = 00000000
Apr 7 14:31:39 kludge kernel: Oops: 0000
Apr 7 14:31:39 kludge kernel: CPU: 0
Apr 7 14:31:39 kludge kernel: EIP: 0010:[scan_dir_for_removable+19/64] Not
tainted
Apr 7 14:31:39 kludge kernel: EIP: 0010:[<c0160783>] Not tainted
Using defaults from ksymoops -t elf32-i386 -a i386
Apr 7 14:31:39 kludge kernel: EFLAGS: 00010202
Apr 7 14:31:39 kludge kernel: eax: cc181240 ebx: 204f2f49 ecx: 00000000 edx:
cc181240
Apr 7 14:31:39 kludge kernel: esi: ce153840 edi: ce3647a0 ebp: ce5f32e0 esp:
c8955f28
Apr 7 14:31:39 kludge kernel: ds: 0018 es: 0018 ss: 0018
Apr 7 14:31:39 kludge kernel: Process msec_find (pid: 2952, stackpage=c8955000)
Apr 7 14:31:39 kludge kernel: Stack: ce153840 c0160c16 ce3647a0 c0265a40 00000000
ce153840 ce1538c0 ce1538ac
Apr 7 14:31:39 kludge kernel: ce5f32e0 c0141690 ce5f32e0 c8955fa0 c0141b90
ce5f32e0 fffffff7 0000000d
Apr 7 14:31:39 kludge kernel: bfffeac8 c0141d3f ce5f32e0 c0141b90 c8955fa0
ce02dbc0 c01338f7 ce02dbc0
Apr 7 14:31:39 kludge kernel: Call Trace: [devfs_readdir+86/448] [vfs_readdir+96/144]
[filldir64+0/352] [sys_getdents64+79/185] [filldir64+0/352]
Apr 7 14:31:39 kludge kernel: Call Trace: [<c0160c16>] [<c0141690>] [<c0141b90>]
[<c0141d3f>] [<c0141b90>]
Apr 7 14:31:39 kludge kernel: [<c01338f7>] [<c0106f23>]
Apr 7 14:31:39 kludge kernel: Code: 66 8b 43 44 25 00 f0 00 00 66 3d 00 60 75 0d f6
43 10 04 74
>>EIP; c0160782 <scan_dir_for_removable+12/40> <=====
Trace; c0160c16 <devfs_readdir+56/1c0>
Trace; c0141690 <vfs_readdir+60/90>
Trace; c0141b90 <filldir64+0/160>
Trace; c0141d3e <sys_getdents64+4e/b8>
Trace; c0141b90 <filldir64+0/160>
Trace; c01338f6 <sys_fchdir+c6/e0>
Trace; c0106f22 <system_call+32/40>
Code; c0160782 <scan_dir_for_removable+12/40>
00000000 <_EIP>:
Code; c0160782 <scan_dir_for_removable+12/40> <=====
0: 66 8b 43 44 mov 0x44(%ebx),%ax <=====
Code; c0160786 <scan_dir_for_removable+16/40>
4: 25 00 f0 00 00 and $0xf000,%eax
Code; c016078a <scan_dir_for_removable+1a/40>
9: 66 3d 00 60 cmp $0x6000,%ax
Code; c016078e <scan_dir_for_removable+1e/40>
d: 75 0d jne 1c <_EIP+0x1c> c016079e
<scan_dir_for_removable+2e/40>
Code; c0160790 <scan_dir_for_removable+20/40>
f: f6 43 10 04 testb $0x4,0x10(%ebx)
Code; c0160794 <scan_dir_for_removable+24/40>
13: 74 00 je 15 <_EIP+0x15> c0160796
<scan_dir_for_removable+26/40>
Apr 7 14:39:49 kludge kernel: 8139too Fast Ethernet driver 0.9.24
Apr 7 14:39:57 kludge kernel: ac97_codec: AC97 Audio codec, id: 0x8384:0x7609
(SigmaTel STAC9721/23)
Apr 8 04:05:56 kludge kernel: Unable to handle kernel paging request at virtual
address 204f2f8d
Apr 8 04:05:56 kludge kernel: c0160783
Apr 8 04:05:56 kludge kernel: *pde = 00000000
Apr 8 04:05:56 kludge kernel: Oops: 0000
Apr 8 04:05:56 kludge kernel: CPU: 0
Apr 8 04:05:56 kludge kernel: EIP: 0010:[scan_dir_for_removable+19/64] Not
tainted
Apr 8 04:05:56 kludge kernel: EIP: 0010:[<c0160783>] Not tainted
Apr 8 04:05:56 kludge kernel: EFLAGS: 00010202
Apr 8 04:05:56 kludge kernel: eax: cd855da0 ebx: 204f2f49 ecx: 00000000 edx:
cd855da0
Apr 8 04:05:56 kludge kernel: esi: c40635c0 edi: ce8807a0 ebp: c81152c0 esp:
c8671f28
Apr 8 04:05:56 kludge kernel: ds: 0018 es: 0018 ss: 0018
Apr 8 04:05:56 kludge kernel: Process msec_find (pid: 9575, stackpage=c8671000)
Apr 8 04:05:56 kludge kernel: Stack: c40635c0 c0160c16 ce8807a0 c0265a40 00000000
c40635c0 c4063640 c406362c
Apr 8 04:05:56 kludge kernel: c81152c0 c0141690 c81152c0 c8671fa0 c0141b90
c81152c0 fffffff7 0000000d
Apr 8 04:05:56 kludge kernel: bfffece8 c0141d3f c81152c0 c0141b90 c8671fa0
c9e600a0 c01338f7 c9e600a0
Apr 8 04:05:56 kludge kernel: Call Trace: [devfs_readdir+86/448] [vfs_readdir+96/144]
[filldir64+0/352] [sys_getdents64+79/185] [filldir64+0/352]
Apr 8 04:05:56 kludge kernel: Call Trace: [<c0160c16>] [<c0141690>] [<c0141b90>]
[<c0141d3f>] [<c0141b90>]
Apr 8 04:05:56 kludge kernel: [<c01338f7>] [<c0106f23>]
Apr 8 04:05:56 kludge kernel: Code: 66 8b 43 44 25 00 f0 00 00 66 3d 00 60 75 0d f6
43 10 04 74
>>EIP; c0160782 <scan_dir_for_removable+12/40> <=====
Trace; c0160c16 <devfs_readdir+56/1c0>
Trace; c0141690 <vfs_readdir+60/90>
Trace; c0141b90 <filldir64+0/160>
Trace; c0141d3e <sys_getdents64+4e/b8>
Trace; c0141b90 <filldir64+0/160>
Trace; c01338f6 <sys_fchdir+c6/e0>
Trace; c0106f22 <system_call+32/40>
Code; c0160782 <scan_dir_for_removable+12/40>
00000000 <_EIP>:
Code; c0160782 <scan_dir_for_removable+12/40> <=====
0: 66 8b 43 44 mov 0x44(%ebx),%ax <=====
Code; c0160786 <scan_dir_for_removable+16/40>
4: 25 00 f0 00 00 and $0xf000,%eax
Code; c016078a <scan_dir_for_removable+1a/40>
9: 66 3d 00 60 cmp $0x6000,%ax
Code; c016078e <scan_dir_for_removable+1e/40>
d: 75 0d jne 1c <_EIP+0x1c> c016079e
<scan_dir_for_removable+2e/40>
Code; c0160790 <scan_dir_for_removable+20/40>
f: f6 43 10 04 testb $0x4,0x10(%ebx)
Code; c0160794 <scan_dir_for_removable+24/40>
13: 74 00 je 15 <_EIP+0x15> c0160796
<scan_dir_for_removable+26/40>
Apr 8 10:37:11 kludge kernel: 8139too Fast Ethernet driver 0.9.24
Apr 8 10:37:19 kludge kernel: ac97_codec: AC97 Audio codec, id: 0x8384:0x7609
(SigmaTel STAC9721/23)
Apr 8 11:03:53 kludge kernel: 8139too Fast Ethernet driver 0.9.24
Apr 8 11:03:59 kludge kernel: ac97_codec: AC97 Audio codec, id: 0x8384:0x7609
(SigmaTel STAC9721/23)
Apr 8 14:54:09 kludge kernel: Unable to handle kernel paging request at virtual
address 204f2f8d
Apr 8 14:54:09 kludge kernel: c0160783
Apr 8 14:54:09 kludge kernel: *pde = 00000000
Apr 8 14:54:09 kludge kernel: Oops: 0000
Apr 8 14:54:09 kludge kernel: CPU: 0
Apr 8 14:54:09 kludge kernel: EIP: 0010:[scan_dir_for_removable+19/64] Not
tainted
Apr 8 14:54:09 kludge kernel: EIP: 0010:[<c0160783>] Not tainted
Apr 8 14:54:09 kludge kernel: EFLAGS: 00010202
Apr 8 14:54:09 kludge kernel: eax: ceaa41e0 ebx: 204f2f49 ecx: 00000000 edx:
ceaa41e0
Apr 8 14:54:09 kludge kernel: esi: cf47b040 edi: cf8496a0 ebp: c9a87ca0 esp:
c248bf28
Apr 8 14:54:09 kludge kernel: ds: 0018 es: 0018 ss: 0018
Apr 8 14:54:09 kludge kernel: Process find (pid: 11002, stackpage=c248b000)
Apr 8 14:54:09 kludge kernel: Stack: cf47b040 c0160c16 cf8496a0 c0265a40 00000000
cf47b040 cf47b0c0 cf47b0ac
Apr 8 14:54:09 kludge kernel: c9a87ca0 c0141690 c9a87ca0 c248bfa0 c0141b90
c9a87ca0 fffffff7 00000004
Apr 8 14:54:09 kludge kernel: bfffead8 c0141d3f c9a87ca0 c0141b90 c248bfa0
0000057d c1406360 41ed0007
Apr 8 14:54:09 kludge kernel: Call Trace: [devfs_readdir+86/448] [vfs_readdir+96/144]
[filldir64+0/352] [sys_getdents64+79/185] [filldir64+0/352]
Apr 8 14:54:09 kludge kernel: Call Trace: [<c0160c16>] [<c0141690>] [<c0141b90>]
[<c0141d3f>] [<c0141b90>]
Apr 8 14:54:09 kludge kernel: [<c0130001>] [<c0106f23>]
Apr 8 14:54:09 kludge kernel: Code: 66 8b 43 44 25 00 f0 00 00 66 3d 00 60 75 0d f6
43 10 04 74
>>EIP; c0160782 <scan_dir_for_removable+12/40> <=====
Trace; c0160c16 <devfs_readdir+56/1c0>
Trace; c0141690 <vfs_readdir+60/90>
Trace; c0141b90 <filldir64+0/160>
Trace; c0141d3e <sys_getdents64+4e/b8>
Trace; c0141b90 <filldir64+0/160>
Trace; c0130000 <sys_swapoff+170/280>
Trace; c0106f22 <system_call+32/40>
Code; c0160782 <scan_dir_for_removable+12/40>
00000000 <_EIP>:
Code; c0160782 <scan_dir_for_removable+12/40> <=====
0: 66 8b 43 44 mov 0x44(%ebx),%ax <=====
Code; c0160786 <scan_dir_for_removable+16/40>
4: 25 00 f0 00 00 and $0xf000,%eax
Code; c016078a <scan_dir_for_removable+1a/40>
9: 66 3d 00 60 cmp $0x6000,%ax
Code; c016078e <scan_dir_for_removable+1e/40>
d: 75 0d jne 1c <_EIP+0x1c> c016079e
<scan_dir_for_removable+2e/40>
Code; c0160790 <scan_dir_for_removable+20/40>
f: f6 43 10 04 testb $0x4,0x10(%ebx)
Code; c0160794 <scan_dir_for_removable+24/40>
13: 74 00 je 15 <_EIP+0x15> c0160796
<scan_dir_for_removable+26/40>
Apr 8 16:52:11 kludge kernel: 8139too Fast Ethernet driver 0.9.24
Apr 8 16:52:17 kludge kernel: ac97_codec: AC97 Audio codec, id: 0x8384:0x7609
(SigmaTel STAC9721/23)
3 warnings issued. Results may not be reliable.