Follow-up with more debug output:

On Mon, Jul 28, 2003 at 09:49:34PM +0200, Florian Lohoff wrote:
> Hi,
> I am still having problems unloading the usbcore module after a couple
> of suspend/resume cycles (swsusp).
>
> I am stopping the whole USB subsystem (/etc/init.d/hotplug stop) before
> suspend and restarting it afterwards. After a random number of
> suspend/resume cycles the stop hangs on unloading the usbcore module.
>
> SW: Kernel 2.4.21 + swsusp 1.0.3 + acpi backport, Debian/Sarge
> HW: Sony Vaio PCG-C1MHP, OHCI Acer Labs
It bugged me that on module unload the usbdevfs unload was the last
successful step, so it was clear it hung while killing khubd. Then it
bugged me that khubd was already gone:
> F S UID PID PPID C PRI NI ADDR SZ WCHAN TTY TIME CMD
> 100 S 0 1 0 0 68 0 - 304 select ? 00:00:04 init
> 040 S 0 2 1 0 69 0 - 0 contex ? 00:00:00 keventd
> 040 S 0 3 1 0 79 19 - 0 ksofti ? 00:00:00 ksoftirqd_CPU0
> 040 S 0 4 1 0 69 0 - 0 kswapd ? 00:00:00 kswapd
> 040 S 0 5 1 0 69 0 - 0 bdflus ? 00:00:00 bdflush
> 040 S 0 6 1 0 69 0 - 0 kupdat ? 00:00:00 kupdated
> 040 S 0 9 1 4 69 0 - 0 swsusp ? 00:00:23 kswsuspd
> 040 S 0 10 1 0 69 0 - 0 kjourn ? 00:00:00 kjournald
> 040 S 0 46 1 0 69 0 - 0 down_i ? 00:00:00 knodemgrd_0
> 140 S 0 290 1 0 69 0 - 322 select ? 00:00:00 syslogd
> 140 S 0 384 1 0 69 0 - 510 syslog ? 00:00:00 klogd
[...]
Now I applied this patch, which is correct in any case: if we can't find
khubd, we won't get any completion event anyway:
--- drivers/usb/hub.c.orig Mon Jul 28 22:04:03 2003
+++ drivers/usb/hub.c Mon Jul 28 22:28:00 2003
@@ -982,7 +982,10 @@
 	/* Kill the thread */
 	ret = kill_proc(khubd_pid, SIGTERM, 1);
-	wait_for_completion(&khubd_exited);
+	if (ret == -ESRCH)
+		printk(KERN_ERR "usbcore: oops - khubd already died\n");
+	else
+		wait_for_completion(&khubd_exited);
 	/*
 	 * Hub resources are freed for us by usb_deregister. It calls
Now my KERN_ERR printk fires as expected, since khubd seems to be gone
already, and I get a nice oops afterwards.
ksymoops 2.4.5 on i686 2.4.21-vaio3. Options used
-V (default)
-k /proc/ksyms (default)
-l /proc/modules (default)
-o /lib/modules/2.4.21-vaio3/ (default)
-m /boot/System.map-2.4.21-vaio3 (default)
Warning: You did not tell me where to find symbol information. I will
assume that the log matches the kernel and modules that are running
right now and I'll use the default options above for symbol resolution.
If the current kernel and/or modules do not match the log, you can get
more accurate output by telling me the kernel version and where to find
map, modules, ksyms etc. ksymoops -h explains the options.
Warning (compare_ksyms_lsmod): module usbcore is in lsmod but not in ksyms, probably
no symbols exported
4640k reserved, 461k data, 80k init, 0k highmem)
CPU: 20020207 23:55 official release 4.3.0#7
8139too Fast Ethernet driver 0.9.26
WARNING: USB Mass Storage data integrity not assured
cs: IO port probe 0x0c00-0x0cff: clean.
cs: IO port probe 0x0800-0x08ff: clean.
cs: IO port probe 0x0100-0x04ff: excluding 0x200-0x207 0x220-0x22f 0x330-0x337
0x388-0x38f 0x408-0x40f 0x480-0x48f 0x4d0-0x4d7
cs: IO port probe 0x0a00-0x0aff: clean.
Unable to handle kernel NULL pointer dereference at virtual address 00000004
d0067261
*pde = 00000000
Oops: 0002
CPU: 0
EIP: 0010:[<d0067261>] Not tainted
Using defaults from ksymoops -t elf32-i386 -a i386
EFLAGS: 00010246
eax: 00000000 ebx: d0067000 ecx: d0075310 edx: 00000000
esi: d0075300 edi: d0067000 ebp: bfffed88 esp: cb10bf7c
ds: 0018 es: 0018 ss: 0018
Process rmmod.modutils (pid: 1285, stackpage=cb10b000)
Stack: d0067000 fffffff0 d0067000 d006ba9c d0075300 d0069f63 c0114012 d0067000
fffffff0 cadc2000 bfffed88 c0113560 d0067000 00000000 cb10a000 00000001
bfffed88 c0106bcf bfffff06 bffffe24 bfffed88 00000001 bfffed88 bfffed88
Call Trace: [<d006ba9c>] [<d0075300>] [<d0069f63>] [<c0114012>] [<c0113560>]
[<c0106bcf>]
Code: 89 50 04 89 02 c7 46 10 00 00 00 00 c7 41 04 00 00 00 00 b9
>>EIP; d0067261 <[ohci1394].data.end+3d62/1bb01> <=====
>>ebx; d0067000 <[ohci1394].data.end+3b01/1bb01>
>>ecx; d0075310 <[ohci1394].data.end+11e11/1bb01>
>>esi; d0075300 <[ohci1394].data.end+11e01/1bb01>
>>edi; d0067000 <[ohci1394].data.end+3b01/1bb01>
>>ebp; bfffed88 Before first symbol
>>esp; cb10bf7c <_end+ae22eac/fd69f30>
Trace; d006ba9c <[ohci1394].data.end+859d/1bb01>
Trace; d0075300 <[ohci1394].data.end+11e01/1bb01>
Trace; d0069f63 <[ohci1394].data.end+6a64/1bb01>
Trace; c0114012 <free_module+17/95>
Trace; c0113560 <sys_delete_module+ef/198>
Trace; c0106bcf <tracesys+1f/23>
Code; d0067261 <[ohci1394].data.end+3d62/1bb01>
00000000 <_EIP>:
Code; d0067261 <[ohci1394].data.end+3d62/1bb01> <=====
0: 89 50 04 mov %edx,0x4(%eax) <=====
Code; d0067264 <[ohci1394].data.end+3d65/1bb01>
3: 89 02 mov %eax,(%edx)
Code; d0067266 <[ohci1394].data.end+3d67/1bb01>
5: c7 46 10 00 00 00 00 movl $0x0,0x10(%esi)
Code; d006726d <[ohci1394].data.end+3d6e/1bb01>
c: c7 41 04 00 00 00 00 movl $0x0,0x4(%ecx)
Code; d0067274 <[ohci1394].data.end+3d75/1bb01>
13: b9 00 00 00 00 mov $0x0,%ecx
2 warnings issued. Results may not be reliable.
I was unable to trigger the problem by simply running

for i in `seq 1 50`; do
	/etc/init.d/hotplug stop; sleep 2
	/etc/init.d/hotplug start; sleep 2
done

while the laptop was on AC power. Slowing the machine down, by running
on battery and using strace on the scripts, seems to trigger the bug
reliably on the second stop.
Flo
--
Florian Lohoff [EMAIL PROTECTED] +49-171-2280134
Heisenberg may have been here.
