I think I found two problems within the usbserial driver and a way to fix them.
First of all the problems:
I use the usbserial driver to synchronise my Pocket PC with my Linux system via SynCE 0.8.4 (http://synce.sourceforge.net/). Basicly everything seems to work, but sometimes I notice that it is not possible to list some directories / files on my Pocket PC. The according SynCE program pls hangs infinitely in a read system call (as proved by strace). Furthermore I observe a kernel message like the following (the number in the brackets varies):
May 10 22:37:02 euler kernel: usbserial.c: too much data (112)
If I disconnect my Pocket PC from the cradle station in this situation I get the following kernel messages and the system crashes:
May 10 22:37:52 euler kernel: usb-uhci.c: interrupt, status 2, frame# 1636 Oops: 0002 2.4.21-20040509-debug #7 Mon May 10 12:04:53 CEST 2004 CPU: 0 EIP: 0010:[<c322889e>] Not tainted EFLAGS: 00010002 eax: c322c474 ebx: 00000202 ecx: 00000000 edx: c326edc0 esi: c35c10a1 edi: c326eddd ebp: c362bf0c esp: c362bee8 ds: 0018 es: 0018 ss: 0018 Process pppd (pid: 1172, stackpage=c362b000) Stack: c1c0d270 00000020 c326edc0 c35c118e c326edc0 00000000 c35c1000 00000000 00000000 c362bf3c c362cc32 c2b6e000 00000000 c35c1094 0000000d c2b6e000 00000001 0000000d c35c1000 c24c15c0 c36a16c0 c362bf4c c362cb25 c35c1000 Call Trace: [<c362cc32>] (48) [<c362cb25>] (16) [<c35e980e>] (32) [<c35e8357>] (36) [<c0145f0c>] (44) [<c01079d3>] (60) Modules: [(ppp_async:<c362c060>:<c362e1c0>)] [(ppp_generic:<c35e8060>:<c35ece74>)] [(usbserial:<c3228060>:<c322d5fc>)] Code: ff 81 80 03 00 00 c7 04 24 e8 c5 22 c3 e8 a0 28 f0 fc 53 9d <0>Kernel panic: Aiee, killing interrupt handler! In interrupt handler - not syncing
An analysis of this OOPS with ksymoops delivers the following:
ksymoops 2.4.9 on i686 2.4.21-20040509-ruediger. Options used -V (default) -k ksyms (specified) -l modules (specified) -o /lib/modules/2.4.21-20040509-debug/ (specified) -m /boot/System.map-2.4.21-20040509-debug (specified)
Oops: 0002 2.4.21-20040509-debug #7 Mon May 10 12:04:53 CEST 2004 CPU: 0 EIP: 0010:[<c322889e>] Not tainted Using defaults from ksymoops -t elf32-i386 -a i386 EFLAGS: 00010002 eax: c322c474 ebx: 00000202 ecx: 00000000 edx: c326edc0 esi: c35c10a1 edi: c326eddd ebp: c362bf0c esp: c362bee8 ds: 0018 es: 0018 ss: 0018 Process pppd (pid: 1172, stackpage=c362b000) Stack: c1c0d270 00000020 c326edc0 c35c118e c326edc0 00000000 c35c1000 00000000 00000000 c362bf3c c362cc32 c2b6e000 00000000 c35c1094 0000000d c2b6e000 00000001 0000000d c35c1000 c24c15c0 c36a16c0 c362bf4c c362cb25 c35c1000 Call Trace: [<c362cc32>] (48) [<c362cb25>] (16) [<c35e980e>] (32) [<c35e8357>] (36) [<c0145f0c>] (44) [<c01079d3>] (60) Code: ff 81 80 03 00 00 c7 04 24 e8 c5 22 c3 e8 a0 28 f0 fc 53 9d
>>EIP; c322889e <[usbserial]serial_write+13e/1e0> <=====
>>eax; c322c474 <[usbserial]post_list+0/0> >>edx; c326edc0 <[lp].bss.end+23101/3783a1> >>esi; c35c10a1 <[lp].bss.end+3753e2/3783a1> >>edi; c326eddd <[lp].bss.end+2311e/3783a1> >>ebp; c362bf0c <[bsd_comp].data.end+aab5/ac09> >>esp; c362bee8 <[bsd_comp].data.end+aa91/ac09>
Trace; c362cc32 <[ppp_async]ppp_async_push+102/1a0> Trace; c362cb25 <[ppp_async]ppp_async_send+45/50> Trace; c35e980e <[ppp_generic]ppp_channel_push+6e/160> Trace; c35e8357 <[ppp_generic]ppp_write+f7/130> Trace; c0145f0c <sys_write+9c/130> Trace; c01079d3 <system_call+33/38>
Code; c322889e <[usbserial]serial_write+13e/1e0> 00000000 <_EIP>: Code; c322889e <[usbserial]serial_write+13e/1e0> <===== 0: ff 81 80 03 00 00 incl 0x380(%ecx) <===== Code; c32288a4 <[usbserial]serial_write+144/1e0> 6: c7 04 24 e8 c5 22 c3 movl $0xc322c5e8,(%esp,1) Code; c32288ab <[usbserial]serial_write+14b/1e0> d: e8 a0 28 f0 fc call fcf028b2 <_EIP+0xfcf028b2> c012b150 <schedule_task+0/70> Code; c32288b0 <[usbserial]serial_write+150/1e0> 12: 53 push %ebx Code; c32288b1 <[usbserial]serial_write+151/1e0> 13: 9d popf
<0>Kernel panic: Aiee, killing interrupt handler!
Next I had a look at serial_write in usbserial.c. For a better discussion I add the interesting lines to my report:
712: job->port = port; 713: if ((job->len = count) >= POST_BSIZE) { 714: static int rate = 0; 715: /* 716: * Data loss due to extreme circumstances. 717: * It's a ususal thing on serial to lose characters, isn't it? 718: * Neener, neener! Actually, it's probably an echo loop anyway. 719: * Only happens when getty starts talking to Visor. 720: */ 721: if (++rate % 1000 < 5) 722: err("too much data (%d)", count); 723: job->len = POST_BSIZE; 724: } 725: memcpy(job->buff, buf, job->len); 726: 727: spin_lock_irqsave(&post_lock, flags); 728: list_add_tail(&job->link, &post_list); 729: serial->ref++; /* Protect the port->sem from kfree() */ 730: schedule_task(&post_task); 731: spin_unlock_irqrestore(&post_lock, flags); 732: 733: return count; 734:}
The kernel panic is caused by line 729, because serial is zero at this point of time (as proved by the value of the ecx register). So we have a NULL pointer dereference here. To solve this problem I patched serial_write like the following:
--- drivers/usb/serial/usbserial.c.orig 2004-04-26 18:27:43.000000000 +0200 +++ drivers/usb/serial/usbserial.c 2004-05-10 22:02:32.000000000 +0200 @@ -705,6 +705,17 @@ return -EINVAL; }
+ if (!serial) { + static int rate_serial = 0; + /* + * Check if serial is NULL, because this could lead + * to a kernel panic. + */ + if (++rate_serial % 1000 < 5) + err("usb_serial structure is NULL"); + return -EINVAL; + } + job = kmalloc(sizeof(struct usb_serial_post_job), GFP_ATOMIC); if (job == NULL) return -ENOMEM;
After I applied this patch the kernel panic did not appear again. But as I am not a kernel hacker the following questions remain:
1. Is my code at the correct position. 2. Is -EINVAL the correct return code in this situation.
After I solved this problem I checked for the second problem (the hanging pls process). I noticed that serial_write always returns count if it is called in an interrupt and everything else goes well. From my basic understanding I think that the return value of serial_write should be the number of bytes really written. But in the situation of my hanging pls process serial_write does not process count bytes, but only POST_BSIZE bytes as proved by the kernel message:
May 10 22:37:02 euler kernel: usbserial.c: too much data (112)
So I patched serial_write in the following way:
--- drivers/usb/serial/usbserial.c.orig 2004-04-26 18:27:43.000000000 +0200 +++ drivers/usb/serial/usbserial.c 2004-05-10 22:02:32.000000000 +0200 @@ -721,6 +732,9 @@ if (++rate % 1000 < 5) err("too much data (%d)", count); job->len = POST_BSIZE; + count = job->len; /* Set count to job->len to give the correct + return value of data really written */ + } memcpy(job->buff, buf, job->len);
After I applied this patch everything works fine now (no hanging pls process, no kernel panics). I still get the following kernel message for some pls calls, but the results are correct:
May 10 22:37:02 euler kernel: usbserial.c: too much data (112)
As I do not have an in depth view in the usbserial driver the remaining question is:
Is this correct or does this break something else?
I hope my report is helpful.
Best regards
Rüdiger Plüm
--- drivers/usb/serial/usbserial.c.orig 2004-04-26 18:27:43.000000000 +0200 +++ drivers/usb/serial/usbserial.c 2004-05-10 22:02:32.000000000 +0200 @@ -705,6 +705,17 @@ return -EINVAL; } + if (!serial) { + static int rate_serial = 0; + /* + * Check if serial is NULL, because this could lead + * to a kernel panic. + */ + if (++rate_serial % 1000 < 5) + err("usb_serial structure is NULL"); + return -EINVAL; + } + job = kmalloc(sizeof(struct usb_serial_post_job), GFP_ATOMIC); if (job == NULL) return -ENOMEM; @@ -721,6 +732,9 @@ if (++rate % 1000 < 5) err("too much data (%d)", count); job->len = POST_BSIZE; + count = job->len; /* Set count to job->len to give the correct + return value of data really written */ + } memcpy(job->buff, buf, job->len);