I think I found two problems within the usbserial driver and a way to fix them.
First of all the problems:
I use the usbserial driver to synchronise my Pocket PC with my Linux system via SynCE 0.8.4 (http://synce.sourceforge.net/). Basicly everything seems to work, but sometimes I notice that it is not possible to list some directories / files on my Pocket PC. The according SynCE program pls hangs infinitely in a read system call (as proved by strace). Furthermore I observe a kernel message like the following (the number in the brackets varies):
May 10 22:37:02 euler kernel: usbserial.c: too much data (112)
If I disconnect my Pocket PC from the cradle station in this situation I get the following kernel messages and the system crashes:
May 10 22:37:52 euler kernel: usb-uhci.c: interrupt, status 2, frame# 1636
Oops: 0002 2.4.21-20040509-debug #7 Mon May 10 12:04:53 CEST 2004
CPU: 0
EIP: 0010:[<c322889e>] Not tainted
EFLAGS: 00010002
eax: c322c474 ebx: 00000202 ecx: 00000000 edx: c326edc0
esi: c35c10a1 edi: c326eddd ebp: c362bf0c esp: c362bee8
ds: 0018 es: 0018 ss: 0018
Process pppd (pid: 1172, stackpage=c362b000)
Stack: c1c0d270 00000020 c326edc0 c35c118e c326edc0 00000000 c35c1000 00000000
00000000 c362bf3c c362cc32 c2b6e000 00000000 c35c1094 0000000d c2b6e000
00000001 0000000d c35c1000 c24c15c0 c36a16c0 c362bf4c c362cb25 c35c1000
Call Trace: [<c362cc32>] (48) [<c362cb25>] (16) [<c35e980e>] (32)
[<c35e8357>] (36) [<c0145f0c>] (44) [<c01079d3>] (60)
Modules: [(ppp_async:<c362c060>:<c362e1c0>)] [(ppp_generic:<c35e8060>:<c35ece74>)]
[(usbserial:<c3228060>:<c322d5fc>)]
Code: ff 81 80 03 00 00 c7 04 24 e8 c5 22 c3 e8 a0 28 f0 fc 53 9d
<0>Kernel panic: Aiee, killing interrupt handler!
In interrupt handler - not syncingAn analysis of this OOPS with ksymoops delivers the following:
ksymoops 2.4.9 on i686 2.4.21-20040509-ruediger. Options used
-V (default)
-k ksyms (specified)
-l modules (specified)
-o /lib/modules/2.4.21-20040509-debug/ (specified)
-m /boot/System.map-2.4.21-20040509-debug (specified)Oops: 0002 2.4.21-20040509-debug #7 Mon May 10 12:04:53 CEST 2004
CPU: 0
EIP: 0010:[<c322889e>] Not tainted
Using defaults from ksymoops -t elf32-i386 -a i386
EFLAGS: 00010002
eax: c322c474 ebx: 00000202 ecx: 00000000 edx: c326edc0
esi: c35c10a1 edi: c326eddd ebp: c362bf0c esp: c362bee8
ds: 0018 es: 0018 ss: 0018
Process pppd (pid: 1172, stackpage=c362b000)
Stack: c1c0d270 00000020 c326edc0 c35c118e c326edc0 00000000 c35c1000 00000000
00000000 c362bf3c c362cc32 c2b6e000 00000000 c35c1094 0000000d c2b6e000
00000001 0000000d c35c1000 c24c15c0 c36a16c0 c362bf4c c362cb25 c35c1000
Call Trace: [<c362cc32>] (48) [<c362cb25>] (16) [<c35e980e>] (32)
[<c35e8357>] (36) [<c0145f0c>] (44) [<c01079d3>] (60)
Code: ff 81 80 03 00 00 c7 04 24 e8 c5 22 c3 e8 a0 28 f0 fc 53 9d>>EIP; c322889e <[usbserial]serial_write+13e/1e0> <=====
>>eax; c322c474 <[usbserial]post_list+0/0> >>edx; c326edc0 <[lp].bss.end+23101/3783a1> >>esi; c35c10a1 <[lp].bss.end+3753e2/3783a1> >>edi; c326eddd <[lp].bss.end+2311e/3783a1> >>ebp; c362bf0c <[bsd_comp].data.end+aab5/ac09> >>esp; c362bee8 <[bsd_comp].data.end+aa91/ac09>
Trace; c362cc32 <[ppp_async]ppp_async_push+102/1a0> Trace; c362cb25 <[ppp_async]ppp_async_send+45/50> Trace; c35e980e <[ppp_generic]ppp_channel_push+6e/160> Trace; c35e8357 <[ppp_generic]ppp_write+f7/130> Trace; c0145f0c <sys_write+9c/130> Trace; c01079d3 <system_call+33/38>
Code; c322889e <[usbserial]serial_write+13e/1e0> 00000000 <_EIP>: Code; c322889e <[usbserial]serial_write+13e/1e0> <===== 0: ff 81 80 03 00 00 incl 0x380(%ecx) <===== Code; c32288a4 <[usbserial]serial_write+144/1e0> 6: c7 04 24 e8 c5 22 c3 movl $0xc322c5e8,(%esp,1) Code; c32288ab <[usbserial]serial_write+14b/1e0> d: e8 a0 28 f0 fc call fcf028b2 <_EIP+0xfcf028b2> c012b150 <schedule_task+0/70> Code; c32288b0 <[usbserial]serial_write+150/1e0> 12: 53 push %ebx Code; c32288b1 <[usbserial]serial_write+151/1e0> 13: 9d popf
<0>Kernel panic: Aiee, killing interrupt handler!
Next I had a look at serial_write in usbserial.c. For a better discussion I add the interesting lines to my report:
712: job->port = port;
713: if ((job->len = count) >= POST_BSIZE) {
714: static int rate = 0;
715: /*
716: * Data loss due to extreme circumstances.
717: * It's a ususal thing on serial to lose characters, isn't it?
718: * Neener, neener! Actually, it's probably an echo loop anyway.
719: * Only happens when getty starts talking to Visor.
720: */
721: if (++rate % 1000 < 5)
722: err("too much data (%d)", count);
723: job->len = POST_BSIZE;
724: }
725: memcpy(job->buff, buf, job->len);
726:
727: spin_lock_irqsave(&post_lock, flags);
728: list_add_tail(&job->link, &post_list);
729: serial->ref++; /* Protect the port->sem from kfree() */
730: schedule_task(&post_task);
731: spin_unlock_irqrestore(&post_lock, flags);
732:
733: return count;
734:}The kernel panic is caused by line 729, because serial is zero at this point of time (as proved by the value of the ecx register). So we have a NULL pointer dereference here. To solve this problem I patched serial_write like the following:
--- drivers/usb/serial/usbserial.c.orig 2004-04-26 18:27:43.000000000 +0200
+++ drivers/usb/serial/usbserial.c 2004-05-10 22:02:32.000000000 +0200
@@ -705,6 +705,17 @@
return -EINVAL;
}+ if (!serial) {
+ static int rate_serial = 0;
+ /*
+ * Check if serial is NULL, because this could lead
+ * to a kernel panic.
+ */
+ if (++rate_serial % 1000 < 5)
+ err("usb_serial structure is NULL");
+ return -EINVAL;
+ }
+
job = kmalloc(sizeof(struct usb_serial_post_job), GFP_ATOMIC);
if (job == NULL)
return -ENOMEM;After I applied this patch the kernel panic did not appear again. But as I am not a kernel hacker the following questions remain:
1. Is my code at the correct position. 2. Is -EINVAL the correct return code in this situation.
After I solved this problem I checked for the second problem (the hanging pls process). I noticed that serial_write always returns count if it is called in an interrupt and everything else goes well. From my basic understanding I think that the return value of serial_write should be the number of bytes really written. But in the situation of my hanging pls process serial_write does not process count bytes, but only POST_BSIZE bytes as proved by the kernel message:
May 10 22:37:02 euler kernel: usbserial.c: too much data (112)
So I patched serial_write in the following way:
--- drivers/usb/serial/usbserial.c.orig 2004-04-26 18:27:43.000000000 +0200
+++ drivers/usb/serial/usbserial.c 2004-05-10 22:02:32.000000000 +0200
@@ -721,6 +732,9 @@
if (++rate % 1000 < 5)
err("too much data (%d)", count);
job->len = POST_BSIZE;
+ count = job->len; /* Set count to job->len to give the correct
+ return value of data really written */
+
}
memcpy(job->buff, buf, job->len);
After I applied this patch everything works fine now (no hanging pls process, no kernel panics). I still get the following kernel message for some pls calls, but the results are correct:
May 10 22:37:02 euler kernel: usbserial.c: too much data (112)
As I do not have an in depth view in the usbserial driver the remaining question is:
Is this correct or does this break something else?
I hope my report is helpful.
Best regards
R�diger Pl�m
--- drivers/usb/serial/usbserial.c.orig 2004-04-26 18:27:43.000000000 +0200
+++ drivers/usb/serial/usbserial.c 2004-05-10 22:02:32.000000000 +0200
@@ -705,6 +705,17 @@
return -EINVAL;
}
+ if (!serial) {
+ static int rate_serial = 0;
+ /*
+ * Check if serial is NULL, because this could lead
+ * to a kernel panic.
+ */
+ if (++rate_serial % 1000 < 5)
+ err("usb_serial structure is NULL");
+ return -EINVAL;
+ }
+
job = kmalloc(sizeof(struct usb_serial_post_job), GFP_ATOMIC);
if (job == NULL)
return -ENOMEM;
@@ -721,6 +732,9 @@
if (++rate % 1000 < 5)
err("too much data (%d)", count);
job->len = POST_BSIZE;
+ count = job->len; /* Set count to job->len to give the correct
+ return value of data really written */
+
}
memcpy(job->buff, buf, job->len);
