Dear all,

I think I found two problems within the usbserial driver and a way
to fix them.

First of all the problems:

I use the usbserial driver to synchronise my Pocket PC with my Linux
system via SynCE 0.8.4 (http://synce.sourceforge.net/). Basicly everything
seems to work, but sometimes I notice that it is not possible to list
some directories / files on my Pocket PC. The according SynCE program
pls hangs infinitely in a read system call (as proved by strace). Furthermore
I observe a kernel message like the following (the number in the brackets
varies):

May 10 22:37:02 euler kernel: usbserial.c: too much data (112)

If I disconnect my Pocket PC from the cradle station in this situation I get
the following kernel messages and the system crashes:

May 10 22:37:52 euler kernel: usb-uhci.c: interrupt, status 2, frame# 1636
Oops: 0002 2.4.21-20040509-debug #7 Mon May 10 12:04:53 CEST 2004
CPU:    0
EIP:    0010:[<c322889e>]    Not tainted
EFLAGS: 00010002
eax: c322c474   ebx: 00000202   ecx: 00000000   edx: c326edc0
esi: c35c10a1   edi: c326eddd   ebp: c362bf0c   esp: c362bee8
ds: 0018   es: 0018   ss: 0018
Process pppd (pid: 1172, stackpage=c362b000)
Stack: c1c0d270 00000020 c326edc0 c35c118e c326edc0 00000000 c35c1000 00000000
       00000000 c362bf3c c362cc32 c2b6e000 00000000 c35c1094 0000000d c2b6e000
       00000001 0000000d c35c1000 c24c15c0 c36a16c0 c362bf4c c362cb25 c35c1000
Call Trace:         [<c362cc32>] (48) [<c362cb25>] (16) [<c35e980e>] (32)
  [<c35e8357>] (36) [<c0145f0c>] (44) [<c01079d3>] (60)
Modules: [(ppp_async:<c362c060>:<c362e1c0>)] [(ppp_generic:<c35e8060>:<c35ece74>)]
         [(usbserial:<c3228060>:<c322d5fc>)]
Code: ff 81 80 03 00 00 c7 04 24 e8 c5 22 c3 e8 a0 28 f0 fc 53 9d
 <0>Kernel panic: Aiee, killing interrupt handler!
In interrupt handler - not syncing

An analysis of this OOPS with ksymoops delivers the following:

ksymoops 2.4.9 on i686 2.4.21-20040509-ruediger.  Options used
     -V (default)
     -k ksyms (specified)
     -l modules (specified)
     -o /lib/modules/2.4.21-20040509-debug/ (specified)
     -m /boot/System.map-2.4.21-20040509-debug (specified)

Oops: 0002 2.4.21-20040509-debug #7 Mon May 10 12:04:53 CEST 2004
CPU:    0
EIP:    0010:[<c322889e>]    Not tainted
Using defaults from ksymoops -t elf32-i386 -a i386
EFLAGS: 00010002
eax: c322c474   ebx: 00000202   ecx: 00000000   edx: c326edc0
esi: c35c10a1   edi: c326eddd   ebp: c362bf0c   esp: c362bee8
ds: 0018   es: 0018   ss: 0018
Process pppd (pid: 1172, stackpage=c362b000)
Stack: c1c0d270 00000020 c326edc0 c35c118e c326edc0 00000000 c35c1000 00000000
       00000000 c362bf3c c362cc32 c2b6e000 00000000 c35c1094 0000000d c2b6e000
       00000001 0000000d c35c1000 c24c15c0 c36a16c0 c362bf4c c362cb25 c35c1000
Call Trace:         [<c362cc32>] (48) [<c362cb25>] (16) [<c35e980e>] (32)
  [<c35e8357>] (36) [<c0145f0c>] (44) [<c01079d3>] (60)
Code: ff 81 80 03 00 00 c7 04 24 e8 c5 22 c3 e8 a0 28 f0 fc 53 9d


>>EIP; c322889e <[usbserial]serial_write+13e/1e0> <=====


>>eax; c322c474 <[usbserial]post_list+0/0>
>>edx; c326edc0 <[lp].bss.end+23101/3783a1>
>>esi; c35c10a1 <[lp].bss.end+3753e2/3783a1>
>>edi; c326eddd <[lp].bss.end+2311e/3783a1>
>>ebp; c362bf0c <[bsd_comp].data.end+aab5/ac09>
>>esp; c362bee8 <[bsd_comp].data.end+aa91/ac09>

Trace; c362cc32 <[ppp_async]ppp_async_push+102/1a0>
Trace; c362cb25 <[ppp_async]ppp_async_send+45/50>
Trace; c35e980e <[ppp_generic]ppp_channel_push+6e/160>
Trace; c35e8357 <[ppp_generic]ppp_write+f7/130>
Trace; c0145f0c <sys_write+9c/130>
Trace; c01079d3 <system_call+33/38>

Code;  c322889e <[usbserial]serial_write+13e/1e0>
00000000 <_EIP>:
Code;  c322889e <[usbserial]serial_write+13e/1e0>   <=====
   0:   ff 81 80 03 00 00         incl   0x380(%ecx)   <=====
Code;  c32288a4 <[usbserial]serial_write+144/1e0>
   6:   c7 04 24 e8 c5 22 c3      movl   $0xc322c5e8,(%esp,1)
Code;  c32288ab <[usbserial]serial_write+14b/1e0>
   d:   e8 a0 28 f0 fc            call   fcf028b2 <_EIP+0xfcf028b2> c012b150 
<schedule_task+0/70>
Code;  c32288b0 <[usbserial]serial_write+150/1e0>
  12:   53                        push   %ebx
Code;  c32288b1 <[usbserial]serial_write+151/1e0>
  13:   9d                        popf

<0>Kernel panic: Aiee, killing interrupt handler!

Next I had a look at serial_write in usbserial.c. For a better discussion I add the interesting lines to my report:

712:        job->port = port;
713:        if ((job->len = count) >= POST_BSIZE) {
714:                static int rate = 0;
715:                /*
716:                 * Data loss due to extreme circumstances.
717:                 * It's a ususal thing on serial to lose characters, isn't it?
718:                 * Neener, neener! Actually, it's probably an echo loop anyway.
719:                 * Only happens when getty starts talking to Visor.
720:                 */
721:                if (++rate % 1000 < 5)
722:                        err("too much data (%d)", count);
723:                job->len = POST_BSIZE;
724:        }
725:        memcpy(job->buff, buf, job->len);
726:
727:        spin_lock_irqsave(&post_lock, flags);
728:        list_add_tail(&job->link, &post_list);
729:        serial->ref++;          /* Protect the port->sem from kfree() */
730:        schedule_task(&post_task);
731:        spin_unlock_irqrestore(&post_lock, flags);
732:
733:        return count;
734:}

The kernel panic is caused by line 729, because serial is zero at this point
of time (as proved by the value of the ecx register). So we have a NULL pointer
dereference here. To solve this problem I patched serial_write like the
following:

--- drivers/usb/serial/usbserial.c.orig 2004-04-26 18:27:43.000000000 +0200
+++ drivers/usb/serial/usbserial.c      2004-05-10 22:02:32.000000000 +0200
@@ -705,6 +705,17 @@
                return -EINVAL;
        }

+       if (!serial) {
+               static int rate_serial = 0;
+               /*
+                 * Check if serial is NULL, because this could lead
+                 * to a kernel panic.
+                */
+               if (++rate_serial % 1000 < 5)
+                  err("usb_serial structure is NULL");
+               return -EINVAL;
+       }
+
        job = kmalloc(sizeof(struct usb_serial_post_job), GFP_ATOMIC);
        if (job == NULL)
                return -ENOMEM;

After I applied this patch the kernel panic did not appear again. But as I am
not a kernel hacker the following questions remain:

1. Is my code at the correct position.
2. Is -EINVAL the correct return code in this situation.

After I solved this problem I checked for the second problem (the hanging
pls process). I noticed that serial_write always returns count if it is
called in an interrupt and everything else goes well. From my basic
understanding I think that the return value of serial_write should be the
number of bytes really written. But in the situation of my hanging pls process
serial_write does not process count bytes, but only POST_BSIZE bytes as proved
by the kernel message:

May 10 22:37:02 euler kernel: usbserial.c: too much data (112)

So I patched serial_write in the following way:

--- drivers/usb/serial/usbserial.c.orig 2004-04-26 18:27:43.000000000 +0200
+++ drivers/usb/serial/usbserial.c      2004-05-10 22:02:32.000000000 +0200
@@ -721,6 +732,9 @@
                if (++rate % 1000 < 5)
                        err("too much data (%d)", count);
                job->len = POST_BSIZE;
+                count = job->len; /* Set count to job->len to give the correct
+                                     return value of data really written */
+
        }
        memcpy(job->buff, buf, job->len);


After I applied this patch everything works fine now (no hanging pls process, no kernel panics). I still get the following kernel message for some pls calls, but the results are correct:

May 10 22:37:02 euler kernel: usbserial.c: too much data (112)

As I do not have an in depth view in the usbserial driver the remaining
question is:

Is this correct or does this break something else?

I hope my report is helpful.

Best regards

Rüdiger Plüm

--- drivers/usb/serial/usbserial.c.orig 2004-04-26 18:27:43.000000000 +0200
+++ drivers/usb/serial/usbserial.c      2004-05-10 22:02:32.000000000 +0200
@@ -705,6 +705,17 @@
                return -EINVAL;
        }
 
+       if (!serial) {
+               static int rate_serial = 0;
+               /*
+                 * Check if serial is NULL, because this could lead
+                 * to a kernel panic.
+                */
+               if (++rate_serial % 1000 < 5)
+                  err("usb_serial structure is NULL");
+               return -EINVAL;
+       }
+
        job = kmalloc(sizeof(struct usb_serial_post_job), GFP_ATOMIC);
        if (job == NULL)
                return -ENOMEM;
@@ -721,6 +732,9 @@
                if (++rate % 1000 < 5)
                        err("too much data (%d)", count);
                job->len = POST_BSIZE;
+                count = job->len; /* Set count to job->len to give the correct
+                                     return value of data really written */
+
        }
        memcpy(job->buff, buf, job->len);
 

Reply via email to