seems my tests are pretty good in hitting some critical paths ;-)
This one happened during load/consistency test using usb-ohci on
UP testbox and 2.4.4-vanilla. (ksymoops attached)

May  6 14:13:15 srv kernel: kernel BUG at usb-ohci.h:465!

The thing was triggered from fop->read/write in the following tree:

usb_submit_urb
| sohci_submit_urb
| | spin_lock_irqsave(&usb_ed_lock,...)
| | + td_submit_urb
| | + | td_fill
| | + | | inline dma_to_td
| | + | | | inline dma_to_ed_td     <---- BUG at usb-ohci.h:465
| | + |          ...
| | spin_unlock_irqrestore(&usb_ed_lock,...)
...

by the offending code snippet:

/* Recover a TD/ED using its collision chain */
static inline void *
dma_to_ed_td (struct hash_list_t * entry, dma_addr_t dma)
{
        struct hash_t * scan = entry->head;
        while (scan && scan->dma != dma)
                scan = scan->next;
        if (!scan)
                BUG();          <---- usb-ohci.h:465
        return scan->virt;
}

Seems like the hash chains got corrupted somehow. It's not easy to hit
this, but reproducible. Setup was a loopback-like test device continously
processing up to 16 urb's submitted for in & out on 2 ep. USB loaded at
some 1.15MB/sec av. throughput (sustained). BUG() was hit several times
when sending the same data, after several 10 Minutes of continous
operation - but never at the same location (data offset).

Compiling with DEBUG didn't provide any insight so far - in fact the issue
didn't appear anymore with DEBUG enabled during more than 1 hour of
operation. Probably some serialization/synchronisation issue?

Btw, the add/remove operation on the hash collision chains appear to be
unprotected - at least on hash_*()/dma_to_*() api level. Are they
guaranteed to be always called inside some spinlock (irqsafe) - like it
was when the BUG was hitted?

Suggestions? Some additional locking around these operations to try?

Martin

ksymoops 2.3.4 on i586 2.4.4.  Options used
     -V (default)
     -k /proc/ksyms (default)
     -l /proc/modules (default)
     -o /lib/modules/2.4.4/ (default)
     -m /usr/src/linux/System.map (default)

Warning: You did not tell me where to find symbol information.  I will
assume that the log matches the kernel and modules that are running
right now and I'll use the default options above for symbol resolution.
If the current kernel and/or modules do not match the log, you can get
more accurate output by telling me the kernel version and where to find
map, modules, ksyms etc.  ksymoops -h explains the options.

Warning (compare_maps): ksyms_base symbol __VERSIONED_SYMBOL(shmem_file_setup) not 
found in System.map.  Ignoring ksyms_base entry
May  6 14:13:15 srv kernel: invalid operand: 0000
May  6 14:13:15 srv kernel: CPU:    0
May  6 14:13:15 srv kernel: EIP:    0010:[td_fill+145/300]
May  6 14:13:15 srv kernel: EFLAGS: 00010096
May  6 14:13:15 srv kernel: eax: 0000001e   ebx: 00000000   ecx: c58893c0   edx: 
00000000
May  6 14:13:15 srv kernel: esi: 0439c000   edi: cbfa9500   ebp: cbfa9510   esp: 
c31a3e88
May  6 14:13:15 srv kernel: ds: 0018   es: 0018   ss: 0018
May  6 14:13:15 srv kernel: Process cmp (pid: 804, stackpage=c31a3000)
May  6 14:13:15 srv kernel: Stack: c01f84b3 c01f858f 000001d1 f0140000 00000000 
c6809b60 0439c000 00000000 
May  6 14:13:15 srv kernel:        00000000 c326b090 c019dd08 cbf9f000 f0140000 
0439c000 00000200 c6809b60 
May  6 14:13:15 srv kernel:        00000000 00000202 00000001 0326b090 c326b090 
00000002 00000000 00000200 
May  6 14:13:15 srv kernel: Call Trace: [<f0140000>] [td_submit_urb+372/668] 
[<f0140000>] [sohci_submit_urb+1326/1344] [usb_submit_urb+30/48] [cy3671_read+289/580] 
[sys_read+142/196] 
May  6 14:13:15 srv kernel: Code: 0f 0b 83 c4 0c 89 f6 8b 54 24 14 8b 0b 89 0c 2a 8b 
07 89 41 
Using defaults from ksymoops -t elf32-i386 -a i386

Trace; f0140000 <END_OF_CODE+230bceff/????>
Code;  00000000 Before first symbol
00000000 <_EIP>:
Code;  00000000 Before first symbol
   0:   0f 0b                     ud2a   
Code;  00000002 Before first symbol
   2:   83 c4 0c                  add    $0xc,%esp
Code;  00000005 Before first symbol
   5:   89 f6                     mov    %esi,%esi
Code;  00000007 Before first symbol
   7:   8b 54 24 14               mov    0x14(%esp,1),%edx
Code;  0000000b Before first symbol
   b:   8b 0b                     mov    (%ebx),%ecx
Code;  0000000d Before first symbol
   d:   89 0c 2a                  mov    %ecx,(%edx,%ebp,1)
Code;  00000010 Before first symbol
  10:   8b 07                     mov    (%edi),%eax
Code;  00000012 Before first symbol
  12:   89 41 00                  mov    %eax,0x0(%ecx)

May  6 14:13:15 srv kernel: invalid operand: 0000
May  6 14:13:15 srv kernel: CPU:    0
May  6 14:13:15 srv kernel: EIP:    0010:[td_fill+145/300]
May  6 14:13:15 srv kernel: EFLAGS: 00010096
May  6 14:13:15 srv kernel: eax: 0000001e   ebx: 00000000   ecx: c5889440   edx: 
00000005
May  6 14:13:15 srv kernel: esi: 0324a000   edi: c4284be0   ebp: c4284bf0   esp: 
c3173e88
May  6 14:13:15 srv kernel: ds: 0018   es: 0018   ss: 0018
May  6 14:13:15 srv kernel: Process dd (pid: 806, stackpage=c3173000)
May  6 14:13:15 srv kernel: Stack: c01f84b3 c01f858f 000001d1 f0080000 00000000 
c46c3320 0324a000 00000000 
May  6 14:13:15 srv kernel:        00000000 c326b000 c019dd08 cbf9f000 f0080000 
0324a000 00000c00 c46c3320 
May  6 14:13:15 srv kernel:        00000000 00000202 00000001 0326b000 c326b000 
00000004 00000000 00000c00 
May  6 14:13:15 srv kernel: Call Trace: [<f0080000>] [td_submit_urb+372/668] 
[<f0080000>] [sohci_submit_urb+1326/1344] [usb_submit_urb+30/48] 
[cy3671_write+580/656] [sys_write+142/196] 
May  6 14:13:15 srv kernel: Code: 0f 0b 83 c4 0c 89 f6 8b 54 24 14 8b 0b 89 0c 2a 8b 
07 89 41 

Trace; f0080000 <END_OF_CODE+22ffceff/????>
Code;  00000000 Before first symbol
00000000 <_EIP>:
Code;  00000000 Before first symbol
   0:   0f 0b                     ud2a   
Code;  00000002 Before first symbol
   2:   83 c4 0c                  add    $0xc,%esp
Code;  00000005 Before first symbol
   5:   89 f6                     mov    %esi,%esi
Code;  00000007 Before first symbol
   7:   8b 54 24 14               mov    0x14(%esp,1),%edx
Code;  0000000b Before first symbol
   b:   8b 0b                     mov    (%ebx),%ecx
Code;  0000000d Before first symbol
   d:   89 0c 2a                  mov    %ecx,(%edx,%ebp,1)
Code;  00000010 Before first symbol
  10:   8b 07                     mov    (%edi),%eax
Code;  00000012 Before first symbol
  12:   89 41 00                  mov    %eax,0x0(%ecx)


2 warnings issued.  Results may not be reliable.

Reply via email to