seems my tests are pretty good in hitting some critical paths ;-)
This one happened during load/consistency test using usb-ohci on
UP testbox and 2.4.4-vanilla. (ksymoops attached)
May 6 14:13:15 srv kernel: kernel BUG at usb-ohci.h:465!
The thing was triggered from fop->read/write in the following tree:
usb_submit_urb
| sohci_submit_urb
| | spin_lock_irqsave(&usb_ed_lock,...)
| | + td_submit_urb
| | + | td_fill
| | + | | inline dma_to_td
| | + | | | inline dma_to_ed_td <---- BUG at usb-ohci.h:465
| | + | ...
| | spin_unlock_irqrestore(&usb_ed_lock,...)
...
by the offending code snippet:
/* Recover a TD/ED using its collision chain */
static inline void *
dma_to_ed_td (struct hash_list_t * entry, dma_addr_t dma)
{
struct hash_t * scan = entry->head;
while (scan && scan->dma != dma)
scan = scan->next;
if (!scan)
BUG(); <---- usb-ohci.h:465
return scan->virt;
}
Seems like the hash chains got corrupted somehow. It's not easy to hit
this, but reproducible. Setup was a loopback-like test device continously
processing up to 16 urb's submitted for in & out on 2 ep. USB loaded at
some 1.15MB/sec av. throughput (sustained). BUG() was hit several times
when sending the same data, after several 10 Minutes of continous
operation - but never at the same location (data offset).
Compiling with DEBUG didn't provide any insight so far - in fact the issue
didn't appear anymore with DEBUG enabled during more than 1 hour of
operation. Probably some serialization/synchronisation issue?
Btw, the add/remove operation on the hash collision chains appear to be
unprotected - at least on hash_*()/dma_to_*() api level. Are they
guaranteed to be always called inside some spinlock (irqsafe) - like it
was when the BUG was hitted?
Suggestions? Some additional locking around these operations to try?
Martin
ksymoops 2.3.4 on i586 2.4.4. Options used
-V (default)
-k /proc/ksyms (default)
-l /proc/modules (default)
-o /lib/modules/2.4.4/ (default)
-m /usr/src/linux/System.map (default)
Warning: You did not tell me where to find symbol information. I will
assume that the log matches the kernel and modules that are running
right now and I'll use the default options above for symbol resolution.
If the current kernel and/or modules do not match the log, you can get
more accurate output by telling me the kernel version and where to find
map, modules, ksyms etc. ksymoops -h explains the options.
Warning (compare_maps): ksyms_base symbol __VERSIONED_SYMBOL(shmem_file_setup) not
found in System.map. Ignoring ksyms_base entry
May 6 14:13:15 srv kernel: invalid operand: 0000
May 6 14:13:15 srv kernel: CPU: 0
May 6 14:13:15 srv kernel: EIP: 0010:[td_fill+145/300]
May 6 14:13:15 srv kernel: EFLAGS: 00010096
May 6 14:13:15 srv kernel: eax: 0000001e ebx: 00000000 ecx: c58893c0 edx:
00000000
May 6 14:13:15 srv kernel: esi: 0439c000 edi: cbfa9500 ebp: cbfa9510 esp:
c31a3e88
May 6 14:13:15 srv kernel: ds: 0018 es: 0018 ss: 0018
May 6 14:13:15 srv kernel: Process cmp (pid: 804, stackpage=c31a3000)
May 6 14:13:15 srv kernel: Stack: c01f84b3 c01f858f 000001d1 f0140000 00000000
c6809b60 0439c000 00000000
May 6 14:13:15 srv kernel: 00000000 c326b090 c019dd08 cbf9f000 f0140000
0439c000 00000200 c6809b60
May 6 14:13:15 srv kernel: 00000000 00000202 00000001 0326b090 c326b090
00000002 00000000 00000200
May 6 14:13:15 srv kernel: Call Trace: [<f0140000>] [td_submit_urb+372/668]
[<f0140000>] [sohci_submit_urb+1326/1344] [usb_submit_urb+30/48] [cy3671_read+289/580]
[sys_read+142/196]
May 6 14:13:15 srv kernel: Code: 0f 0b 83 c4 0c 89 f6 8b 54 24 14 8b 0b 89 0c 2a 8b
07 89 41
Using defaults from ksymoops -t elf32-i386 -a i386
Trace; f0140000 <END_OF_CODE+230bceff/????>
Code; 00000000 Before first symbol
00000000 <_EIP>:
Code; 00000000 Before first symbol
0: 0f 0b ud2a
Code; 00000002 Before first symbol
2: 83 c4 0c add $0xc,%esp
Code; 00000005 Before first symbol
5: 89 f6 mov %esi,%esi
Code; 00000007 Before first symbol
7: 8b 54 24 14 mov 0x14(%esp,1),%edx
Code; 0000000b Before first symbol
b: 8b 0b mov (%ebx),%ecx
Code; 0000000d Before first symbol
d: 89 0c 2a mov %ecx,(%edx,%ebp,1)
Code; 00000010 Before first symbol
10: 8b 07 mov (%edi),%eax
Code; 00000012 Before first symbol
12: 89 41 00 mov %eax,0x0(%ecx)
May 6 14:13:15 srv kernel: invalid operand: 0000
May 6 14:13:15 srv kernel: CPU: 0
May 6 14:13:15 srv kernel: EIP: 0010:[td_fill+145/300]
May 6 14:13:15 srv kernel: EFLAGS: 00010096
May 6 14:13:15 srv kernel: eax: 0000001e ebx: 00000000 ecx: c5889440 edx:
00000005
May 6 14:13:15 srv kernel: esi: 0324a000 edi: c4284be0 ebp: c4284bf0 esp:
c3173e88
May 6 14:13:15 srv kernel: ds: 0018 es: 0018 ss: 0018
May 6 14:13:15 srv kernel: Process dd (pid: 806, stackpage=c3173000)
May 6 14:13:15 srv kernel: Stack: c01f84b3 c01f858f 000001d1 f0080000 00000000
c46c3320 0324a000 00000000
May 6 14:13:15 srv kernel: 00000000 c326b000 c019dd08 cbf9f000 f0080000
0324a000 00000c00 c46c3320
May 6 14:13:15 srv kernel: 00000000 00000202 00000001 0326b000 c326b000
00000004 00000000 00000c00
May 6 14:13:15 srv kernel: Call Trace: [<f0080000>] [td_submit_urb+372/668]
[<f0080000>] [sohci_submit_urb+1326/1344] [usb_submit_urb+30/48]
[cy3671_write+580/656] [sys_write+142/196]
May 6 14:13:15 srv kernel: Code: 0f 0b 83 c4 0c 89 f6 8b 54 24 14 8b 0b 89 0c 2a 8b
07 89 41
Trace; f0080000 <END_OF_CODE+22ffceff/????>
Code; 00000000 Before first symbol
00000000 <_EIP>:
Code; 00000000 Before first symbol
0: 0f 0b ud2a
Code; 00000002 Before first symbol
2: 83 c4 0c add $0xc,%esp
Code; 00000005 Before first symbol
5: 89 f6 mov %esi,%esi
Code; 00000007 Before first symbol
7: 8b 54 24 14 mov 0x14(%esp,1),%edx
Code; 0000000b Before first symbol
b: 8b 0b mov (%ebx),%ecx
Code; 0000000d Before first symbol
d: 89 0c 2a mov %ecx,(%edx,%ebp,1)
Code; 00000010 Before first symbol
10: 8b 07 mov (%edi),%eax
Code; 00000012 Before first symbol
12: 89 41 00 mov %eax,0x0(%ecx)
2 warnings issued. Results may not be reliable.