seems my tests are pretty good in hitting some critical paths ;-) This one happened during load/consistency test using usb-ohci on UP testbox and 2.4.4-vanilla. (ksymoops attached) May 6 14:13:15 srv kernel: kernel BUG at usb-ohci.h:465! The thing was triggered from fop->read/write in the following tree: usb_submit_urb | sohci_submit_urb | | spin_lock_irqsave(&usb_ed_lock,...) | | + td_submit_urb | | + | td_fill | | + | | inline dma_to_td | | + | | | inline dma_to_ed_td <---- BUG at usb-ohci.h:465 | | + | ... | | spin_unlock_irqrestore(&usb_ed_lock,...) ... by the offending code snippet: /* Recover a TD/ED using its collision chain */ static inline void * dma_to_ed_td (struct hash_list_t * entry, dma_addr_t dma) { struct hash_t * scan = entry->head; while (scan && scan->dma != dma) scan = scan->next; if (!scan) BUG(); <---- usb-ohci.h:465 return scan->virt; } Seems like the hash chains got corrupted somehow. It's not easy to hit this, but reproducible. Setup was a loopback-like test device continously processing up to 16 urb's submitted for in & out on 2 ep. USB loaded at some 1.15MB/sec av. throughput (sustained). BUG() was hit several times when sending the same data, after several 10 Minutes of continous operation - but never at the same location (data offset). Compiling with DEBUG didn't provide any insight so far - in fact the issue didn't appear anymore with DEBUG enabled during more than 1 hour of operation. Probably some serialization/synchronisation issue? Btw, the add/remove operation on the hash collision chains appear to be unprotected - at least on hash_*()/dma_to_*() api level. Are they guaranteed to be always called inside some spinlock (irqsafe) - like it was when the BUG was hitted? Suggestions? Some additional locking around these operations to try? Martin
ksymoops 2.3.4 on i586 2.4.4. Options used -V (default) -k /proc/ksyms (default) -l /proc/modules (default) -o /lib/modules/2.4.4/ (default) -m /usr/src/linux/System.map (default) Warning: You did not tell me where to find symbol information. I will assume that the log matches the kernel and modules that are running right now and I'll use the default options above for symbol resolution. If the current kernel and/or modules do not match the log, you can get more accurate output by telling me the kernel version and where to find map, modules, ksyms etc. ksymoops -h explains the options. Warning (compare_maps): ksyms_base symbol __VERSIONED_SYMBOL(shmem_file_setup) not found in System.map. Ignoring ksyms_base entry May 6 14:13:15 srv kernel: invalid operand: 0000 May 6 14:13:15 srv kernel: CPU: 0 May 6 14:13:15 srv kernel: EIP: 0010:[td_fill+145/300] May 6 14:13:15 srv kernel: EFLAGS: 00010096 May 6 14:13:15 srv kernel: eax: 0000001e ebx: 00000000 ecx: c58893c0 edx: 00000000 May 6 14:13:15 srv kernel: esi: 0439c000 edi: cbfa9500 ebp: cbfa9510 esp: c31a3e88 May 6 14:13:15 srv kernel: ds: 0018 es: 0018 ss: 0018 May 6 14:13:15 srv kernel: Process cmp (pid: 804, stackpage=c31a3000) May 6 14:13:15 srv kernel: Stack: c01f84b3 c01f858f 000001d1 f0140000 00000000 c6809b60 0439c000 00000000 May 6 14:13:15 srv kernel: 00000000 c326b090 c019dd08 cbf9f000 f0140000 0439c000 00000200 c6809b60 May 6 14:13:15 srv kernel: 00000000 00000202 00000001 0326b090 c326b090 00000002 00000000 00000200 May 6 14:13:15 srv kernel: Call Trace: [<f0140000>] [td_submit_urb+372/668] [<f0140000>] [sohci_submit_urb+1326/1344] [usb_submit_urb+30/48] [cy3671_read+289/580] [sys_read+142/196] May 6 14:13:15 srv kernel: Code: 0f 0b 83 c4 0c 89 f6 8b 54 24 14 8b 0b 89 0c 2a 8b 07 89 41 Using defaults from ksymoops -t elf32-i386 -a i386 Trace; f0140000 <END_OF_CODE+230bceff/????> Code; 00000000 Before first symbol 00000000 <_EIP>: Code; 00000000 Before first symbol 0: 0f 0b ud2a Code; 00000002 Before first symbol 2: 83 c4 0c add $0xc,%esp Code; 00000005 Before first symbol 5: 89 f6 mov %esi,%esi Code; 00000007 Before first symbol 7: 8b 54 24 14 mov 0x14(%esp,1),%edx Code; 0000000b Before first symbol b: 8b 0b mov (%ebx),%ecx Code; 0000000d Before first symbol d: 89 0c 2a mov %ecx,(%edx,%ebp,1) Code; 00000010 Before first symbol 10: 8b 07 mov (%edi),%eax Code; 00000012 Before first symbol 12: 89 41 00 mov %eax,0x0(%ecx) May 6 14:13:15 srv kernel: invalid operand: 0000 May 6 14:13:15 srv kernel: CPU: 0 May 6 14:13:15 srv kernel: EIP: 0010:[td_fill+145/300] May 6 14:13:15 srv kernel: EFLAGS: 00010096 May 6 14:13:15 srv kernel: eax: 0000001e ebx: 00000000 ecx: c5889440 edx: 00000005 May 6 14:13:15 srv kernel: esi: 0324a000 edi: c4284be0 ebp: c4284bf0 esp: c3173e88 May 6 14:13:15 srv kernel: ds: 0018 es: 0018 ss: 0018 May 6 14:13:15 srv kernel: Process dd (pid: 806, stackpage=c3173000) May 6 14:13:15 srv kernel: Stack: c01f84b3 c01f858f 000001d1 f0080000 00000000 c46c3320 0324a000 00000000 May 6 14:13:15 srv kernel: 00000000 c326b000 c019dd08 cbf9f000 f0080000 0324a000 00000c00 c46c3320 May 6 14:13:15 srv kernel: 00000000 00000202 00000001 0326b000 c326b000 00000004 00000000 00000c00 May 6 14:13:15 srv kernel: Call Trace: [<f0080000>] [td_submit_urb+372/668] [<f0080000>] [sohci_submit_urb+1326/1344] [usb_submit_urb+30/48] [cy3671_write+580/656] [sys_write+142/196] May 6 14:13:15 srv kernel: Code: 0f 0b 83 c4 0c 89 f6 8b 54 24 14 8b 0b 89 0c 2a 8b 07 89 41 Trace; f0080000 <END_OF_CODE+22ffceff/????> Code; 00000000 Before first symbol 00000000 <_EIP>: Code; 00000000 Before first symbol 0: 0f 0b ud2a Code; 00000002 Before first symbol 2: 83 c4 0c add $0xc,%esp Code; 00000005 Before first symbol 5: 89 f6 mov %esi,%esi Code; 00000007 Before first symbol 7: 8b 54 24 14 mov 0x14(%esp,1),%edx Code; 0000000b Before first symbol b: 8b 0b mov (%ebx),%ecx Code; 0000000d Before first symbol d: 89 0c 2a mov %ecx,(%edx,%ebp,1) Code; 00000010 Before first symbol 10: 8b 07 mov (%edi),%eax Code; 00000012 Before first symbol 12: 89 41 00 mov %eax,0x0(%ecx) 2 warnings issued. Results may not be reliable.