seems my tests are pretty good in hitting some critical paths ;-)Hi Martin,
This one happened during load/consistency test using usb-ohci on
UP testbox and 2.4.4-vanilla. (ksymoops attached)May 6 14:13:15 srv kernel: kernel BUG at usb-ohci.h:465!
The thing was triggered from fop->read/write in the following tree:
usb_submit_urb
| sohci_submit_urb
| | spin_lock_irqsave(&usb_ed_lock,...)
| | + td_submit_urb
| | + | td_fill
| | + | | inline dma_to_td
| | + | | | inline dma_to_ed_td <---- BUG at usb-ohci.h:465
| | + | ...
| | spin_unlock_irqrestore(&usb_ed_lock,...)
...by the offending code snippet:
/* Recover a TD/ED using its collision chain */
static inline void *
dma_to_ed_td (struct hash_list_t * entry, dma_addr_t dma)
{
struct hash_t * scan = entry->head;
while (scan && scan->dma != dma)
scan = scan->next;
if (!scan)
BUG(); <---- usb-ohci.h:465
return scan->virt;
}Seems like the hash chains got corrupted somehow. It's not easy to hit
this, but reproducible. Setup was a loopback-like test device continously
processing up to 16 urb's submitted for in & out on 2 ep. USB loaded at
some 1.15MB/sec av. throughput (sustained). BUG() was hit several times
when sending the same data, after several 10 Minutes of continous
operation - but never at the same location (data offset).Compiling with DEBUG didn't provide any insight so far - in fact the issue
didn't appear anymore with DEBUG enabled during more than 1 hour of
operation. Probably some serialization/synchronisation issue?Btw, the add/remove operation on the hash collision chains appear to be
unprotected - at least on hash_*()/dma_to_*() api level. Are they
guaranteed to be always called inside some spinlock (irqsafe) - like it
was when the BUG was hitted?
You're right. I found a case where td_alloc() is called
by sohci_submit_urb(),
which is unprotected. There may be others too. I've attached a simple
patch
against 2.4.4-vanilla that adds locks around hash_add_ed_td() and
hash_free_ed_td(). I hope this fixes the chaining corruption.
Test again on
your testbox and let me know.
Steve
-- Steve Longerbeam MontaVista Software, Inc. office:408-328-9008, fax:408-328-9204 http://www.mvista.com
--- drivers/usb/usb-ohci.h.orig Mon May 7 10:12:42 2001 +++ drivers/usb/usb-ohci.h Mon May 7 10:23:24 2001 @@ -480,15 +480,22 @@ td_dma); } +static spinlock_t usb_hash_lock = SPIN_LOCK_UNLOCKED; + /* Add a hash entry for a TD/ED; return true on success */ static inline int hash_add_ed_td(struct hash_list_t * entry, void * virt, dma_addr_t dma) { struct hash_t * scan; + unsigned long flags; + + spin_lock_irqsave (&usb_hash_lock, flags); scan = (struct hash_t *)kmalloc(sizeof(struct hash_t), ALLOC_FLAGS); - if (!scan) + if (!scan) { + spin_unlock_irqrestore (&usb_hash_lock, flags); return 0; + } if (!entry->tail) { entry->head = entry->tail = scan; @@ -500,6 +507,7 @@ scan->virt = virt; scan->dma = dma; scan->next = NULL; + spin_unlock_irqrestore (&usb_hash_lock, flags); return 1; } @@ -521,7 +529,11 @@ static inline void hash_free_ed_td (struct hash_list_t * entry, void * virt) { + unsigned long flags; struct hash_t *scan, *prev; + + spin_lock_irqsave (&usb_hash_lock, flags); + scan = prev = entry->head; // Find and unlink hash entry @@ -542,6 +554,7 @@ prev->next = scan->next; kfree(scan); } + spin_unlock_irqrestore (&usb_hash_lock, flags); } static inline void