seems my tests are pretty good in hitting some critical paths ;-)Hi Martin,
This one happened during load/consistency test using usb-ohci on
UP testbox and 2.4.4-vanilla. (ksymoops attached)May 6 14:13:15 srv kernel: kernel BUG at usb-ohci.h:465!
The thing was triggered from fop->read/write in the following tree:
usb_submit_urb
| sohci_submit_urb
| | spin_lock_irqsave(&usb_ed_lock,...)
| | + td_submit_urb
| | + | td_fill
| | + | | inline dma_to_td
| | + | | | inline dma_to_ed_td <---- BUG at usb-ohci.h:465
| | + | ...
| | spin_unlock_irqrestore(&usb_ed_lock,...)
...by the offending code snippet:
/* Recover a TD/ED using its collision chain */
static inline void *
dma_to_ed_td (struct hash_list_t * entry, dma_addr_t dma)
{
struct hash_t * scan = entry->head;
while (scan && scan->dma != dma)
scan = scan->next;
if (!scan)
BUG(); <---- usb-ohci.h:465
return scan->virt;
}Seems like the hash chains got corrupted somehow. It's not easy to hit
this, but reproducible. Setup was a loopback-like test device continously
processing up to 16 urb's submitted for in & out on 2 ep. USB loaded at
some 1.15MB/sec av. throughput (sustained). BUG() was hit several times
when sending the same data, after several 10 Minutes of continous
operation - but never at the same location (data offset).Compiling with DEBUG didn't provide any insight so far - in fact the issue
didn't appear anymore with DEBUG enabled during more than 1 hour of
operation. Probably some serialization/synchronisation issue?Btw, the add/remove operation on the hash collision chains appear to be
unprotected - at least on hash_*()/dma_to_*() api level. Are they
guaranteed to be always called inside some spinlock (irqsafe) - like it
was when the BUG was hitted?
You're right. I found a case where td_alloc() is called
by sohci_submit_urb(),
which is unprotected. There may be others too. I've attached a simple
patch
against 2.4.4-vanilla that adds locks around hash_add_ed_td() and
hash_free_ed_td(). I hope this fixes the chaining corruption.
Test again on
your testbox and let me know.
Steve
-- Steve Longerbeam MontaVista Software, Inc. office:408-328-9008, fax:408-328-9204 http://www.mvista.com
--- drivers/usb/usb-ohci.h.orig Mon May 7 10:12:42 2001
+++ drivers/usb/usb-ohci.h Mon May 7 10:23:24 2001
@@ -480,15 +480,22 @@
td_dma);
}
+static spinlock_t usb_hash_lock = SPIN_LOCK_UNLOCKED;
+
/* Add a hash entry for a TD/ED; return true on success */
static inline int
hash_add_ed_td(struct hash_list_t * entry, void * virt, dma_addr_t dma)
{
struct hash_t * scan;
+ unsigned long flags;
+
+ spin_lock_irqsave (&usb_hash_lock, flags);
scan = (struct hash_t *)kmalloc(sizeof(struct hash_t), ALLOC_FLAGS);
- if (!scan)
+ if (!scan) {
+ spin_unlock_irqrestore (&usb_hash_lock, flags);
return 0;
+ }
if (!entry->tail) {
entry->head = entry->tail = scan;
@@ -500,6 +507,7 @@
scan->virt = virt;
scan->dma = dma;
scan->next = NULL;
+ spin_unlock_irqrestore (&usb_hash_lock, flags);
return 1;
}
@@ -521,7 +529,11 @@
static inline void
hash_free_ed_td (struct hash_list_t * entry, void * virt)
{
+ unsigned long flags;
struct hash_t *scan, *prev;
+
+ spin_lock_irqsave (&usb_hash_lock, flags);
+
scan = prev = entry->head;
// Find and unlink hash entry
@@ -542,6 +554,7 @@
prev->next = scan->next;
kfree(scan);
}
+ spin_unlock_irqrestore (&usb_hash_lock, flags);
}
static inline void
