Martin Diehl wrote:
seems my tests are pretty good in hitting some critical paths ;-)
This one happened during load/consistency test using usb-ohci on
UP testbox and 2.4.4-vanilla. (ksymoops attached)

May  6 14:13:15 srv kernel: kernel BUG at usb-ohci.h:465!

The thing was triggered from fop->read/write in the following tree:

usb_submit_urb
| sohci_submit_urb
| | spin_lock_irqsave(&usb_ed_lock,...)
| | + td_submit_urb
| | + | td_fill
| | + | | inline dma_to_td
| | + | | | inline dma_to_ed_td     <---- BUG at usb-ohci.h:465
| | + |          ...
| | spin_unlock_irqrestore(&usb_ed_lock,...)
...

by the offending code snippet:

/* Recover a TD/ED using its collision chain */
static inline void *
dma_to_ed_td (struct hash_list_t * entry, dma_addr_t dma)
{
        struct hash_t * scan = entry->head;
        while (scan && scan->dma != dma)
                scan = scan->next;
        if (!scan)
                BUG();          <---- usb-ohci.h:465
        return scan->virt;
}

Seems like the hash chains got corrupted somehow. It's not easy to hit
this, but reproducible. Setup was a loopback-like test device continously
processing up to 16 urb's submitted for in & out on 2 ep. USB loaded at
some 1.15MB/sec av. throughput (sustained). BUG() was hit several times
when sending the same data, after several 10 Minutes of continous
operation - but never at the same location (data offset).

Compiling with DEBUG didn't provide any insight so far - in fact the issue
didn't appear anymore with DEBUG enabled during more than 1 hour of
operation. Probably some serialization/synchronisation issue?

Btw, the add/remove operation on the hash collision chains appear to be
unprotected - at least on hash_*()/dma_to_*() api level. Are they
guaranteed to be always called inside some spinlock (irqsafe) - like it
was when the BUG was hitted?

Hi Martin,

You're right. I found a  case where td_alloc() is called by sohci_submit_urb(),
which is unprotected. There may be others too. I've attached a simple patch
against 2.4.4-vanilla that adds locks around hash_add_ed_td() and
hash_free_ed_td(). I hope this fixes the chaining corruption. Test again on
your testbox and let me know.

Steve

-- 
Steve Longerbeam
MontaVista Software, Inc.
office:408-328-9008, fax:408-328-9204
http://www.mvista.com
 
--- drivers/usb/usb-ohci.h.orig Mon May  7 10:12:42 2001
+++ drivers/usb/usb-ohci.h      Mon May  7 10:23:24 2001
@@ -480,15 +480,22 @@
                                      td_dma);
 }
 
+static spinlock_t usb_hash_lock = SPIN_LOCK_UNLOCKED;
+
 /* Add a hash entry for a TD/ED; return true on success */
 static inline int
 hash_add_ed_td(struct hash_list_t * entry, void * virt, dma_addr_t dma)
 {
        struct hash_t * scan;
+        unsigned long flags;
+
+        spin_lock_irqsave (&usb_hash_lock, flags);
        
        scan = (struct hash_t *)kmalloc(sizeof(struct hash_t), ALLOC_FLAGS);
-       if (!scan)
+       if (!scan) {
+                spin_unlock_irqrestore (&usb_hash_lock, flags);
                return 0;
+       }
        
        if (!entry->tail) {
                entry->head = entry->tail = scan;
@@ -500,6 +507,7 @@
        scan->virt = virt;
        scan->dma = dma;
        scan->next = NULL;
+        spin_unlock_irqrestore (&usb_hash_lock, flags);
        return 1;
 }
 
@@ -521,7 +529,11 @@
 static inline void
 hash_free_ed_td (struct hash_list_t * entry, void * virt)
 {
+        unsigned long flags;
        struct hash_t *scan, *prev;
+
+        spin_lock_irqsave (&usb_hash_lock, flags);
+
        scan = prev = entry->head;
 
        // Find and unlink hash entry
@@ -542,6 +554,7 @@
                        prev->next = scan->next;
                kfree(scan);
        }
+        spin_unlock_irqrestore (&usb_hash_lock, flags);
 }
 
 static inline void


Reply via email to