On Fri, 6 Feb 2004, Stephen Hemminger wrote: > Alan Stern wrote: > > >I should have been more explicit. I meant: Can you provide an explanation > >of what caused your deadlock, more detailed than just "the list management > >was crap"? Can you point to exactly what in the code went wrong, and can > >you provide a minimal patch to fix it, remaining in the context of the > >current crappy lists and locks? > > > > > Sorry, I wasn't really expecting this version to get accepted as is. > Just start some exchange over > what needs to be fixed. I never got a "smoking gun" crash, that pointed > to exactly a race between > two exact lines, but what I saw was urb private structures that were in > the remove list but had > already been freed. The problem with juggling three lists (urb's, > remove, and complete) with > three seperate locks is that an dequeue operation can happen during the > middle of one of > the steps of the uhci_irq processing.
Yeah, the current locking scheme is a mess. But it's supposed to be correctly-functioning mess, and I'm really curious to know where the problem lies. Without that, I would never be quite certain that changing the locking had solved the underlying error. Did you mention that the oops could easily be recreated? And did you ever try out my suggestion of adding printk's to trace all the urbp transitions among the various lists? > How about something like: > - #ifdef out the /proc debug stuff, so other changes can break it > causing grief. I don't have your patch with me here so I can't check the details. Is it possible to remove the part of the /proc stuff that prints the complete_list while leaving the rest as is? Would that be compatible with your changes? In other words, can you make a minimal change to uhci-debug.c so that most of it will continue to work while you make the other changes? > - put big lock in uhci and leave other spurflous locks in. Okay. > - move complete_list out of uhci Again I need to check the details, but that doesn't seem to be necessary. It's (almost) true that the complete_list is only used within the dynamic scope of uhci_irq(), so it doesn't need to be locked. Since it's used in a few subroutines, though, you would have to pass the list header as an explicit argument if it's not in uhci, right? You might as well just leave the list header where it is; it doesn't take up much space. I said "(almost)" above because complete_list is also used within the scope of uhci_stop(). To be totally correct you would have to create a list header in that routine as well. However one of the changes I want to make is to move the calls to uhci_free_pending_qhs() etc. out of uhci_irq() and uhci_stop() into a common list-handling subroutine. Another change I'd like to make is to move most of the interrupt processing into some sort of bottom half. When that's done the complete_list may no longer be needed; URBs could be given back as soon as the driver knew they are ready. > - remove spurflous locks one at a time. That will work. Take care to preserve the correct semantics. Alan Stern ------------------------------------------------------- The SF.Net email is sponsored by EclipseCon 2004 Premiere Conference on Open Tools Development and Integration See the breadth of Eclipse activity. February 3-5 in Anaheim, CA. http://www.eclipsecon.org/osdn _______________________________________________ [EMAIL PROTECTED] To unsubscribe, use the last form field at: https://lists.sourceforge.net/lists/listinfo/linux-usb-devel