On Thu, Jul 09, 2009 at 03:47:57PM -0700, Joe Eykholt wrote:
> Chris Leech wrote:
> > I ran a parallel create/destroy/remove test overnight and something
> > deadlocked. Running with lockdep enabled gives a reproducible warning,
> > but I'm having trouble making sense of it. I'm not sure I understand
> > what the "events" lock is here.
>
> I'm not sure why it says "events" either. I think it has something
> to do with flush_work() calling lock_map_acquire/release to indicate
> that the work items it will wait on may need locks.
>
> I see one problem, though. fcoe_ctlr_destroy() is doing a
> flush_work and it uses the general work thread. So does
> linkwatch_event(), which needs rtnl_lock(). So the flush_work()
> may hang forever if there's a linkwatch_event queued. Shoot.

Thanks, that must be it. I knew it had something to do with a destroy
and a linkwatch event firing at the same time, I just couldn't put
together the deadlock scenario.

> Ways to fix it:
> 1) have FIP use its own work queue.
> 2) separately flush the FIP work queue while not holding rtnl_lock.
> 3) go back to using a separate mutex for fcoe create/delete, but
>    use rtnl_lock for the hostlist to protect the notification.
> 4) something better?
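
For anyone following along, here is a rough userspace model of the scenario
described above and of fix 1. It is Python, not kernel code, and every name
in it is hypothetical: a single-worker thread pool stands in for the general
work thread, a plain lock stands in for rtnl_lock(), and a timed wait stands
in for flush_work() so the demo terminates where the real flush would hang.
It is only a sketch of the ordering problem, assuming the linkwatch item is
queued ahead of the FIP work.

```python
import threading
from concurrent.futures import ThreadPoolExecutor, wait

# Stand-in for rtnl_lock(); not the real kernel primitive.
rtnl = threading.Lock()


def linkwatch_event():
    """Model of a work item that needs rtnl before it can finish."""
    with rtnl:
        pass


def fip_work():
    """Model of the FIP work item that the destroy path flushes."""
    pass


def flush_under_rtnl(dedicated, flush_timeout=0.5):
    """Hold rtnl, then try to flush the FIP work.

    dedicated=False: FIP work shares one worker with linkwatch_event.
    dedicated=True:  FIP work has its own queue (fix 1 in the list above).
    Returns True if the flush finished within flush_timeout seconds.
    A real flush_work() has no timeout and would simply hang.
    """
    shared = ThreadPoolExecutor(max_workers=1)
    fip_queue = ThreadPoolExecutor(max_workers=1) if dedicated else shared
    try:
        with rtnl:
            # linkwatch is queued first and blocks its worker on rtnl.
            shared.submit(linkwatch_event)
            fip = fip_queue.submit(fip_work)
            done, _ = wait([fip], timeout=flush_timeout)
            return fip in done
    finally:
        # rtnl has been released by now, so the queued items can drain.
        shared.shutdown(wait=True)
        if dedicated:
            fip_queue.shutdown(wait=True)


if __name__ == "__main__":
    print("shared queue, flush completed:", flush_under_rtnl(dedicated=False))
    print("own queue, flush completed:   ", flush_under_rtnl(dedicated=True))
```

With the shared worker the flush cannot complete while rtnl is held, because
the FIP item sits behind a linkwatch item that is waiting for the same lock.
With its own queue the FIP item runs regardless of what linkwatch is doing.
Fix 2 would correspond to moving the wait outside the "with rtnl" block.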
