On Wed, Feb 08, 2012 at 05:42:12PM -0800, Vasu Dev wrote: > On Wed, 2012-02-08 at 19:56 -0500, Neil Horman wrote: > > On Wed, Feb 08, 2012 at 12:35:07PM -0800, Vasu Dev wrote: >><snip>
> > 3) We can make the rx queue flush work properly by guaranteeing that > > fcoe_recv_frame will always be called in the context of a recevier thread. > > Thats a workable alternative, but if, as you've said, you don't want to > > introduce additional locking, its a lousy idea, having to call > > spin_lock_bh/spin_unlock_bh on each received frame is an enormous overhead > > above > > and beyond taking an rcu_read_lock on a list traversal. > > I agree with all. > I concur. I hate having to constantly lock in the fast path, but we can look at ways to remove the rx list lock (or mitigate it) another day. Given that, the more I look at at fcoe_recv_frame, the more problems I find with it getting called in softirq context, I think this is the best way to cover all the problems. I'll have a patch for you later today. > > > > > > I should also point out there is precident for what I'm doing here, the > > lport->vports list is accessed both the add/delete execution paths, and in > > the > > softirq path, and is protected by a mutex (which is a bug in and of iteself, > > given that calling fc_vport_id_lookup in softirq context will give you a big > > might_sleep warning, or a sleeping in interrupt panic if the mutex is > > already > > held. The point here being however, you have other lists that are already > > protected for what appear to be the exact same reasons. Although this does > > speak to the fact that, regardless of this conversation, we should not > > currently > > ever try to call fcoe_recv_frame in softirq context. I'll take care of that > > tomorrow. That will have the pleasant side effect of taking care of this > > bug as > > well, since it will make the rx thread purge work properly in > > fcoe_if_destroy. > > Looks okay to me and the flows I was calling out in 2) should get > covered with that and that too w/o having lock here, it will always take > hit of context switch but I guess much better than having lock. > > > > > > > I had a bunch more comments on the rest of your thread, but seeing that the > > fcoe_recv_frame can cause lots more problems, I think this is all moot. :) > > > > We might need the patch I sent out to not free master lport until done > with fcoe interface cleanup. > > > Anywho, I hope the above clarifies what the problem here is. Basically, > > what > > you think is preventing mutual access of the ema_list, isn't. But the fact > > that > > fcoe_recv_frame isn't a path that can handle softirq context is the bigger > > problem here, and fixing that will fix the ema_list at the same time. I'll > > have > > a new patch in the morning. > > Great, thanks. I appreciate your detailed mail and I guess we converged > on issues requiring fix before ema_list. > Thank you for putting up with me on this issue. I'm glad we could find a way to fix this (and several other potential problems at the same time :)). I'll have a fix posted later today. Regards Neil _______________________________________________ devel mailing list devel@open-fcoe.org https://lists.open-fcoe.org/mailman/listinfo/devel