Robert Love wrote: > On Wed, 2008-09-10 at 10:03 -0500, Mike Christie wrote: >> Robert Love wrote: >>> On Tue, 2008-09-09 at 10:22 -0700, Robert Love wrote: >>>> The following series implements a change to the rport state machine and >>>> applies a locking strategy to that state machine. >>>> >>>> 1) A callback function is added to the LP for RP events (READY, FAILED, >>>> LOGO) >>>> 2) GNN_ID and GPN_ID are removed, rport state machine starts at PLOGI >>>> - uses a dummy rport until PLOGI response >>>> 3) A READY state is added to the RP state machine >>>> 4) The RP state machine locking has been given a locking strategy >>>> 5) RP only has one error/retry handling routine >>>> 6) Naming conventions are changed >>>> rport (remote port) = fc_rport pointer >>>> rdata (remote port private) = fc_rport_libfc_priv pointer >>>> >>>> I think these patches still need a little cleaning up, but they're mostly >>>> good. >>>> I'm curious what everyone thinks about the folowing changes to: >>>> >>>> a) the locking policy >>>> b) the event callback mechanism >>>> c) the consolidation of fc_rport_error and fc_rport_retry. >>>> >>>> All comments are welcome. If there are no major issues I'll do a little >>>> cleanup, >>>> drop the RFC, resubmit and then commit. >>> I have three problems with this patch-set right now. >>> >>> 1. Need to delete dummy rports when there is a timeout scenario since >>> they're not tracked by the FC transport class. >>> >>> 2. Should delay the retry a bit if there is a resource allocation error. >>> >>> 3. Can't hold locks while calling some functions. >>> >>> It's this last one that's going to be difficult. My problem is that I >>> want to call fc_remote_port_add/delete while I'm holding the rport lock. >>> Those functions expect to be called without a lock held. Also, I haven't >>> finished the lport locking, but I expect that the event_callback >>> function will want to lock the lport so that it can change the lport's >>> state. >>> >>> In all of these cases I want to call a function while I'm holding the >>> rport lock and those functions expect to be called without the lock >>> held. Right now my patches drop the lock momentarily before calling one >>> of these functions and then re-grab the lock when the function returns. >>> This is not ideal. If I keep it this way I'd need to re-check the state >>> once I re-grab the lock and it'll get ugly. >>> >> For the fc class rport issues the problem is that those functions can >> use GFP_KERNEL or flush a work queue? Do we have to use spin locks for > > I believe so. The fc_remote_port_add() function uses GFP_KERNEL. > >> the rport and lrport locking? Why not use a mutex or semaphore, since we >> have process context and it is not performance critical? > > That's a good idea. I changed my patches to use a semaphore and it's > working just fine. However, I don't understand the contexts that we > might be called in from SCSI. Does SCSI always call us from process > context? For recv and sysfs we should always be in process context, but > SCSI I'm not sure... >
Did you change the rport and lport or just the rport locking? For normal IO paths we can be called from a softirq or thread. This path never takes the rport or lport lock though. The scsi eh paths (abort, lun reset, host reset, etc) will always be called from a thread. host reset will call lport_reset. I think in the code with your patches that code might be missing a lock. the tt->lport_reset is set to fc_lport_enter_reset and I thought you want to hold the lport lock. If you are changing the rport and lrport locking then don't forget about checking out the lrport code. I think you want to change that to use a workqueue like how the rport code is, because fc_rport_timeout is run from a work queue can be use a semaphore, but fc_lport_timeout is run from a timer (softirq/bottom-half) context so that cannot sleep. I think actually my comment about having process context is wrong for the fc_exch.c code paths too. fc_exch_timeout would need to be converted to use the workqueue like the rport code too. fc_exch_timeout is run from a timer and can call into lport/rport/ns code that takes a lock. _______________________________________________ devel mailing list [email protected] http://www.open-fcoe.org/mailman/listinfo/devel
