Hi all,
In last week's discussion I said I was working on using the lport mutex
instead of the disc mutex to serialize discovery, and having the rport
list be under the lport mutex, and maintained by the rport module.
I'm backing away from those aspects of my changes. Partly because of
objections, and partly because I had trouble making it work.
It turns out discovery needs its own locks in order to cancel its work
queue items while the lport lock is held, e.g., from lport_enter_reset().
Having the rports list under the discovery lock is convenient as well.
Now for a new snag. Rogue rports. These raise a problem when
we're converting from a rogue to a real rport and an ELS such as LOGO
arrives. Since we need to drop the rport lock while doing fc_remote_port_add(),
the LOGO finds the rogue rport, and re-queues work to delete it. We end
up deleting it twice, once in the CREATE event, and once in the LOGO event.
This usually didn't happen before because logout would flush any outstanding
work, but that required logo to drop the lock, during which the rport state
could change, which seems totally unacceptable to me. We shouldn't drop
the lock until we're done with the rport in any of the ELS handlers.
Back to the work handler problem: I suppose this could be fixed up
after doing the fc_remote_port_add() by re-evaluating the rogue's state,
but I'm not sure that would completely work. It might also be more
easily fixed if we could do the fc_remote_port_add() while holding the
rport mutex, but that's not currently allowed. It might work anyway
if the add were done with roles set to UNKNOWN and then changed later
after dropping the mutex.
Things would be a lot simpler if we went back to having the
rdata structure separately allocated from the fc_rport, and with a life of
its own, and perhaps its own kref to keep it around while in use.
That would get rid of the need for rogues, and we would just have a
pointer to the transport fc_rport, and it would have a pointer back to our
rdata.
This also solves another problem. In rport_lookup(), we hold the disc_mutex
to protect the list, and do a get_device() on the rport to keep it from going
away, then drop the disc_mutex. However, the rport lock isn't held, so the
rport could be about to be deleted. I have rport_lookup() checking the state,
but without grabbing the lock. The caller of lookup (e.g., ELS receive
handling)
must grab the rport mutex, check the state, and if it's NONE (I'm renaming that
DELETED), then drop the mutex and repeat the lookup. That will be
unnecessary if we have our own rdata structure separately allocated.
It would change many of our interfaces to use our struct instead of fc_rport.
Does this sound OK?
One suggestion has been to make scsi_transport_fc's rport allow these other
states as well, allowing them to be created earlier and change identity
(WWPN, WWNN) after being created. Any changes along these lines would seem
to involve the other drivers that use them, however, and I'm not prepared to
test those. It seems like way too much of a change. Maybe that's a longer
term project someone else would be willing to do.
Along with this, I'd like to rename struct fc_rport_libfc_priv to something
else.
Any suggestions? How about fc_peer? or fc_pport? I want something short like
that,
not specifically a target or initiator, not necessarily an N port or
a discovered port, maybe just a port?
I welcome your questions and suggestions.
Thanks,
Joe
_______________________________________________
devel mailing list
[email protected]
http://www.open-fcoe.org/mailman/listinfo/devel