James Smart wrote:
>
>
> Joe Eykholt wrote:
>> This seems pretty basic, but I'm having trouble with it.
>>
>> The current libfc code has a private structure called fc_rport_libfc_priv
>> that gets allocated along with the fc_rport structure.
>>
>> I'm working on patches that restructure this to be separately allocated,
>> since we need a the private structure to be around before we can do
>> fc_remote_port_add(). It's a long story, but my question applies to
>> any FC driver that wants to allocate its rport private data separately.
>>
> Most of the other fc drivers are in this position already, and always
> have been. They've been the reverse - they desire a position where they
> could move to rport data, but you seem to always need a context before
> you know what it is enough to talk to the transport.
>> How should rport->dd_data be set to NULL before/after calling
>> fc_rport_delete().
>>
> You never do set it to NULL. This is the role of the transport, and it
> will do so after calling the devloss_tmo_callbk(), which is the
> "end-of-life" indicator for the rport.
That sounds doable. One thing I'm wondering about is what if I
re-discover the same remote port before devloss timeout.
I'll create new context, but then when I do fc_remote_port_add()
I'll find that the fc_rport already has a dd_data. Now I have
two contexts for the same rport. I guess I could switch over to
the old one, but it's a bit awkward.
One aspect is that the newly discovered rport may have the same port_id
but different port_name and/or node_name. Since libfc shouldn't be
concerned about what the binding method is, it may or may not be the
same fc_rport as before. So fc_remote_port_add should not be called
until all three items are known, it seems to me.
It would be nice if there was a way for scsi_transport_fc to allow
creating the fc_rport with only port_id known and add functions to
change port_name and/or node_name later. However, that just passes
the problem up to the transport. It would be possible but unlikely
for a new remote port to have the same port_name as an old one, and
if that's the binding method, it's messy to handle.
BTW, why is it possible to bind by node_name? It seems useless to me.
If we made node_name just an attribute and not something that could
be used for binding, it makes the problem slightly easier. At least
we could then create fc_rports before knowing the node name.
> If you are changing it - it can cause problems as the transport still
> has the rport active, may make other calls, etc until
> devloss_tmo_callbk() would occur.
Right. That's the issue I see.
>> I used to set it to NULL before, but figure it's safer to clear it after
>> terminate_io has been called. I did a get_device(&rport->dev) first
>> to be sure the rport doesn't get freed in the meantime.
>>
> terminate_io is one of those calls the transport tries to make before
> the rport truly dies.
>
>> However, other threads may be in queuecommand already and already
>> beyond their call to fc_remote_port_chkready() and will reference
>> dd_data.
>> Maybe they just aren't allowed do that?
>>
>> Or maybe dd_data isn't allowed to go NULL until the rport times
>> out and is getting deleted?
>>
> Yes. This last point.
>
>> Maybe we need a small private rport data with just enough info for the
>> I/O
>> that's already in progress, or I can recode the I/O path to not use
>> dd_data (it's mostly stuff that can be gotten through scsi_host instead).
>>
> have it be whatever you need it to be. You can always maintain your own
> state flag in your own structure.
>
>> Or maybe we need to do more in our fc_rport_terminate_io callback
>> from fc_remote_port_delete() to be sure that all I/O really ceases.
>> We currently do an exch_mgr_reset(), but perhaps some I/O thread hasn't
>> allocated an exchange yet.
>>
> Sounds like race conditions within all the library routines. The
> transport has some simple expectations, and really doesn't care how many
> internal states the LLDD has. fc_rport_terminated_io() should terminate
> (or at least initiate termination) of anything within the driver (in a
> path post queuecommand()). Fc_remote_port_delete() is really just a
> notifier to the transport that connection to the rport has gone away and
> to start the devloss timers and block/reject new io requests - so it's
> too early in the process. I/O may or may not be killed as of
> rport_delete, as that's somewhat a LLDD policy (depends on FCP-2
> Recovery desires and how discovery engines work).
I understand.
> The devloss_tmo_callbk(), which is the "end-of-life" indicator for the
> rport, should never return from the LLDD until anything referencing the
> rport (a if you're using it's dd_data, that too), has been killed - as
> the transport may immediately "free" the rport after it's return. Do
> you have a devloss_tmo_callbk() ?
No. I can see that it would be useful.
Thanks very much for the help!
Joe
_______________________________________________
devel mailing list
[email protected]
http://www.open-fcoe.org/mailman/listinfo/devel