I was starting to have this discussion with Mike, but he suggested
moving it to the mailing list, so here it is. I'm going to mail
my patch set following this mail too. I'm keeping the last few
patches that do the lport locking to myself because they're not
working right now. The rest of the patches may not be complete,
but they're functional. Let me know if you have any opinions. All
of the following patches are just an RFC, they will not be applied
yet.

...
Before getting to the locking I think that there are two issues
that need to be resolved.

1) The lport state machine needs to be isolated to fc_lport.c
2) Where should name server registration go, fc_lport.c or fc_ns.c?

1.
Currently there are two problems in this area. I don't like that
fc_ns.c and fc_rport.c both change the state of the lport. fc_ns.c
moves the lport through ST_REG_FT, ST_REG_SCR and ST_READY as well
as ST_DNS_STOP. When an rport becomes READY. fc_rport.c always checks
to see if that port is the lp->dns_rp and upon that determination
it will take some action, possibly calling back into the lport.

It bothers me that both the NS and RP code are making decisions for
the LP. Locally I have two changes to solve these problems.

a) I created an event_callback() function in the tt
(libfc_transport_template) that the rport code calls upon READY or
error for an rport going through its state machine. This way the
rport state machine doesn't know anything about special ports, it's
completely independent. It also moves the decision making about the
lport's state machine back to the LP.

b) I moved the NS registration functions back to fc_lport.c. This way
the lport is the only file/block that deals with changing the lport's
state. I'll cover this more below.

2.
I think that discovery/ns either needs its own state machine or the
LP states that it manages (REG_PN, REG_FT, SCR) need to moved back
to the fc_lport.c. This is an interesting proposition to move the NS
registration back to the lport because it really converts fc_ns.c
back to being fc_disc.c. With fc_disc.c only handling discovery.

When I knocked that idea around here it seemed to be well received.
However, another idea emerged where we would have a state machine
for name server registration such that the lport would just enter READY
after logging into the fabric and then start the NS state machine. I
guess this could tie into the event_callback() function as the NS
block could call the callback when discovery was complete, or maybe the
LP doesn't care really, if it's already READY then it probably doesn't
need a callback.

...

So, now on to the lport locking :). What I'm focused on with the lport
locking is the fear that a user will call issue_lip and reset us.
The reset code needs some work and Chris is looking into it, but
it should be deleting all of the rports, clearing out any existing
exchanges and 0-ing out any negotiated values. My main concern here is
the FID. I'm worried about a scenario where we're processing the
PLOGI response and we're going to send a PRLI request next. If the user
issues a reset and the lock is not held across the PLOGI response ->
PRLI request then the FID could be set to 0 right before sending the
PRLI request, so we'd spew a frame with FID 0, not good. This scenario
is not limited to PLOGI/PRLI it can happen with any response/request.

I think that the FCP code is fine because the FC transport class
grabs the Scsi_Host lock when deleting rports and therefore we shouldn't
get any new I/O.

The LP code itself isn't too bad either. The plan is to lock upon
the request of action. I know that sounds goofy, what I mean is we
need to lock if the user tells us to login/logout/reset or when we
get a response to an existing request or if we get a new request. So,
the algorithm would be.

a. action function is called
b. lock the lport
c. check the state to see if we want to process the action
d. process the action (keep in mind that an action could be a response)
e. change the state if necessary
f. send a new request if necessary
g. unlock the lport

This means that the locking is not very granular, but that's fine
because logging into the fabric shouldn't happen often and it's not
in the fast-path. 

So, that leaves the NS and RP code. Now, I'm not sure where the NS
registration code should go as mentioned above, but either way my
FID 0 concern is still valid so the NS and RP code would need to grab
"a" lock from the action point to the next request going out just like
the lport does. Now the choices here are that each block could use its
own lock. The rport already has a lock to protect its state. If we made
sure that the reset function locked the lport and then the rport we
could
ensure that the rport never sends a FID 0 frame. So, that leaves the NS
block. The NS block could either have its own lock which makes sense if
you give it its own registration state machine or it could use the
lport's lock. 

...

For me right now it's really boiling down to the question of where the
NS registration should go. If it stays in the fc_ns.c then I think the
NS should have its own state machine and possibly its own lock otherwise
I'd pull the registration into fc_lport.c, use the lport lock and
rename everything in fc_ns.c  from "ns_" to "disc_"  and rename the
file to fc_disc.c. My patches do the latter without the renaming.

//Rob
_______________________________________________
devel mailing list
[email protected]
http://www.open-fcoe.org/mailman/listinfo/devel

Reply via email to