I was starting to have this discussion with Mike, but he suggested moving it to the mailing list, so here it is. I'm going to mail my patch set following this mail too. I'm keeping the last few patches that do the lport locking to myself because they're not working right now. The rest of the patches may not be complete, but they're functional. Let me know if you have any opinions. All of the following patches are just an RFC, they will not be applied yet.
... Before getting to the locking I think that there are two issues that need to be resolved. 1) The lport state machine needs to be isolated to fc_lport.c 2) Where should name server registration go, fc_lport.c or fc_ns.c? 1. Currently there are two problems in this area. I don't like that fc_ns.c and fc_rport.c both change the state of the lport. fc_ns.c moves the lport through ST_REG_FT, ST_REG_SCR and ST_READY as well as ST_DNS_STOP. When an rport becomes READY. fc_rport.c always checks to see if that port is the lp->dns_rp and upon that determination it will take some action, possibly calling back into the lport. It bothers me that both the NS and RP code are making decisions for the LP. Locally I have two changes to solve these problems. a) I created an event_callback() function in the tt (libfc_transport_template) that the rport code calls upon READY or error for an rport going through its state machine. This way the rport state machine doesn't know anything about special ports, it's completely independent. It also moves the decision making about the lport's state machine back to the LP. b) I moved the NS registration functions back to fc_lport.c. This way the lport is the only file/block that deals with changing the lport's state. I'll cover this more below. 2. I think that discovery/ns either needs its own state machine or the LP states that it manages (REG_PN, REG_FT, SCR) need to moved back to the fc_lport.c. This is an interesting proposition to move the NS registration back to the lport because it really converts fc_ns.c back to being fc_disc.c. With fc_disc.c only handling discovery. When I knocked that idea around here it seemed to be well received. However, another idea emerged where we would have a state machine for name server registration such that the lport would just enter READY after logging into the fabric and then start the NS state machine. I guess this could tie into the event_callback() function as the NS block could call the callback when discovery was complete, or maybe the LP doesn't care really, if it's already READY then it probably doesn't need a callback. ... So, now on to the lport locking :). What I'm focused on with the lport locking is the fear that a user will call issue_lip and reset us. The reset code needs some work and Chris is looking into it, but it should be deleting all of the rports, clearing out any existing exchanges and 0-ing out any negotiated values. My main concern here is the FID. I'm worried about a scenario where we're processing the PLOGI response and we're going to send a PRLI request next. If the user issues a reset and the lock is not held across the PLOGI response -> PRLI request then the FID could be set to 0 right before sending the PRLI request, so we'd spew a frame with FID 0, not good. This scenario is not limited to PLOGI/PRLI it can happen with any response/request. I think that the FCP code is fine because the FC transport class grabs the Scsi_Host lock when deleting rports and therefore we shouldn't get any new I/O. The LP code itself isn't too bad either. The plan is to lock upon the request of action. I know that sounds goofy, what I mean is we need to lock if the user tells us to login/logout/reset or when we get a response to an existing request or if we get a new request. So, the algorithm would be. a. action function is called b. lock the lport c. check the state to see if we want to process the action d. process the action (keep in mind that an action could be a response) e. change the state if necessary f. send a new request if necessary g. unlock the lport This means that the locking is not very granular, but that's fine because logging into the fabric shouldn't happen often and it's not in the fast-path. So, that leaves the NS and RP code. Now, I'm not sure where the NS registration code should go as mentioned above, but either way my FID 0 concern is still valid so the NS and RP code would need to grab "a" lock from the action point to the next request going out just like the lport does. Now the choices here are that each block could use its own lock. The rport already has a lock to protect its state. If we made sure that the reset function locked the lport and then the rport we could ensure that the rport never sends a FID 0 frame. So, that leaves the NS block. The NS block could either have its own lock which makes sense if you give it its own registration state machine or it could use the lport's lock. ... For me right now it's really boiling down to the question of where the NS registration should go. If it stays in the fc_ns.c then I think the NS should have its own state machine and possibly its own lock otherwise I'd pull the registration into fc_lport.c, use the lport lock and rename everything in fc_ns.c from "ns_" to "disc_" and rename the file to fc_disc.c. My patches do the latter without the renaming. //Rob _______________________________________________ devel mailing list [email protected] http://www.open-fcoe.org/mailman/listinfo/devel
