Dev, Vasu wrote: >>> exch_done would then need to be modified so that instead of >>> fc_exch_release doing >>> >>> spin_lock_bh(&mp->em_lock); >>> WARN_ON(mp->total_exches <= 0); >>> mp->total_exches--; >>> mp->exches[ep->xid - mp->min_xid] = NULL; >>> list_del(&ep->ex_list); >>> spin_unlock_bh(&mp->em_lock); >>> >>> exch_done would do that, then release its hold. And actually exch_done >>> would need to check that if there were multiple ep->resp functions (one >>> from fc_fcp_send_cmd and one from the abort) then we do not do the mp >>> manipulation above until the exch_done has been called for the last >>> ep->resp. >>> > > I don't understand calling exch_done() twice from upper layer of fc_exch.c > and checking for multiple ep->resp functions to fix above described issue
Do you mean you do not understand how moving the clearing of the mp exches ref removes the need for the hack to add the extra check refcount check in the original patch? Or do you not understand what I meant by how this will actually be implemented? > along with extra exch ref for abort in fc_exch.c. How are you planning to > set these multiple exch resp functions and how that is going to change We do not need multiple functions. In my patch I was telling you about the response handler for the function that started the exchange will be able to handle normal completion of the IO and abort response, since it is the same object being worked on (same fsp in fc_fcp.c's case). The fc_fcp_recv and other completion paths in fc_fcp.c will be smart enough to know that if it has sent an abort and has got status for a command then it needs to wait for the abort response before calling exch_done. And actually are we not supposed to call the upper layers if there is an abort in progress in csae we get a recovery qualifier. But to release the ref from the abort it initiated it needs to call into fc_exch.c to drop the refcount. So fc_exch_init would set a default exch_put that drops the hold from a operation that does a get. > fc_exch.c interface? How are you going to check for last fc_exch_done() > to do mp cleanup ? > > I think, to fix this issue we should following things:- > > 1. Always call fc_exch_done() from upper layer, it can be either when > scsi command response is received or fc_exch.c abort/error handling is done. > > 2. Currently fc_exch.c doesn't call upper layer on completion of abort > handling and it completes exch implicitly by it self at end of abort, this > implicit exch completion at end of abort handling is not clean instead > fc_exch.c should be changed to call upper layer at end of abort, so that > upper layer can call fc_exch_done() to free the exchange. However fc_exch.c > abort handling can complete the exch if resp is null and that would be > for non fc_fcp.c upper layers and that is okay since in that case fc_exch.c > will issue the abort. > > 3. In think if we implement above two then additional exch ref for abort is > not required and fcp would ensure by it fsp state to call fc_exch_done() > only once to free the exch. However if you think for some corner case this > is still required then fc_exch.c should only increment and decrement that > exch ref for abort, increment on abort issue and decrement when abort > handling is done. > Taking a refcount is nice because it is consistent with other operations where we send a sequence or even another exhange that references some object. If we are saying we are going to sync evreything up in fc_exch.c like below then this is a special case so who cares about consistency for this case :) > 4. The fc_fcp_recv() can figure out ep->resp calling context (error/abort > v/s scsi resp) by looking into fc_frame pointer to make use of ex_resp_sync You mean define FC_EX_ABORTED_SUCCESSFUL and FC_EX_ABPORT_FAILED type of values and pass them into fc_fcp_recv() to indicate if we are calling it because the exchange is done becuase it was aborted right? > of fc_exch_done() to ensure no dangling ep->resp since ex_resp_sync can be I am not sure.... Will there be dandling responses in the abort path? There shouldn't be. That is what we are trying to fix. I think we are talking about the same thing except the completion, and I might like yours more. With what I proposed I was having fc_fcp.c sync up the normal exch completion with the abort response handling, which is bad because it has to be done in multiple places when we only need to do it once. I thought you are saying fc_exch.c is going to sync up the abort response and the other seq/exch processing that was in transit before we sent the abort or after too. This way we do not need refcounts on the ep. For the abort and normal completion paths if fc_exch.c is going to sync everything up, we should also not need to use the synchronized exch_done should we? Because fc_exch.c is going to make sure that the exchange is done either normally or from a abort being successful then once fc_exch.c calls the resp handler, there should not be anything reported to fc_exch.c after the abort or nrmal completion notification (fc_exch.c basically handles it so fc_fcp does not have to worry). We just need the syncronized exch_done with the other scsi tmf and other abornormal completion path races, right? > set only if calling context is error handling path as I described in patch 2 > description. I also suggested alternate approach of using callback to free > fsp pkt instead of ex_resp_sync with calling context restrictions. > What do you think between ex_resp_sync and call back approach of patch-2. I think the callback approach more becuase fc_fcp and other callers do not have to worry about fc_exch.c sleeping when calling into exch_done. > I was hoping for more comments on patch 2 since I was looking for > direction between ex_resp_sync and callback but some how patch-1 > created more interest despite mainly only timer cleanup in patch-1 :-). That is what happens when you mix a tiny hack with a clean patch and I am going over the same code :) > >>> There are more corner cases like with all fun error handler, but I guess >>> the main idea is that we would be moving the mp manipulation to >>> exch_done instead of doing it in the release function. And exch_done >>> would do the release after the mp manipulation, so we would not hit that >>> race. >>The abort has a hold on the ep which is released by the abort >>> caller when it calls exch_done. fc_fcp.c gets its abort responses and >>> the fc_timeout_error path does not leave dangling ep->resps. >>> > > Are you referring to using ex_resp_sync of patch 2 of this series > to ensure no more dangling ep->resp ? No. There would not be any in the abort paths with what I had proposed. I do not think I could have used the patch for that path because it could have slept in timer context. > >>> What do you think? >> Oh yeah, my comments on this patch can be ignored for the merge of the >> patch since it is almost al timer clean ups in here. I will send a patch >> over your two patches. > > Thanks for you're all comments and finding issues in error handling paths. > > I'll get these patches in today, if callback is preferred over > ex_resp_sync then it can be implemented on top of patch-2. > > Thanks > Vasu > _______________________________________________ devel mailing list [email protected] http://www.open-fcoe.org/mailman/listinfo/devel
