Re: [Open-FCoE] [PATCH 1/4] libfc: Added BA_RJT handling in FCP

Mike Christie Wed, 27 Aug 2008 21:16:16 -0700

Dev, Vasu wrote:
>> -----Original Message-----
>> From: Mike Christie [mailto:[EMAIL PROTECTED]
>> Sent: Tuesday, August 26, 2008 6:54 PM
>> To: Dev, Vasu
>> Cc: [email protected]
>> Subject: Re: [Open-FCoE] [PATCH 1/4] libfc: Added BA_RJT handling in FCP
>> Mike Christie wrote:
>>> Vasu Dev wrote:
>>>> Currently fc_fcp_abts_resp() only does exch_done and cleanup
>>>> if BA_ACC received to abort req, so added code to do cleanup
>>>> on BA_RJT also, however not sure if scsi result should be
>>>> different for BA_ACC v/s BA_RJT.
>>> Either way we need to retry the command because we dropped status and
>>> data. I thought I sent a patch for that but it looks like it did not hit
>>> the list, so here it is attached. With that patch we will retry the
>>> command.
>>>
> 
> With your attached patch, the command will be retried after scsi eh for
> received BA_RJT to FCP issued abort in FCP pkt timeout, do we need to
> handle these BA_RJT before scsi eh? See more comments below along your


I might not understand the question. If fc_fcp_timeout ends up aborting 
the command and we get a reject and just drop it, then when the scsi eh 
command timer fires and eventually calls fc_seq_exch_abort, that call 
returns -ENXIO because the ESB_ST_ABNORMAL bit is still set on the exchange. 
fc_fcp_pkt_abort will then return FAILED and fc_eh_abort will return 
FAILED, and we eventually do a lun reset. If that succeeds, then the 
fc_fcp_pkt will get cleaned up. If that fails then the host reset will 
clean it up. That is what I meant when I said the scsi-eh would escalate 
and clean up, so I do not think we have to handle the BA_RJT 
before the scsi eh runs.
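
The escalation path described above can be modelled roughly as follows. This
is only a sketch: every sketch_* name is a hypothetical stand-in, not the
libfc code, and it assumes an abort that can actually be sent also completes.

```c
#include <errno.h>
#include <stdbool.h>

/* Hypothetical stand-ins for illustration; not the libfc definitions. */
enum sketch_eh_result { SKETCH_EH_SUCCESS, SKETCH_EH_FAILED };
enum sketch_eh_step {
    SKETCH_STEP_ABORT,
    SKETCH_STEP_LUN_RESET,
    SKETCH_STEP_HOST_RESET
};

struct sketch_exch {
    bool abnormal;      /* models ESB_ST_ABNORMAL still being set */
};

struct sketch_cmd {
    struct sketch_exch ex;
    bool cleaned_up;    /* models the fc_fcp_pkt having been cleaned up */
};

/* Models fc_seq_exch_abort: no second abort while the bit is set. */
static int sketch_exch_abort(struct sketch_exch *ex)
{
    if (ex->abnormal)
        return -ENXIO;
    ex->abnormal = true;
    return 0;
}

/* Models fc_fcp_pkt_abort/fc_eh_abort: FAILED if no abort could be sent. */
static enum sketch_eh_result sketch_eh_abort(struct sketch_cmd *cmd)
{
    if (sketch_exch_abort(&cmd->ex) != 0)
        return SKETCH_EH_FAILED;
    return SKETCH_EH_SUCCESS;   /* assumption: a sent abort completes */
}

/* Walks abort -> lun reset -> host reset; returns the step that cleaned up. */
static enum sketch_eh_step sketch_escalate(struct sketch_cmd *cmd,
                                           bool lun_reset_ok)
{
    enum sketch_eh_step step = SKETCH_STEP_HOST_RESET;

    if (sketch_eh_abort(cmd) == SKETCH_EH_SUCCESS)
        step = SKETCH_STEP_ABORT;
    else if (lun_reset_ok)
        step = SKETCH_STEP_LUN_RESET;

    cmd->cleaned_up = true;     /* every path ends with the pkt cleaned up */
    return step;
}
```

With the abnormal flag already set from the earlier timeout abort, the abort
step fails and cleanup happens at the lun reset, or at the host reset if the
lun reset fails.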


> notes on this question.
> 
> I'll apply this attached patch.
>>>
>>>> Signed-off-by: Vasu Dev <[EMAIL PROTECTED]>
>>>> ---
>>>>
>>>>  drivers/scsi/libfc/fc_fcp.c |    2 +-
>>>>  1 files changed, 1 insertions(+), 1 deletions(-)
>>>>
>>>>
>>>> diff --git a/drivers/scsi/libfc/fc_fcp.c b/drivers/scsi/libfc/fc_fcp.c
>>>> index f2915ed..bfe077f 100644
>>>> --- a/drivers/scsi/libfc/fc_fcp.c
>>>> +++ b/drivers/scsi/libfc/fc_fcp.c
>>>> @@ -609,7 +609,7 @@ static void fc_fcp_abts_resp(struct fc_fcp_pkt *fsp, struct fc_frame_header *fh)
>>>>       * we will let the command timeout and scsi-ml escalate if
>>>>       * the abort was rejected
>>>>       */
>>>> -    if (fh->fh_r_ctl == FC_RCTL_BA_ACC) {
>>>> +    if (fh->fh_r_ctl == FC_RCTL_BA_ACC || fh->fh_r_ctl == FC_RCTL_BA_RJT) {
>>>
>>> I remembered why I did this after I mentioned it on the list. If it is
>>> rejected do we know the status of the exchange we are aborting? If not
>>> then it seemed safest to let the scsi eh escalate the eh. For example if
>>> it is rejected because the target port is not able to execute the abort
>>> then we do not want to complete the command here and possibly retry it
>>> before killing it on the target.
> 
> Additional error handling can be added for BA_RJT according to
> information in BA_RJT to be certain on exchange status but since target
> has responded to abort that means it should be safe to assume that
> target is likely doing fine though response is BA_RJT in some cases
> here. For instance if only scsi status is lost then very likely target
> will respond with BA_RJT with logical error since perhaps in that case
> exchange related resources will be freed in target as soon as target
> finished sending scsi status without any FCP_CONF request. In this case
> the target would not recognize exchange in abort request, in turn target
> will send BA_RJT in that case and though target is doing fine, so mostly
> it will be safe to complete command here and then retry command.

If you are saying that for some cases it is ok, I agree. I am fine with 
completing the cmd/fsp for certain BA_RJT return reasons, but I do not 
think we can just complete it on BA_RJT. I am also not sure about when 
it is safe and when it is not for all targets.


> 
>> I meant to also say we can probably kill/complete it here if we know the
>> scsi-eh is going to run and escalate the eh.
>>
> 
> I think completing here will have quick recovery since anyway scsi eh

Yeah, I am just saying that if we have to add complexity to handle it, it 
might not be worth it because at this point performance is hosed. If we 
were failing the IO with different error codes to try and get it failed 
over for multipath quickly that might be different. But if we can simply 
check the reject response and reason and check what is going on with the 
command, it should be fine (not complex) and I agree with you that we 
should do it for certain safe cases.
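
A check of that kind might look roughly like the sketch below. The enum
names and values are local to the example and are not taken from the libfc
headers. Treating only "logical error, exchange not recognised" as safe is
an assumption based on the lost-status case quoted above; which
combinations are really safe for all targets is still an open question.

```c
#include <stdbool.h>

/* Illustrative constants only; not the values used on the wire or in libfc. */
enum sketch_ba_rjt_reason {
    SKETCH_RJT_LOGICAL_ERROR,
    SKETCH_RJT_LOGICAL_BUSY,
    SKETCH_RJT_UNABLE,          /* target could not perform the abort */
    SKETCH_RJT_OTHER
};

enum sketch_ba_rjt_explan {
    SKETCH_EXP_NONE,
    SKETCH_EXP_INVALID_XID,     /* target does not recognise the exchange */
    SKETCH_EXP_OTHER
};

/*
 * Returns true only when the reject suggests the target has already freed
 * the exchange, so completing the command and retrying should not race
 * with work still in flight on the target. Every other reject is left for
 * the scsi eh to escalate.
 */
static bool sketch_ba_rjt_safe_to_complete(enum sketch_ba_rjt_reason reason,
                                           enum sketch_ba_rjt_explan explan)
{
    return reason == SKETCH_RJT_LOGICAL_ERROR &&
           explan == SKETCH_EXP_INVALID_XID;
}
```

Anything the helper does not recognise falls through to "not safe", so the
default stays the current behaviour of letting the scsi eh clean up.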

> would do the same thing in current code by issuing another exch abort


The scsi-eh would not send another abort, at least not as the code is today. 
The ESB_ST_ABNORMAL bit is still set so we skip sending an abort and 
would send a lun reset.


> which would more likely get another BA_RJT due to larger scsi eh timeout
> since by then exchange resource at target will get freed. 
> 

_______________________________________________
devel mailing list
[email protected]
http://www.open-fcoe.org/mailman/listinfo/devel
