Dev, Vasu wrote:
>> -----Original Message-----
>> From: Mike Christie [mailto:[EMAIL PROTECTED]
>> Sent: Tuesday, August 26, 2008 6:54 PM
>> To: Dev, Vasu
>> Cc: [email protected]
>> Subject: Re: [Open-FCoE] [PATCH 1/4] libfc: Added BA_RJT handling in FCP
>> Mike Christie wrote:
>>> Vasu Dev wrote:
>>>> Currently fc_fcp_abts_resp() only does exch_done and cleanup
>>>> if BA_ACC is received for the abort request, so added code to do
>>>> cleanup on BA_RJT also; however, not sure if the scsi result should
>>>> be different for BA_ACC vs. BA_RJT.
>>> Either way we need to retry the command because we dropped status and
>>> data. I thought I sent a patch for that but it looks like it did not
>>> hit the list, so here it is attached. With that patch we will retry
>>> the command.
>>>
>
> With your attached patch, the command will be retried after scsi eh for
> received BA_RJT to FCP issued abort in FCP pkt timeout, do we need to
> handle these BA_RJT before scsi eh? See more comments below along your
> notes on this question.
I might not understand the question. If fc_fcp_timeout ends up aborting
the command and we get a reject and just drop it, then when the scsi eh
command timer fires and eventually calls fc_seq_exch_abort, it will
return -ENXIO because the ESB_ST_ABNORMAL bit is still set on the
exchange. fc_fcp_pkt_abort will then return FAILED and fc_eh_abort will
return FAILED, and we eventually do a lun reset. If that succeeds, the
fc_fcp_pkt will get cleaned up. If that fails, the host reset will
clean it up. That is what I meant by the scsi-eh escalating and
cleaning up, so I do not think we have to handle the BA_RJT before the
scsi eh runs.
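To make that escalation path concrete, here is a rough toy model of it.
This is only a sketch: every toy_* name, the struct layouts and the bit
value used for ESB_ST_ABNORMAL are made up for illustration, and none of
it is the actual libfc code. It only mirrors the sequence described
above: a second abort on an exchange that is already marked abnormal
returns -ENXIO, the eh abort then reports failure, and the cleanup
happens at the lun reset or, failing that, the host reset.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

/* Toy model only: names echo the thread, bodies are NOT the libfc code. */
#define ESB_ST_ABNORMAL (1u << 0)	/* illustrative bit value (assumption) */

enum toy_eh_result { TOY_EH_SUCCESS, TOY_EH_FAILED };
enum toy_eh_step { TOY_STEP_ABORT, TOY_STEP_LUN_RESET, TOY_STEP_HOST_RESET };

struct toy_exch {
	unsigned int esb_stat;		/* exchange status bits */
};

struct toy_pkt {
	struct toy_exch ex;
	bool cleaned_up;
};

/* An exchange already marked abnormal is not aborted a second time. */
static int toy_seq_exch_abort(struct toy_exch *ex)
{
	assert(ex != NULL);
	if (ex->esb_stat & ESB_ST_ABNORMAL)
		return -ENXIO;
	ex->esb_stat |= ESB_ST_ABNORMAL;	/* the abort goes out now */
	return 0;
}

/* Stand-in for the eh abort handler: fails if no abort could be sent. */
static enum toy_eh_result toy_eh_abort(struct toy_pkt *pkt)
{
	return toy_seq_exch_abort(&pkt->ex) ? TOY_EH_FAILED : TOY_EH_SUCCESS;
}

/* Returns the escalation step at which the packet gets cleaned up. */
static enum toy_eh_step toy_eh_escalate(struct toy_pkt *pkt, bool lun_reset_ok)
{
	if (toy_eh_abort(pkt) == TOY_EH_SUCCESS) {
		/* Simplification: a sent abort is treated as handled here. */
		pkt->cleaned_up = true;
		return TOY_STEP_ABORT;
	}
	/* Abort failed: lun reset first, host reset if that fails too. */
	pkt->cleaned_up = true;
	return lun_reset_ok ? TOY_STEP_LUN_RESET : TOY_STEP_HOST_RESET;
}
```

In this model, calling toy_seq_exch_abort() once (standing in for the
timeout-driven abort) and then toy_eh_escalate() ends at the lun reset
step, or at the host reset step when lun_reset_ok is false.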
>
> I'll apply this attached patch.
>>>
>>>> Signed-off-by: Vasu Dev <[EMAIL PROTECTED]>
>>>> ---
>>>>
>>>> drivers/scsi/libfc/fc_fcp.c | 2 +-
>>>> 1 files changed, 1 insertions(+), 1 deletions(-)
>>>>
>>>>
>>>> diff --git a/drivers/scsi/libfc/fc_fcp.c b/drivers/scsi/libfc/fc_fcp.c
>>>> index f2915ed..bfe077f 100644
>>>> --- a/drivers/scsi/libfc/fc_fcp.c
>>>> +++ b/drivers/scsi/libfc/fc_fcp.c
>>>> @@ -609,7 +609,7 @@ static void fc_fcp_abts_resp(struct fc_fcp_pkt *fsp, struct fc_frame_header *fh)
>>>> * we will let the command timeout and scsi-ml escalate if
>>>> * the abort was rejected
>>>> */
>>>> - if (fh->fh_r_ctl == FC_RCTL_BA_ACC) {
>>>> + if (fh->fh_r_ctl == FC_RCTL_BA_ACC || fh->fh_r_ctl == FC_RCTL_BA_RJT) {
>>>
>>> I remembered why I did this after I mentioned it on the list. If it is
>>> rejected do we know the status of the exchange we are aborting? If not
>>> then it seemed safest to let the scsi eh escalate the eh. For example,
>>> if it is rejected because the target port is not able to execute the
>>> abort then we do not want to complete the command here and possibly
>>> retry it before killing it on the target.
>
> Additional error handling can be added for BA_RJT, using the information
> in the BA_RJT to be certain of the exchange status. But since the target
> has responded to the abort, it should be safe to assume that the target
> is likely doing fine even though the response is BA_RJT in some cases.
> For instance, if only the scsi status is lost, the target will very
> likely respond with BA_RJT with a logical error, since the exchange
> resources are probably freed in the target as soon as it finishes
> sending scsi status without any FCP_CONF request. In this case the
> target would not recognize the exchange in the abort request and would
> send BA_RJT even though it is doing fine, so it will mostly be safe to
> complete the command here and then retry it.
If you are saying that for some cases it is ok, I agree. I am fine with
completing the cmd/fsp for certain BA_RJT return reasons, but I do not
think we can just complete it on BA_RJT. I am also not sure about when
it is safe and when it is not for all targets.
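As a rough sketch of what "complete only for certain BA_RJT reasons"
could look like, the check below completes only for the case discussed
above: a logical error whose explanation says the target no longer
recognizes the exchange. The toy_* names and the struct are hypothetical,
the numeric codes are my recollection of the FC-FS BA_RJT values and
should be checked against the spec, and the policy itself is only an
assumption, since it is still open which reasons are safe for all
targets.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical names; code values are assumptions to verify against FC-FS. */
enum toy_ba_rjt_reason {
	TOY_BA_RJT_LOG_ERR = 0x03,	/* logical error */
	TOY_BA_RJT_UNABLE  = 0x09,	/* unable to perform the request */
};

enum toy_ba_rjt_explan {
	TOY_BA_RJT_EXP_NONE = 0x00,	/* no additional explanation */
	TOY_BA_RJT_INV_XID  = 0x03,	/* exchange IDs not recognized */
};

/* Reduced view of a BA_RJT payload: just the two fields we look at. */
struct toy_ba_rjt {
	uint8_t reason;
	uint8_t explan;
};

/*
 * Illustrative policy: complete and retry only when the target reports
 * that it does not know the exchange. Everything else, including
 * "unable to perform", is left for the scsi eh to escalate.
 */
static bool toy_ba_rjt_safe_to_complete(const struct toy_ba_rjt *rjt)
{
	assert(rjt != NULL);
	return rjt->reason == TOY_BA_RJT_LOG_ERR &&
	       rjt->explan == TOY_BA_RJT_INV_XID;
}
```

Anything that returns false here would keep today's behaviour: drop the
response and let the command time out so the scsi eh escalates.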
>
>> I meant to also say we can probably kill/complete it here if we know
>> the scsi-eh is going to run and escalate the eh.
>>
>
> I think completing here will have quick recovery since anyway scsi eh
Yeah, I am just saying that if we have to add complexity to handle it, it
might not be worth it because at this point performance is hosed. If we
were failing the IO with different error codes to try and get it failed
over for multipath quickly, that might be different. But if we can simply
check the reject reason and explanation and check what is going on with
the command, it should be fine (not complex), and I agree with you that
we should do it for certain safe cases.
> would do the same thing in current code by issuing another exch abort
The scsi-eh would not send another abort - it shouldn't as it is today.
The ESB_ST_ABNORMAL bit is still set so we skip sending an abort and
would send a lun reset.
> which would more likely get another BA_RJT due to larger scsi eh timeout
> since by then exchange resource at target will get freed.
>