>-----Original Message-----
>From: Mike Christie [mailto:[EMAIL PROTECTED]
>Sent: Tuesday, August 12, 2008 12:45 PM
>To: Dev, Vasu
>Cc: [email protected]
>Subject: Re: [Open-FCoE] [RFC PATCH 1/2] libfc: removed checks for
>timer_pending and then updating exch ref count.
>
>Dev, Vasu wrote:
>>>> exch_done would then need to be modified so that instead of
>>>> fc_exch_release doing
>>>>
>>>>                  spin_lock_bh(&mp->em_lock);
>>>>                  WARN_ON(mp->total_exches <= 0);
>>>>                  mp->total_exches--;
>>>>                  mp->exches[ep->xid - mp->min_xid] = NULL;
>>>>                  list_del(&ep->ex_list);
>>>>                  spin_unlock_bh(&mp->em_lock);
>>>>
>>>> exch_done would do that, then release its hold. And actually exch_done
>>>> would need to check that if there were multiple ep->resp functions (one
>>>> from fc_fcp_send_cmd and one from the abort)
>>>> then we do not do the mp
>>>> manipulation above until the exch_done has been called for the last
>>>> ep->resp.
>>>>
>>
>> I don't understand calling exch_done() twice from upper layer of
>fc_exch.c
>> and checking for multiple ep->resp functions to fix above described issue
>
>Do you mean you do not understand how moving the clearing of the mp
>exches ref removes the need for the hack to add the extra check refcount
>check in the original patch?

I thought you were talking about issue of fcp exch abort and extra exch done
issue and so called hack is different thing. The hack is for receiving any
frame when we are about free exch, that can be from jammer or mal formed
corrupt frame. So I think two issues are mixed here.

I gathered that you were talking about this issue with fcp and abort :-

1. fcp issued the abort but abort response pending and no extra exch ref for 
abort.
2. scsi response status comes back it is about to call exch done
3. Abort respone comes that cleanup exchange.
4. Now exch done from 2 will release the same exchange which is already release 
by 3. So this will cause extra exch ref decrement.

So how so called hack is going to prevent this issue?

You were talking about "multiple ep->resp functions" and one of them from
abort but currently fc_exch.c doesn't call upper layer for abort respone.
If you meant to add calling upper layer for abort response as
I also mentioned in my point 2 below then I misunderstood your notes.

>
>Or do you not understand what I meant by how this will actually be
>implemented?
>
>
>> along with extra exch ref for abort in fc_exch.c. How are you planning to
>> set these multiple exch resp functions and how that is going to change
>
>We do not need multiple functions. In my patch I was telling you about

I guess that patch will level set us on solution since now this lengthy
discussion is creating more confusion. I'll wait for that patch to
understand your solution.

>the response handler for the function that started the exchange will be
>able to handle normal completion of the IO and abort response, since it
>is the same object being worked on (same fsp in fc_fcp.c's case). The
>fc_fcp_recv and other completion paths in fc_fcp.c will be smart enough
>to know that if it has sent an abort and has got status for a command
>then it needs to wait for the abort response before calling exch_done.
>And actually are we not supposed to call the upper layers if there is an
>abort in progress in csae we get a recovery qualifier.
>

The fc_exch.c does recovery qualifier processing but we should call
fc_fcp.c for abort response to sync exch done issue describe above.
I also suggested in point 2 below to add call to upper layer for
abort response as I said before also.

You also brought other issues to give true scsi smd status to scsi-ml
based on abort completion, so it make sense to call fcp for
abort response. In other non-fcp upper layers, the fc_exch.c
abort handling should be sufficient.

Again, I'll wait for your patch since we are level set on issue
but sure if we are level set on solution.

>But to release the ref from the abort it initiated it needs to call into
>fc_exch.c to drop the refcount. So fc_exch_init would set a default
>exch_put that drops the hold from a operation that does a get.
>
>
>> fc_exch.c interface? How are you going to check for last fc_exch_done()
>> to do mp cleanup ?
>>
>> I think, to fix this issue we should following things:-
>>
>> 1. Always call fc_exch_done() from upper layer, it can be either when
>> scsi command response is received or fc_exch.c abort/error handling is
>done.
>>
>> 2. Currently fc_exch.c doesn't call upper layer on completion of abort
>> handling and it completes exch implicitly by it self at end of abort,
>this
>> implicit exch completion at end of abort handling is not clean instead
>> fc_exch.c should be changed to call upper layer at end of abort, so that
>> upper layer can call fc_exch_done() to free the exchange. However
>fc_exch.c
>> abort handling can complete the exch if resp is null and that would be
>> for non fc_fcp.c upper layers and that is okay since in that case
>fc_exch.c
>> will issue the abort.
>>
>> 3. In think if we implement above two then additional exch ref for abort
>is
>> not required and fcp would ensure by it fsp state to call fc_exch_done()
>> only once to free the exch. However if you think for some corner case
>this
>> is still required then fc_exch.c should only increment and decrement that
>> exch ref for abort, increment on abort issue and decrement when abort
>> handling is done.
>>
>
>Taking a refcount is nice because it is consistent with other operations
>where we send a sequence or even another exhange that references some
>object. If we are saying we are going to sync evreything up in fc_exch.c
>like below then this is a special case so who cares about consistency
>for this case :)
>
>
>> 4. The fc_fcp_recv() can figure out ep->resp calling context (error/abort
>> v/s scsi resp) by looking into fc_frame pointer to make use of
>ex_resp_sync
>
>You mean define FC_EX_ABORTED_SUCCESSFUL and FC_EX_ABPORT_FAILED type of
>values and pass them into fc_fcp_recv() to indicate if we are calling it
>because the exchange is done becuase it was aborted right?
>

No, fc_fcp_recv() would need to look into fc_frame contents to see if it
is a FCP frame or abort response or an error code via fc_frame ptr.

>
>> of fc_exch_done() to ensure no dangling ep->resp since ex_resp_sync can
>be
>
>I am not sure.... Will there be dandling responses in the abort path?
>There shouldn't be. That is what we are trying to fix.
>
>I think we are talking about the same thing except the completion, and I
>might like yours more.
>
>With what I proposed I was having fc_fcp.c sync up the normal exch
>completion with the abort response handling, which is bad because it has
>to be done in multiple places when we only need to do it once.
>

I was also telling same thing in point 1 to always have one fc_exch_done()
from fc_fcp.c and that would require sync between abort and IO completion in 
fc_fcp.c.

>I thought you are saying fc_exch.c is going to sync up the abort
>response and the other seq/exch processing that was in transit before we
>sent the abort or after too.

Not sure what sync is required to send abort and where I said that? However
I did mention in point 2 calling fc_fcp.c for abort response, that can
be abort ACC. RJT or timeout error (no response from other end).

> This way we do not need refcounts on the
>ep. For the abort and normal completion paths if fc_exch.c is going to
>sync everything up, we should also not need to use the synchronized
>exch_done should we?

I said in point 2 to pass abort completion to fc_fcp.c and in point 1 to
have only single fc_exch_done() from fc_fcp.c. This would require
 fc_fcp.c to sync exch_done() since I was thinking abort or normal IO will
have same exch ref, therefore fc_fcp.c will issue only one fc_exch_done().
Therefore no need to have extra exch ref for abort.

> Because fc_exch.c is going to make sure that the
>exchange is done either normally or from a abort being successful then
>once fc_exch.c calls the resp handler, there should not be anything
>reported to fc_exch.c after the abort or nrmal completion notification
>(fc_exch.c basically handles it so fc_fcp does not have to worry).
>
>We just need the syncronized exch_done with the other scsi tmf and other
>abornormal completion path races, right?
>
>

No, read my point 2 again.  I said fc_fcp.c will decide when exch is
done and I suggested to remove this implicit exch completion for fc_fcp.c
in point 2 above.

>> set only if calling context is error handling path as I described in
>patch 2
>> description. I also suggested alternate approach of using callback to
>free
>> fsp pkt instead of ex_resp_sync with calling context restrictions.
>> What do you think between ex_resp_sync and call back approach of patch-2.
>
>
>I think the callback approach more becuase fc_fcp and other callers do
>not have to worry about fc_exch.c sleeping when calling into exch_done.
>

I think you missed some thing after more ? more bad or more good ?
I guess you liked call back approach.

>
>> I was hoping for more comments on patch 2 since I was looking for
>> direction between ex_resp_sync and callback but some how patch-1
>> created more interest despite mainly only timer cleanup in patch-1 :-).
>
>That is what happens when you mix a tiny hack with a clean patch and I
>am going over the same code :)
>
>>
>>>> There are more corner cases like with all fun error handler, but I
>guess
>>>> the main idea is that we would be moving the mp manipulation to
>>>> exch_done instead of doing it in the release function. And exch_done
>>>> would do the release after the mp manipulation, so we would not hit
>that
>>>> race. >>The abort has a hold on the ep which is released by the abort
>>>> caller when it calls exch_done. fc_fcp.c gets its abort responses and
>>>> the fc_timeout_error path does not leave dangling ep->resps.
>>>>
>>
>> Are you referring to using ex_resp_sync of patch 2 of this series
>> to ensure no more dangling ep->resp ?
>
>No. There would not be any in the abort paths with what I had proposed.
>I do not think I could have used the patch for that path because it
>could have slept in timer context.

OK, I'll drop second patch then.

        --Vasu
_______________________________________________
devel mailing list
[email protected]
http://www.open-fcoe.org/mailman/listinfo/devel

Reply via email to