On 10/12/2011 2:30 PM, Vasu Dev wrote:
On Wed, 2011-10-12 at 01:43 -0700, Zou, Yi wrote:
On 8/9/2011 2:20 PM, Vasu Dev wrote:


I think the problem the patch is trying to fix is already illustrated
by the included trace. Basically it's a conflict of exch id due to the
fact that the initiator side thinks the exch resources are free, since
fc_eh_host_reset() would result in exch_mgr_reset, so I think any path that
ends in fc_lport_reset() is likely to have this issue: fc_eh_host_reset(),
or the other one I see, disabling the fc_lport.

Yes, the fc_lport_reset() code path has issues here with exch reuse/conflict,
and this is complicated with FIP as that doesn't have an abort concept.
However, I see a problem with the added msleep() also, as Bhanu pointed out,
especially sleeping with a lock held. The delay during reset should be
tolerable as resets are an unlikely event, but good to avoid if possible,

In case of bnx2fc this problem happens more often, as we do not issue FLOGI until the FCF is ready. We first call fcoe_ctlr_link_up() and then call fc_fabric_login. So, we always hit the new code path and wait for 10 secs every time an FCoE interface is created.


so let me look into a different fix here, or at least delay only if the LOGO
has not completed, w/o msleep.

I think it would be better to delay only if the LOGO is not complete, and as far as possible we should avoid msleep() in regular code paths.

Thanks,
Bhanu





I can see the side effects you described here, particularly that it is
not nice to msleep() w/ multiple locks held. Since currently there is
no exch timeout value (not sure why?) for LOGO on rports, i.e., the LOGOs
shown in this trace, and as Vasu mentioned no ABTS on FIP, sending abort
along the exch reset path is a solution to guarantee the exch on both
ends are in a synced-up state.

Bottom line is that we have to know the exch is reusable to the other side
before reissuing the FLOGI. I guess we really need a timeout on rport LOGOs,
not just the fabric one, and we have to wait for completion of LOGO ACC/RJT
or the timeout before continuing.

Yeah, that would work, and a timeout rather than msleep() is required here,
only if the LOGO has not completed.

Thanks
Vasu






_______________________________________________
devel mailing list
[email protected]
https://lists.open-fcoe.org/mailman/listinfo/devel
