>-----Original Message-----
>From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED] On
>Behalf Of [EMAIL PROTECTED]
>Sent: Thursday, August 14, 2008 8:40 PM
>To: [email protected]
>Subject: [Open-FCoE] some fc exch reset fixes and a fix to my abort patches
>
>These patches were made over my last abort patches and over the
>current tree.
>
>They fix a couple bugs in fc_exch.c and fc_fcp.c. Most are old bugs,
>but one was caused by my abort changes. The fix for the bug I added
>is also needed for the old bugs I am fixing, so I am including it in here
>so I do not fill up the list by resending the old patchset. If you want
>me to resend the old patches with this fix merged in it let me know
>and I will do that instead.
>
>
This is fine and no need to resend patches in one merged set, I was able
to apply these patches after resolving very minor conflict due to recent
renaming of fc_fcp_recv().
I know I'm not responding to correct email/s thread for individual more
points below in this email but given so many of your floating patches,
I guess aggregate reply to all patches will expedite your patches
for inclusion to tree, therefore I'm writing all my points in this single
response.
I took your recent 10 patches series and followed by 4 more patches up to
patch title as "libfc: check ESB_ST_COMPLETE before completing ep (take 2)"
and then applied these 14 patches to commit.
57929ea0ac20b23f894a557f2b0842dc1ec7d804 Robert Love 5 days ago title:fcoe:
Move protocol definitions fc_fcoe.h
These 14 patches works fine for basic testing :-) however not with
kicking error handling paths, more details below.
I tried a new error test to kick error handling paths with and without
your these 14 patches by dropping SCSI response to a SCSI read command.
With your 14 patches. My system locks up persistently after
single error test run, I mean system becomes non response without any
additional error messages though my kernel has spin lock debugging enabled.
So I tried this error test without your 14 patches, this time I ran
into exch cleanup issue which is likely due to messed up exch ref
in RRQ which you have fixed in patch 7/10. However this time multiple
error test runs works fine without causing any system lock up.
However wireshark trace looks good with or without your patches through REC,
REC RJT, ABTS, BA_ACC, RRQ and RRQ_ACC with my target when error test is
done.
So any idea why system locks up with your 14 patches?
I think some more issues from these 14 patches or I failed to
understand your all changes in these 14 patches. So please help
me understand mainly changes on fc_exch.c and fcp abort handling.
1. Earlier fc_exch_abts_resp() would have completed exch for all upper
layers, that means fc_exch_timeout() would have send RRQ for sure under
ESB_ST_COMPLETE bit check.
if (e_stat & ESB_ST_COMPLETE) {
ep->esb_stat = e_stat & ~(ESB_ST_REC_QUAL | ESB_ST_COMPLETE);
spin_unlock_bh(&ep->ex_lock);
if (e_stat & ESB_ST_REC_QUAL)
fc_exch_rrq(ep);
}
Now for fcp, exchange may not be completed and therefore check on
ESB_ST_COMPLETE bit check will prevent sending RRQ in that case and
exchange will not get freed since fc_exch_done() will abandon exch
free due to ESB_ST_REC_QUAL bit set in exch_done() code path.
I also don't understand resetting ESB_ST_COMPLETE bit in fc_exch_timeout().
2. Now abort handling is broken for non fc_fcp.c if no response to abort
after removed ESB_ST_ABNORMAL state handling in fc_exch_timeout. Earlier
abort was contained within fc_exch.c for all upper layers of fc_exch.c and
aborted exchange was done either on BA_ACC or ESB_ST_ABNORMAL handling in
fc_exch_timeout.
3. The fc_fcp_recv() is getting called for abort issued from initiator but
not called for incoming abort though this is unlikely in initiator but yet
possible. I don't understand spec well on this case but I guess should
be same as sending abort and then drop all frames while abort finishes. You
recently checked spec on this, so did you left not calling fcp in this
case on purpose as per spec or it is in your TODO ?
4. The -FC_EX_TIMEOUT handling in fc_fcp_error() doesn't make sense and I don't
think force completion will ever occur in modified fc_fcp_error()
since from fc_exch.c the force completion will be issued -FC_EX_TIMEOUT
and that is ignored in fc_fcp_error().
5. The RRQ response drop exch ref for ACC but not if REJ.
I think these 14 patches can be applied now but before that it would be good
to understand at least more stability issues with these 14 patches.
--Vasu
_______________________________________________
devel mailing list
[email protected]
http://www.open-fcoe.org/mailman/listinfo/devel