On Thu, 2009-08-06 at 16:08 -0700, Joe Eykholt wrote:
> 
> I also don't see it happening very often; sometimes it goes 300
> create/delete cycles before it shows up. I have seen it on exchange
> IDs other than 0, though so far only 0 and 5. It takes a long time to
> happen.
> 
> I think it's more rare now that we have per-cpu pools, for some
> reason.

I can see why this issue is less likely with the per-CPU patches than
with the older code that used the EM lock.

I think the issue here is that a failed or dropped FLOGI exchange is in
its fc_exch_timeout handler while fc_exch_reset has already finished on
that exchange because of the interface destroy. In this case the FLOGI
fc_exch_timeout is either scheduled or in the middle of processing.

The interface destroy code path then moves on and reaches
fc_exch_mgr_destroy while the FLOGI fc_exch_timeout is still in
progress and its exchange reference is still held, with a count of 1.
This final reference is released only at the end of fc_exch_timeout,
when that function finds that the exchange has been reset (FC_EX_DONE |
FC_EX_RST_CLEANUP bits set).

This race must be the cause of the problem.

The per-CPU fc_exch_mgr_reset() code path walks all EM pools, with
additional indirection through fc_exch_mgr_anchor, and that may give
fc_exch_timeout some extra time to release its final exchange reference
before fc_exch_mgr_destroy is called. So the race may be less likely
with the per-CPU patches, but the same race still exists.

See more details below on fixing this race.

> I ported your patch to my tree and am running with it.
> I don't think we should have to wait for exchanges to be released.
> 

This fixes the identified race, but at the cost of additional tracking
of pending exchanges. I know this is not a good fix.

Another fix might be to use cancel_delayed_work_sync in fc_exch_reset,
but the circular locking issues would have to be fixed first, since
fc_exch_mgr_reset() is currently sometimes called with the lport lock
held. I'm looking into this fix further.

        Vasu

> Thanks for trying it out.  I'll keep running and trying to provoke it
> again.
> 
>         Joe

_______________________________________________
devel mailing list
[email protected]
http://www.open-fcoe.org/mailman/listinfo/devel
