Vasu Dev wrote:
> On Mon, 2009-10-12 at 11:39 -0700, Joe Eykholt wrote:
>> This is with the patch I just submitted on top of fcoe-next.
>> I believe it's unrelated to my patch.
>>
>> On removing the module or deleting an instance (I'm not sure which), the
>> system crashes.  I think the problem is that fc_exch_release() happens
>> too late, after the exchange manager is freed, so that the second arg
>> to mempool_free, the pool pointer, is 6b6b6b6b6b6b6b6b, which is the
>> slab allocator's free poison value.
>>
>> Symptoms with other allocators will probably be different, but I find the
>> slab allocator with CONFIG_DEBUG_SLAB handy for finding things like this.
>>
>> We need to do a sync cancel somewhere before removing the exchange manager.
> 
> fc_exch_reset() needs to call cancel_delayed_work_sync() instead of
> cancel_delayed_work(). We could not call _sync here since
> fc_exch_reset() is called with the lport lock held, and calling _sync would
> have acquired the lport lock again, causing deadlock. Currently the lport exch
> resp handler checks for -FC_EX_CLOSED without acquiring the lport lock, so
> now it should be safe to call cancel_delayed_work_sync() in
> fc_exch_reset(). Let me try this fix.

The lock checking code still will not like it.
The checker would see that the work item (response handler)
grabs the lp_mutex in some cases for FC_EX_TIMEOUT, so it wouldn't
like us holding lp_mutex during a cancel_sync.

That's a real problem, not just a checker complaint: the timeout handler
may be blocked waiting for lp_mutex while we hold lp_mutex and wait for
the handler to finish, so neither side could make progress.

Another way might be to use a separate work queue for exchange timeouts,
and flush all work on that queue synchronously before freeing an
exchange manager.
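
To make the ordering concrete, here is a minimal single-threaded userspace
model of that idea. It is only a sketch, not libfc code: struct exch_mgr,
struct timeout_queue, queue_timeout(), flush_timeouts() and exch_mgr_destroy()
are all hypothetical names. In the kernel, the queue would presumably be a
dedicated workqueue and the flush would be flush_workqueue(), called without
lp_mutex held.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical stand-in for an exchange manager; not the libfc structure. */
struct exch_mgr {
	int timeouts_run;	/* timeout handlers that have completed */
	int pending;		/* timeout work items still queued */
};

/* One queued timeout; models a delayed work item. */
struct timeout_work {
	struct exch_mgr *em;
	struct timeout_work *next;
};

/* Queue used only for exchange timeouts; models a separate work queue. */
struct timeout_queue {
	struct timeout_work *head;
	struct timeout_work *tail;
};

/* Queue a timeout against an exchange manager. Returns 0 or -1. */
static int queue_timeout(struct timeout_queue *q, struct exch_mgr *em)
{
	struct timeout_work *w = malloc(sizeof(*w));

	if (!w)
		return -1;
	w->em = em;
	w->next = NULL;
	if (q->tail)
		q->tail->next = w;
	else
		q->head = w;
	q->tail = w;
	em->pending++;
	return 0;
}

/*
 * Run every queued handler to completion; models a synchronous flush.
 * The caller must not hold any lock that a handler may take.
 */
static void flush_timeouts(struct timeout_queue *q)
{
	while (q->head) {
		struct timeout_work *w = q->head;

		q->head = w->next;
		if (!q->head)
			q->tail = NULL;
		w->em->timeouts_run++;	/* the "handler" touches the manager */
		w->em->pending--;
		free(w);
	}
}

/* Flush first, then free, so no handler can see a freed manager. */
static int exch_mgr_destroy(struct timeout_queue *q, struct exch_mgr *em)
{
	int ran;

	flush_timeouts(q);
	assert(em->pending == 0);
	ran = em->timeouts_run;
	free(em);
	return ran;
}
```

The point of the model is only the order inside exch_mgr_destroy(): every
pending timeout runs before the manager's memory goes away, and the flush
is done by a caller that holds no lock the handlers need.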

<omitting log>

        Regards,
        Joe