From: Thadeu Lima de Souza Cascardo <[email protected]>
Date: Tue, 28 Feb 2012 17:34:38 -0300

> On Tue, Feb 28, 2012 at 02:30:51PM -0500, David Miller wrote:
>> From: Thadeu Lima de Souza Cascardo <[email protected]>
>> Date: Tue, 28 Feb 2012 15:36:16 -0300
>> 
>> > When an EEH happens, the catas poll code will try to restart the device,
>> > removing it and adding it back again. The EEH code will try to do the
>> > same. One of the threads ends up accessing memory that was freed by the
>> > other thread and we get a crash.
>> 
>> Stop adding bandaids to the locking.
>> 
>> If the EEH infrastructure doesn't synchronize parallel operations
>> on the same device, that is the real bug, and that's where the real
>> fix belongs.
>> 
>> I refuse to apply this patch.
>> 
> 
> It's not EEH that does not synchronize removal. The problem is that the
> driver itself calls the driver remove function through mlx4_restart_one.

Then reuse the existing intf_mutex this driver has, export it to
main.c and add a new __mlx4_unregister_device that can be called
with the intf_mutex held already.
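
A minimal userspace sketch of that locked/unlocked split, for illustration
only. It uses a pthread mutex as a stand-in for the kernel mutex, and the
struct mlx4_dev_sketch type, its fields, and mlx4_restart_one_sketch are
hypothetical names invented here; this is not the actual mlx4 code. The
idea is that the restart path holds intf_mutex across the whole remove/add
sequence and calls the double-underscore variant, while other callers go
through the wrapper that takes the lock itself.

```c
#include <assert.h>
#include <pthread.h>
#include <stdbool.h>

/* Userspace stand-in for the driver's intf_mutex (assumption: the real
 * one is a kernel struct mutex, not a pthread mutex). */
static pthread_mutex_t intf_mutex = PTHREAD_MUTEX_INITIALIZER;

/* Hypothetical, heavily simplified device state for illustration. */
struct mlx4_dev_sketch {
    bool registered;        /* is the device currently registered? */
    int  unregister_calls;  /* how many times teardown actually ran */
};

/* Locked variant: the caller must already hold intf_mutex. */
static void __mlx4_unregister_device(struct mlx4_dev_sketch *dev)
{
    if (!dev->registered)
        return;             /* already torn down by the other path */
    dev->registered = false;
    dev->unregister_calls++;
}

/* Unlocked entry point: takes intf_mutex, then calls the locked variant. */
static void mlx4_unregister_device(struct mlx4_dev_sketch *dev)
{
    pthread_mutex_lock(&intf_mutex);
    __mlx4_unregister_device(dev);
    pthread_mutex_unlock(&intf_mutex);
}

/* Restart path: holds intf_mutex across remove and re-add, so a second
 * thread cannot free state in the middle of the sequence. */
static void mlx4_restart_one_sketch(struct mlx4_dev_sketch *dev)
{
    pthread_mutex_lock(&intf_mutex);
    __mlx4_unregister_device(dev);
    dev->registered = true; /* stands in for adding the device back */
    pthread_mutex_unlock(&intf_mutex);
}
```

With this shape, a concurrent remove either runs entirely before the
restart takes the lock or waits until the restart has finished, and the
registered check makes a second teardown a no-op.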
--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to [email protected]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
