On 05/30, Joe Damato wrote: > Previously, e1000_down called cancel_work_sync for the e1000 reset task > (via e1000_down_and_stop), which takes RTNL. > > As reported by users and syzbot, a deadlock is possible due to lock > inversion in the following scenario: > > CPU 0: > - RTNL is held > - e1000_close > - e1000_down > - cancel_work_sync (takes the work queue mutex) > - e1000_reset_task > > CPU 1: > - process_one_work (takes the work queue mutex) > - e1000_reset_task (takes RTNL)
nit: as Jakub mentioned in another thread, it seems more about the flush_work waiting for the reset_task to complete rather than wq mutexes (which are fake)? CPU 0: - RTNL is held - e1000_close - e1000_down - cancel_work_sync - __flush_work - <wait here for the reset_task to finish> CPU 1: - process_one_work - e1000_reset_task (takes RTNL) - <but cpu 0 already holds rtnl> The fix looks good! Acked-by: Stanislav Fomichev <[email protected]>
