On Wed, 2018-09-19 at 08:53 -0700, Ben Greear wrote:
> Hello,
>
> I see this lockdep splat on a modified 4.16.18+ kernel when the ath10k
> firmware crashes early.
>
> I am having a hard time figuring out how to go about fixing this, and
> would welcome some suggestions.
Not really sure how to fix it - it basically means that "ath10k_wq"
contains code that acquires the RTNL:
> -> #2 (rtnl_mutex){+.+.}:
> Sep 19 08:38:51 lf0313-6477 kernel: wiphy_register+0x1120/0x1f90 [cfg80211]
> Sep 19 08:38:51 lf0313-6477 kernel: ieee80211_register_hw+0x114e/0x2d20 [mac80211]
> Sep 19 08:38:51 lf0313-6477 kernel: ath10k_mac_register+0x1b2f/0x2ff0 [ath10k_core]
> Sep 19 08:38:51 lf0313-6477 kernel: ath10k_core_register_work+0x2365/0x30e0 [ath10k_core]
> Sep 19 08:38:51 lf0313-6477 kernel: process_one_work+0x5f7/0x14d0
> Sep 19 08:38:51 lf0313-6477 kernel: worker_thread+0xdc/0x12d0
> Sep 19 08:38:51 lf0313-6477 kernel: kthread+0x2cf/0x3c0
> Sep 19 08:38:51 lf0313-6477 kernel: ret_from_fork+0x24/0x30

but something on the workqueue is also flushed while holding rtnl.

The solution might be as simple as making it not be an ordered/single-
threaded workqueue, since a regular workqueue can spawn extra threads
if needed, but I don't know how it's used.

Then again, ath10k_stop() only calls cancel_work_sync() and
cancel_delayed_work_sync() ... which I think means you're running into
the lockdep annotation bug I fixed recently!
See upstream commits
87915adc3f0ac ("workqueue: re-add lockdep dependencies for flushing")
d6e89786bed97 ("workqueue: skip lockdep wq dependency in cancel_work_sync()")
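
To make the reasoning above concrete, here is a toy userspace model of
the dependency chain, in the spirit of what lockdep tracks. It is not
kernel code: the class names, the dep_*() helpers and WORK_OTHER are all
made up for illustration, and it assumes (per the second commit's title)
that cancel_work_sync() should only depend on the work item, not on the
whole workqueue.

```c
#include <stdbool.h>
#include <string.h>

/* Hypothetical lock/wait classes, loosely named after the splat. */
enum {
	WQ_ATH10K,	/* the workqueue as a whole */
	WORK_REGISTER,	/* a work item that takes the RTNL */
	WORK_OTHER,	/* some unrelated work item (assumption) */
	RTNL_MUTEX,
	NUM_CLASSES
};

/* dep[a][b]: b was acquired, or waited for, while a was "held". */
static bool dep[NUM_CLASSES][NUM_CLASSES];

static void dep_reset(void)
{
	memset(dep, 0, sizeof(dep));
}

static void dep_add(int held, int acquired)
{
	dep[held][acquired] = true;
}

/* Depth-first search: is 'target' reachable from 'from'? */
static bool dep_reaches(int from, int target, bool *seen)
{
	if (from == target)
		return true;
	if (seen[from])
		return false;
	seen[from] = true;

	for (int next = 0; next < NUM_CLASSES; next++)
		if (dep[from][next] && dep_reaches(next, target, seen))
			return true;
	return false;
}

/* Would recording held -> acquired close a cycle in the graph? */
static bool dep_would_deadlock(int held, int acquired)
{
	bool seen[NUM_CLASSES] = { false };

	return dep_reaches(acquired, held, seen);
}
```

With the edges WQ_ATH10K -> WORK_REGISTER -> RTNL_MUTEX recorded,
dep_would_deadlock(RTNL_MUTEX, WQ_ATH10K) is true, which models flushing
the whole workqueue under rtnl. dep_would_deadlock(RTNL_MUTEX, WORK_OTHER)
is false, which models cancelling one unrelated work item under rtnl.
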
johannes