On 3/16/25 9:28 PM, Kuniyuki Iwashima wrote:
> SIOCBRDELIF is passed to dev_ioctl() first and later forwarded to
> br_ioctl_call(), which causes unnecessary RTNL dance and the splat
> below [0] under RTNL pressure.
>
> Let's say Thread A is trying to detach a device from a bridge and
> Thread B is trying to remove the bridge.
>
> In dev_ioctl(), Thread A bumps the bridge device's refcnt by
> netdev_hold() and releases RTNL because the following br_ioctl_call()
> also re-acquires RTNL.
>
> In the race window, Thread B could acquire RTNL and try to remove
> the bridge device. Then, rtnl_unlock() by Thread B will release RTNL
> and wait for netdev_put() by Thread A.
>
> Thread A, however, must hold RTNL after the unlock in dev_ifsioc(),
> which may take long under RTNL pressure, resulting in the splat by
> Thread B.
>
> Thread A (SIOCBRDELIF) Thread B (SIOCBRDELBR)
> ---------------------- ----------------------
> sock_ioctl sock_ioctl
> `- sock_do_ioctl `- br_ioctl_call
> `- dev_ioctl `- br_ioctl_stub
> |- rtnl_lock |
> |- dev_ifsioc '
> ' |- dev = __dev_get_by_name(...)
> |- netdev_hold(dev, ...) .
> / |- rtnl_unlock ------. |
> | |- br_ioctl_call `---> |- rtnl_lock
> Race | | `- br_ioctl_stub |- br_del_bridge
> Window | | | |- dev = __dev_get_by_name(...)
> | | | May take long | `- br_dev_delete(dev, ...)
> | | | under RTNL pressure | `-
> unregister_netdevice_queue(dev, ...)
> | | | | `- rtnl_unlock
> \ | |- rtnl_lock <-' `- netdev_run_todo
> | |- ... `- netdev_run_todo
> | `- rtnl_unlock |- __rtnl_unlock
> | |- netdev_wait_allrefs_any
> |- netdev_put(dev, ...) <----------------'
> Wait refcnt decrement
> and log splat below
>
> To avoid blocking SIOCBRDELBR unnecessarily, let's not call
> dev_ioctl() for SIOCBRADDIF and SIOCBRDELIF.
>
> In the dev_ioctl() path, we do the following:
>
> 1. Copy struct ifreq by get_user_ifreq in sock_do_ioctl()
> 2. Check CAP_NET_ADMIN in dev_ioctl()
> 3. Call dev_load() in dev_ioctl()
> 4. Fetch the master dev from ifr.ifr_name in dev_ifsioc()
>
> 3. can be done by request_module() in br_ioctl_call(), so we move
> 1., 2., and 4. to br_ioctl_stub().
>
> Note that 2. is also checked later in add_del_if(), but it's better
> performed before RTNL.
>
> SIOCBRADDIF and SIOCBRDELIF have been processed in dev_ioctl() since
> the pre-git era, and there seems to be no specific reason to process
> them there.
>
> [0]:
> unregister_netdevice: waiting for wpan3 to become free. Usage count = 2
> ref_tracker: wpan3@ffff8880662d8608 has 1/1 users at
> __netdev_tracker_alloc include/linux/netdevice.h:4282 [inline]
> netdev_hold include/linux/netdevice.h:4311 [inline]
> dev_ifsioc+0xc6a/0x1160 net/core/dev_ioctl.c:624
> dev_ioctl+0x255/0x10c0 net/core/dev_ioctl.c:826
> sock_do_ioctl+0x1ca/0x260 net/socket.c:1213
> sock_ioctl+0x23a/0x6c0 net/socket.c:1318
> vfs_ioctl fs/ioctl.c:51 [inline]
> __do_sys_ioctl fs/ioctl.c:906 [inline]
> __se_sys_ioctl fs/ioctl.c:892 [inline]
> __x64_sys_ioctl+0x1a4/0x210 fs/ioctl.c:892
> do_syscall_x64 arch/x86/entry/common.c:52 [inline]
> do_syscall_64+0xcb/0x250 arch/x86/entry/common.c:83
> entry_SYSCALL_64_after_hwframe+0x77/0x7f
>
> Fixes: 893b19587534 ("net: bridge: fix ioctl locking")
> Reported-by: syzkaller <[email protected]>
> Reported-by: yan kang <[email protected]>
> Reported-by: yue sun <[email protected]>
> Closes:
> https://lore.kernel.org/netdev/sy8p300mb0421225d54eb92762ae8f0f2a1...@sy8p300mb0421.ausp300.prod.outlook.com/
> Signed-off-by: Kuniyuki Iwashima <[email protected]>
> Acked-by: Stanislav Fomichev <[email protected]>
> ---
> v2:
> * Use if in br_ioctl_stub()
> * Update diagram in commit message
>
> v1: https://lore.kernel.org/netdev/[email protected]/
> ---
> include/linux/if_bridge.h | 6 ++----
> net/bridge/br_ioctl.c | 36 +++++++++++++++++++++++++++++++++---
> net/bridge/br_private.h | 3 +--
> net/core/dev_ioctl.c | 19 -------------------
> net/socket.c | 19 +++++++++----------
> 5 files changed, 45 insertions(+), 38 deletions(-)
>
Thanks,
Acked-by: Nikolay Aleksandrov <[email protected]>