On Thu, Apr 30, 2026 at 9:27 AM Ido Schimmel <[email protected]> wrote:
>
> Connecting two bridges on the same system [1] can result in a lockdep
> splat [2].
>
> The report is a false positive. Multicast queries are built and
> transmitted under the bridge multicast lock. When the outgoing port of
> one bridge is configured on top of another bridge, the transmit path
> re-enters bridge code and acquires the other bridge's multicast lock in
> order to snoop the query. Both lock instances share a single lockdep
> class, so lockdep flags the nested acquisition as an AA deadlock.
>
> Giving each bridge its own lock class will not solve the problem: the
> reverse topology would produce an ABBA splat with the same pair of
> classes. It also consumes a lockdep key per bridge.
>
> Instead, fix the problem by deferring the transmission of the queries to
> a workqueue. Build the skb and update querier state under the lock as
> before, then enqueue the skb on a per multicast context queue and
> schedule the work.
>
> Flush the work when the multicast context is de-initialized. At this
> stage the work cannot be requeued. There is no need to take a reference
> on skb->dev since the work cannot outlive the bridge or the bridge port.
>
> Use the high priority workqueue to reduce the delay between the enqueue
> time and the transmission time. With default settings (i.e., querier
> interval - 255 seconds, query interval - 125 seconds) the extra delay
> should not be a problem.
>
> Avoid the unlikely case of the queue growing endlessly by limiting it to
> 1,000 skbs. Use this number for the simple reason that this is the
> default Tx queue length.
>
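For reference, the bounded enqueue described above can be modeled in plain
userspace C. The cap and every name below are illustrative stand-ins, not
the patch's actual code; the kernel side would use skb_queue_len() and
skb_queue_tail() on a struct sk_buff_head under the multicast lock, and
queue_work() on system_highpri_wq:

```c
#include <assert.h>

/*
 * Userspace model of the bounded deferral: the query is built under the
 * lock as before, but instead of being transmitted it is appended to a
 * per-context queue capped at 1,000 entries, and the work is scheduled.
 * All names here are illustrative stand-ins for the kernel primitives
 * named in the lead-in above.
 */

#define QUERY_QUEUE_MAX 1000	/* matches the default Tx queue length */

struct query_queue {
	unsigned int len;	/* stand-in for skb_queue_len() */
	int work_pending;	/* stand-in for the scheduled work item */
};

/* Returns 0 if the query was queued, -1 if dropped because the cap was hit. */
static int query_enqueue(struct query_queue *q)
{
	if (q->len >= QUERY_QUEUE_MAX)
		return -1;	/* kernel: kfree_skb(skb) and give up */

	q->len++;		/* kernel: skb_queue_tail(&q->queue, skb) */
	q->work_pending = 1;	/* kernel: queue_work(system_highpri_wq, ...) */

	return 0;
}
```

Dropping at the cap rather than blocking keeps the producer (a timer
callback holding the multicast lock) from ever waiting on the consumer.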
> [1]
> ip link add name br1 up type bridge mcast_snooping 1 mcast_querier 1
> ip link add name br0 up type bridge mcast_snooping 1 mcast_querier 1
> ip link add link br0 name br0.10 up master br1 type vlan id 10
>
> [2]
> WARNING: possible recursive locking detected
> 7.0.0-virtme-gb50c64a58a90 #1 Not tainted
> [...]
> ip/339 is trying to acquire lock:
> ffff888104f0b480 (&br->multicast_lock){+.-.}-{3:3}, at: br_ip6_multicast_query (net/bridge/br_multicast.c:3584)
>
> but task is already holding lock:
> ffff888104f03480 (&br->multicast_lock){+.-.}-{3:3}, at: br_multicast_port_query_expired (net/bridge/br_multicast.c:1904)
>
> [...]
>
> Call Trace:
> [...]
> br_ip6_multicast_query (net/bridge/br_multicast.c:3584)
> br_multicast_ipv6_rcv (net/bridge/br_multicast.c:3988)
> br_dev_xmit (net/bridge/br_device.c:98 (discriminator 1))
> dev_hard_start_xmit (./include/linux/netdevice.h:5343 ./include/linux/netdevice.h:5352 net/core/dev.c:3888 net/core/dev.c:3904)
> __dev_queue_xmit (./include/linux/netdevice.h:3619 net/core/dev.c:4871)
> vlan_dev_hard_start_xmit (net/8021q/vlan_dev.c:131 (discriminator 1))
> dev_hard_start_xmit (./include/linux/netdevice.h:5343 ./include/linux/netdevice.h:5352 net/core/dev.c:3888 net/core/dev.c:3904)
> __dev_queue_xmit (./include/linux/netdevice.h:3619 net/core/dev.c:4871)
> br_dev_queue_push_xmit (net/bridge/br_forward.c:60)
> __br_multicast_send_query (net/bridge/br_multicast.c:1811 (discriminator 1))
> br_multicast_send_query (net/bridge/br_multicast.c:1889)
> br_multicast_port_query_expired (./include/linux/spinlock.h:390 net/bridge/br_multicast.c:1914)
> call_timer_fn (./arch/x86/include/asm/jump_label.h:37 ./include/trace/events/timer.h:127 kernel/time/timer.c:1749)
> [...]
>
> Fixes: eb1d16414339 ("bridge: Add core IGMP snooping support")
> Reported-by: [email protected]
> Closes: https://lore.kernel.org/netdev/[email protected]/
> Acked-by: Nikolay Aleksandrov <[email protected]>
> Reviewed-by: Petr Machata <[email protected]>
> Signed-off-by: Ido Schimmel <[email protected]>
> ---
> v2:
>   - Limit the queue to 1,000 skbs.
>   - Edit the trace to avoid checkpatch errors.
> v1: https://lore.kernel.org/netdev/[email protected]/
> ---
>  net/bridge/br_multicast.c | 47 +++++++++++++++++++++++++++++++++++----
>  net/bridge/br_private.h   |  4 ++++
>  2 files changed, 47 insertions(+), 4 deletions(-)
>
> diff --git a/net/bridge/br_multicast.c b/net/bridge/br_multicast.c
> index 881d866d687a..e9f5fe01ff95 100644
> --- a/net/bridge/br_multicast.c
> +++ b/net/bridge/br_multicast.c
> @@ -1776,6 +1776,30 @@ static void br_multicast_select_own_querier(struct 
> net_bridge_mcast *brmctx,
>  #endif
>  }
>
> +static void br_multicast_port_query_queue_work(struct work_struct *work)
> +{
> +       struct net_bridge_mcast_port *pmctx;
> +       struct sk_buff *skb;
> +
> +       pmctx = container_of(work, struct net_bridge_mcast_port,
> +                            query_queue_work);
> +       while ((skb = skb_dequeue(&pmctx->query_queue)))
> +               NF_HOOK(NFPROTO_BRIDGE, NF_BR_LOCAL_OUT, dev_net(skb->dev),
> +                       NULL, skb, NULL, skb->dev, br_dev_queue_push_xmit);
> +}
> +
> +static void br_multicast_query_queue_work(struct work_struct *work)
> +{
> +       struct net_bridge_mcast *brmctx;
> +       struct sk_buff *skb;
> +
> +       brmctx = container_of(work, struct net_bridge_mcast,
> +                            query_queue_work);
> +       while ((skb = skb_dequeue(&brmctx->query_queue)))
> +               netif_rx(skb);
> +}

These two functions could loop forever under flood: skbs enqueued while
the work is running keep the dequeue loop spinning.

Perhaps use skb_queue_splice_init() to cap each round at the ~1,000-skb
limit you mentioned in the changelog, and return once the spliced list
has been processed.

Bonus: no more spinlock acquisition for each skb_dequeue().
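A quick userspace sketch of that spliced-drain shape (all names are
illustrative; the kernel version would splice the pending queue onto an
on-stack struct sk_buff_head with skb_queue_splice_init() and drain only
that snapshot):

```c
#include <assert.h>
#include <stddef.h>

/*
 * Userspace model of the suggested bounded drain: one work invocation
 * splices the whole pending queue onto an on-stack list and processes
 * only that snapshot, then returns. Packets queued during the drain
 * wait for the next invocation, so a flood cannot pin the worker in
 * the loop forever.
 */

struct pkt {
	struct pkt *next;
};

struct pkt_queue {
	struct pkt *head;
	struct pkt *tail;
	unsigned int len;
};

static void queue_init(struct pkt_queue *q)
{
	q->head = NULL;
	q->tail = NULL;
	q->len = 0;
}

static void queue_tail(struct pkt_queue *q, struct pkt *p)
{
	p->next = NULL;
	if (q->tail)
		q->tail->next = p;
	else
		q->head = p;
	q->tail = p;
	q->len++;
}

static struct pkt *queue_dequeue(struct pkt_queue *q)
{
	struct pkt *p = q->head;

	if (!p)
		return NULL;
	q->head = p->next;
	if (!q->head)
		q->tail = NULL;
	q->len--;
	return p;
}

/* Model of skb_queue_splice_init(): move all of @from onto @to. */
static void queue_splice_init(struct pkt_queue *from, struct pkt_queue *to)
{
	*to = *from;
	queue_init(from);
}

/* One work round: transmit only the packets present at entry. */
static unsigned int query_queue_work(struct pkt_queue *q)
{
	struct pkt_queue batch;
	struct pkt *p;
	unsigned int sent = 0;

	queue_splice_init(q, &batch);	/* one lock round-trip in the kernel */
	while ((p = queue_dequeue(&batch)))
		sent++;			/* kernel: netif_rx(skb) / NF_HOOK() */

	return sent;
}
```

Since the splice happens once per round, the producer's spinlock is taken
once per invocation instead of once per skb.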
