On 4/26/26 3:34 PM, Ido Schimmel wrote:
> Connecting two bridges on the same system [1] can result in a lockdep
> splat [2].
> 
> The report is a false positive. Multicast queries are built and
> transmitted under the bridge multicast lock. When the outgoing port of
> one bridge is configured on top of another bridge, the transmit path
> re-enters bridge code and acquires the other bridge's multicast lock in
> order to snoop the query. Both lock instances share a single lockdep
> class, so lockdep flags the nested acquisition as an AA deadlock.
> 
> Giving each bridge its own lock class will not solve the problem: the
> reverse topology would produce an ABBA splat with the same pair of
> classes. It also consumes a lockdep key per bridge.
> 
> Instead, fix the problem by deferring the transmission of the queries to
> a workqueue. Build the skb and update querier state under the lock as
> before, then enqueue the skb on a per multicast context queue and
> schedule the work.

I must admit that adding workqueue deferral just to fix a false positive
feels like overkill to me, even though I can't think of a better solution
off the top of my head.
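
To make the trade-off concrete, here is a toy userspace model of the
situation, not kernel code. It assumes a lockdep-like checker that tracks
lock classes rather than instances, and all toy_* names are hypothetical.
It only illustrates why transmitting under the lock trips the checker while
the deferred variant does not.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define MAX_HELD 8

/* Toy lock: like lockdep, the checker only looks at the class. */
struct toy_lock { int class_id; };

static int held_classes[MAX_HELD];
static size_t held_count;
static unsigned int recursion_reports;

static void toy_lock_acquire(struct toy_lock *l)
{
	assert(held_count < MAX_HELD);
	for (size_t i = 0; i < held_count; i++)
		if (held_classes[i] == l->class_id)
			recursion_reports++;	/* "possible recursive locking" */
	held_classes[held_count++] = l->class_id;
}

static void toy_lock_release(struct toy_lock *l)
{
	assert(held_count > 0 && held_classes[held_count - 1] == l->class_id);
	held_count--;
}

struct toy_bridge {
	struct toy_lock mcast_lock;
	struct toy_bridge *lower;	/* bridge reached by the transmit path, or NULL */
	bool query_pending;		/* stands in for the per-context skb queue */
};

/* Receive side: snooping the query takes the receiving bridge's lock. */
static void toy_snoop(struct toy_bridge *br)
{
	toy_lock_acquire(&br->mcast_lock);
	toy_lock_release(&br->mcast_lock);
}

static void toy_xmit(struct toy_bridge *br)
{
	if (br->lower)
		toy_snoop(br->lower);
}

/* Old behaviour: transmit while still holding the lock. */
static void toy_send_query_direct(struct toy_bridge *br)
{
	toy_lock_acquire(&br->mcast_lock);
	toy_xmit(br);
	toy_lock_release(&br->mcast_lock);
}

/* Deferred variant: only mark the query as queued under the lock. */
static void toy_send_query_deferred(struct toy_bridge *br)
{
	toy_lock_acquire(&br->mcast_lock);
	br->query_pending = true;
	toy_lock_release(&br->mcast_lock);
}

/* Work item: runs later, with no multicast lock held. */
static void toy_query_work(struct toy_bridge *br)
{
	if (br->query_pending) {
		br->query_pending = false;
		toy_xmit(br);
	}
}
```

With two bridges sharing one class and br1 transmitting into br0, the
direct path produces one report; the deferred path followed by the work
item produces none, at the cost of the extra queue and work item.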

> Flush the work when the multicast context is de-initialized. At this
> stage the work cannot be requeued. There is no need to take a reference
> on skb->dev since the work cannot outlive the bridge or the bridge port.
> 
> Use the high priority workqueue to reduce the delay between the enqueue
> time and the transmission time. With default settings (i.e., querier
> interval - 255 seconds, query interval - 125 seconds) the extra delay
> should not be a problem.
> 
> [1]
> ip link add name br1 up type bridge mcast_snooping 1 mcast_querier 1
> ip link add name br0 up type bridge mcast_snooping 1 mcast_querier 1
> ip link add link br0 name br0.10 up master br1 type vlan id 10
> 
> [2]
> ============================================
> WARNING: possible recursive locking detected
> 7.0.0-virtme-gb50c64a58a90 #1 Not tainted
> --------------------------------------------

checkpatch reports that the above separator line may break tools. Consider
simply removing it from the commit message.

> ip/339 is trying to acquire lock:
> ffff888104f0b480 (&br->multicast_lock){+.-.}-{3:3}, at: br_ip6_multicast_query (net/bridge/br_multicast.c:3584)
> 
> but task is already holding lock:
> ffff888104f03480 (&br->multicast_lock){+.-.}-{3:3}, at: br_multicast_port_query_expired (net/bridge/br_multicast.c:1904)
> 
> [...]
> 
> Call Trace:
> [...]
> br_ip6_multicast_query (net/bridge/br_multicast.c:3584)
> br_multicast_ipv6_rcv (net/bridge/br_multicast.c:3988)
> br_dev_xmit (net/bridge/br_device.c:98 (discriminator 1))
> dev_hard_start_xmit (./include/linux/netdevice.h:5343 ./include/linux/netdevice.h:5352 net/core/dev.c:3888 net/core/dev.c:3904)
> __dev_queue_xmit (./include/linux/netdevice.h:3619 net/core/dev.c:4871)
> vlan_dev_hard_start_xmit (net/8021q/vlan_dev.c:131 (discriminator 1))
> dev_hard_start_xmit (./include/linux/netdevice.h:5343 ./include/linux/netdevice.h:5352 net/core/dev.c:3888 net/core/dev.c:3904)
> __dev_queue_xmit (./include/linux/netdevice.h:3619 net/core/dev.c:4871)
> br_dev_queue_push_xmit (net/bridge/br_forward.c:60)
> __br_multicast_send_query (net/bridge/br_multicast.c:1811 (discriminator 1))
> br_multicast_send_query (net/bridge/br_multicast.c:1889)
> br_multicast_port_query_expired (./include/linux/spinlock.h:390 net/bridge/br_multicast.c:1914)
> call_timer_fn (./arch/x86/include/asm/jump_label.h:37 ./include/trace/events/timer.h:127 kernel/time/timer.c:1749)
> [...]
> 
> Fixes: eb1d16414339 ("bridge: Add core IGMP snooping support")
> Reported-by: [email protected]
> Closes: https://lore.kernel.org/netdev/[email protected]/
> Acked-by: Nikolay Aleksandrov <[email protected]>
> Signed-off-by: Ido Schimmel <[email protected]>
> ---
>  net/bridge/br_multicast.c | 39 +++++++++++++++++++++++++++++++++++----
>  net/bridge/br_private.h   |  4 ++++
>  2 files changed, 39 insertions(+), 4 deletions(-)
> 
> diff --git a/net/bridge/br_multicast.c b/net/bridge/br_multicast.c
> index 881d866d687a..252c46977ed5 100644
> --- a/net/bridge/br_multicast.c
> +++ b/net/bridge/br_multicast.c
> @@ -1776,6 +1776,28 @@ static void br_multicast_select_own_querier(struct net_bridge_mcast *brmctx,
>  #endif
>  }
>  
> +static void br_multicast_port_query_queue_work(struct work_struct *work)
> +{
> +     struct net_bridge_mcast_port *pmctx;
> +     struct sk_buff *skb;
> +
> +     pmctx = container_of(work, struct net_bridge_mcast_port,
> +                          query_queue_work);
> +     while ((skb = skb_dequeue(&pmctx->query_queue)))
> +             NF_HOOK(NFPROTO_BRIDGE, NF_BR_LOCAL_OUT, dev_net(skb->dev),
> +                     NULL, skb, NULL, skb->dev, br_dev_queue_push_xmit);
> +}
> +
> +static void br_multicast_query_queue_work(struct work_struct *work)
> +{
> +     struct net_bridge_mcast *brmctx;
> +     struct sk_buff *skb;
> +
> +     brmctx = container_of(work, struct net_bridge_mcast, query_queue_work);
> +     while ((skb = skb_dequeue(&brmctx->query_queue)))
> +             netif_rx(skb);
> +}
> +
>  static void __br_multicast_send_query(struct net_bridge_mcast *brmctx,
>                                     struct net_bridge_mcast_port *pmctx,
>                                     struct net_bridge_port_group *pg,
> @@ -1804,9 +1826,8 @@ static void __br_multicast_send_query(struct net_bridge_mcast *brmctx,
>               skb->dev = pmctx->port->dev;
>               br_multicast_count(brmctx->br, pmctx->port, skb, igmp_type,
>                                  BR_MCAST_DIR_TX);
> -             NF_HOOK(NFPROTO_BRIDGE, NF_BR_LOCAL_OUT,
> -                     dev_net(pmctx->port->dev), NULL, skb, NULL, skb->dev,
> -                     br_dev_queue_push_xmit);
> +             skb_queue_tail(&pmctx->query_queue, skb);
> +             queue_work(system_highpri_wq, &pmctx->query_queue_work);

Also, the AI-reported concern about the unbounded queue length looks
relevant. Usually the RX path is slower than TX, but asymmetric filtering
rules, for example, could reverse that.
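
One possible way to address the queue length, sketched as a userspace model
rather than kernel code: cap the queue and drop on overflow. The limit
QUERY_QUEUE_MAX and the query_queue type are hypothetical; the patch
defines neither.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical cap; the patch under review defines no such limit. */
#define QUERY_QUEUE_MAX 4

/* Userspace stand-in for a sk_buff_head: fixed-size ring of opaque pointers. */
struct query_queue {
	void *slots[QUERY_QUEUE_MAX];
	size_t head;	/* index of the oldest entry */
	size_t len;	/* number of queued entries */
	size_t dropped;	/* entries refused because the queue was full */
};

/* Enqueue at the tail; refuse (and count a drop) once the cap is reached. */
static bool query_queue_tail(struct query_queue *q, void *pkt)
{
	assert(q != NULL);
	if (q->len >= QUERY_QUEUE_MAX) {
		q->dropped++;
		return false;
	}
	q->slots[(q->head + q->len) % QUERY_QUEUE_MAX] = pkt;
	q->len++;
	return true;
}

/* Dequeue from the head; returns NULL when the queue is empty. */
static void *query_queue_dequeue(struct query_queue *q)
{
	void *pkt;

	assert(q != NULL);
	if (q->len == 0)
		return NULL;
	pkt = q->slots[q->head];
	q->head = (q->head + 1) % QUERY_QUEUE_MAX;
	q->len--;
	return pkt;
}
```

In the patch itself the analogous step would presumably be a
skb_queue_len() check before skb_queue_tail(), with kfree_skb() on
overflow; whether dropping a query is acceptable there is a separate
question.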

/P