On Thu, Apr 30, 2026 at 9:27 AM Ido Schimmel <[email protected]> wrote:
>
> Connecting two bridges on the same system [1] can result in a lockdep
> splat [2].
>
> The report is a false positive. Multicast queries are built and
> transmitted under the bridge multicast lock. When the outgoing port of
> one bridge is configured on top of another bridge, the transmit path
> re-enters bridge code and acquires the other bridge's multicast lock in
> order to snoop the query. Both lock instances share a single lockdep
> class, so lockdep flags the nested acquisition as an AA deadlock.
>
> Giving each bridge its own lock class will not solve the problem: the
> reverse topology would produce an ABBA splat with the same pair of
> classes. It also consumes a lockdep key per bridge.
>
> Instead, fix the problem by deferring the transmission of the queries to
> a workqueue. Build the skb and update querier state under the lock as
> before, then enqueue the skb on a per multicast context queue and
> schedule the work.
>
> Flush the work when the multicast context is de-initialized. At this
> stage the work cannot be requeued. There is no need to take a reference
> on skb->dev since the work cannot outlive the bridge or the bridge port.
>
> Use the high priority workqueue to reduce the delay between the enqueue
> time and the transmission time. With default settings (i.e., querier
> interval - 255 seconds, query interval - 125 seconds) the extra delay
> should not be a problem.
>
> Avoid the unlikely case of the queue growing endlessly by limiting it to
> 1,000 skbs. Use this number for the simple reason that this is the
> default Tx queue length.
>
> [1]
> ip link add name br1 up type bridge mcast_snooping 1 mcast_querier 1
> ip link add name br0 up type bridge mcast_snooping 1 mcast_querier 1
> ip link add link br0 name br0.10 up master br1 type vlan id 10
>
> [2]
> WARNING: possible recursive locking detected
> 7.0.0-virtme-gb50c64a58a90 #1 Not tainted
> [...]
> ip/339 is trying to acquire lock:
> ffff888104f0b480 (&br->multicast_lock){+.-.}-{3:3}, at:
> br_ip6_multicast_query (net/bridge/br_multicast.c:3584)
>
> but task is already holding lock:
> ffff888104f03480 (&br->multicast_lock){+.-.}-{3:3}, at:
> br_multicast_port_query_expired (net/bridge/br_multicast.c:1904)
>
> [...]
>
> Call Trace:
> [...]
> br_ip6_multicast_query (net/bridge/br_multicast.c:3584)
> br_multicast_ipv6_rcv (net/bridge/br_multicast.c:3988)
> br_dev_xmit (net/bridge/br_device.c:98 (discriminator 1))
> dev_hard_start_xmit (./include/linux/netdevice.h:5343
> ./include/linux/netdevice.h:5352 net/core/dev.c:3888 net/core/dev.c:3904)
> __dev_queue_xmit (./include/linux/netdevice.h:3619 net/core/dev.c:4871)
> vlan_dev_hard_start_xmit (net/8021q/vlan_dev.c:131 (discriminator 1))
> dev_hard_start_xmit (./include/linux/netdevice.h:5343
> ./include/linux/netdevice.h:5352 net/core/dev.c:3888 net/core/dev.c:3904)
> __dev_queue_xmit (./include/linux/netdevice.h:3619 net/core/dev.c:4871)
> br_dev_queue_push_xmit (net/bridge/br_forward.c:60)
> __br_multicast_send_query (net/bridge/br_multicast.c:1811 (discriminator 1))
> br_multicast_send_query (net/bridge/br_multicast.c:1889)
> br_multicast_port_query_expired (./include/linux/spinlock.h:390
> net/bridge/br_multicast.c:1914)
> call_timer_fn (./arch/x86/include/asm/jump_label.h:37
> ./include/trace/events/timer.h:127 kernel/time/timer.c:1749)
> [...]
>
> Fixes: eb1d16414339 ("bridge: Add core IGMP snooping support")
> Reported-by: [email protected]
> Closes: https://lore.kernel.org/netdev/[email protected]/
> Acked-by: Nikolay Aleksandrov <[email protected]>
> Reviewed-by: Petr Machata <[email protected]>
> Signed-off-by: Ido Schimmel <[email protected]>
> ---
> v2:
> - Limit the queue to 1,000 skbs.
> - Edit the trace to avoid checkpatch errors.
> v1: https://lore.kernel.org/netdev/[email protected]/
> ---
>  net/bridge/br_multicast.c | 47 +++++++++++++++++++++++++++++++++++----
>  net/bridge/br_private.h   |  4 ++++
>  2 files changed, 47 insertions(+), 4 deletions(-)
>
> diff --git a/net/bridge/br_multicast.c b/net/bridge/br_multicast.c
> index 881d866d687a..e9f5fe01ff95 100644
> --- a/net/bridge/br_multicast.c
> +++ b/net/bridge/br_multicast.c
> @@ -1776,6 +1776,30 @@ static void br_multicast_select_own_querier(struct net_bridge_mcast *brmctx,
>  #endif
>  }
>
> +static void br_multicast_port_query_queue_work(struct work_struct *work)
> +{
> +	struct net_bridge_mcast_port *pmctx;
> +	struct sk_buff *skb;
> +
> +	pmctx = container_of(work, struct net_bridge_mcast_port,
> +			     query_queue_work);
> +	while ((skb = skb_dequeue(&pmctx->query_queue)))
> +		NF_HOOK(NFPROTO_BRIDGE, NF_BR_LOCAL_OUT, dev_net(skb->dev),
> +			NULL, skb, NULL, skb->dev, br_dev_queue_push_xmit);
> +}
> +
> +static void br_multicast_query_queue_work(struct work_struct *work)
> +{
> +	struct net_bridge_mcast *brmctx;
> +	struct sk_buff *skb;
> +
> +	brmctx = container_of(work, struct net_bridge_mcast,
> +			      query_queue_work);
> +	while ((skb = skb_dequeue(&brmctx->query_queue)))
> +		netif_rx(skb);
> +}
These two functions could loop forever under flood, since new skbs can be enqueued while the work is still draining the queue. Perhaps use skb_queue_splice_init() to cap each round at the ~1,000-skb limit you mentioned in the changelog, and return once the spliced list has been processed. Bonus: no more spinlock acquisition for each skb_dequeue().
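Roughly, for the port variant (a sketch only, not tested; I'm assuming the queue's built-in lock is what protects enqueue in your patch, and the exact locking may differ in the final version):

```c
static void br_multicast_port_query_queue_work(struct work_struct *work)
{
	struct net_bridge_mcast_port *pmctx;
	struct sk_buff_head tmp;
	struct sk_buff *skb;
	unsigned long flags;

	pmctx = container_of(work, struct net_bridge_mcast_port,
			     query_queue_work);

	__skb_queue_head_init(&tmp);

	/* Move everything pending onto a local list in one locked
	 * operation instead of taking the queue spinlock once per skb
	 * in skb_dequeue().
	 */
	spin_lock_irqsave(&pmctx->query_queue.lock, flags);
	skb_queue_splice_init(&pmctx->query_queue, &tmp);
	spin_unlock_irqrestore(&pmctx->query_queue.lock, flags);

	/* Only process what was pending when the work ran; the loop is
	 * now bounded by the enqueue-side limit. skbs queued after the
	 * splice wait for the next time the work is scheduled.
	 */
	while ((skb = __skb_dequeue(&tmp)))
		NF_HOOK(NFPROTO_BRIDGE, NF_BR_LOCAL_OUT, dev_net(skb->dev),
			NULL, skb, NULL, skb->dev, br_dev_queue_push_xmit);
}
```

The brmctx variant would follow the same pattern with netif_rx().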
