On 9/4/25 20:07, Petr Machata wrote:
The bridge FDB contains one local entry per port per VLAN, for the MAC of
the port in question, and likewise for the bridge itself. This allows the
bridge to locally receive and punt "up" any packets whose destination MAC
address matches that of one of the bridge interfaces or of the bridge
itself.
The number of these local "service" FDB entries grows linearly with the
number of bridge-global VLAN memberships, but that in turn will tend to grow
quadratically with the number of ports and per-port VLAN memberships. While
that does not cause issues during forwarding lookups, it does make dumps
impractically slow.
As an example, with 100 interfaces, each on 4K VLANs, a full dump of an FDB
that contains just these 400K local entries takes 6.5s. That's _without_
considering iproute2 formatting overhead; this is just how long it takes to
walk the FDB (repeatedly), serialize it into netlink messages, and parse
the messages back in userspace.
This is to illustrate that with a growing number of ports and VLANs, the time
required to dump this repetitive information blows up. Arguably 4K VLANs
per interface is not a very realistic configuration, but then modern
switches can instead have several hundred interfaces, and we have fielded
requests for >1K VLAN memberships per port among customers.
[snip]
All this FDB duplication is there merely to make things snappy during
forwarding. But high-radix switches with thousands of VLANs typically do
not process much traffic in the SW datapath at all, but rather offload the
vast majority of it. So we could exchange some of the runtime performance for a
neater FDB.
To that end, in this patchset, introduce a new bridge option,
BR_BOOLOPT_FDB_LOCAL_VLAN_0, which, when enabled, causes local FDB entries to
be installed only on VLAN 0, instead of duplicating them across all VLANs.
Then, to maintain the local termination behavior, on an FDB miss the bridge
does a second lookup on VLAN 0.
Enabling this option changes the bridge behavior in expected ways. Since
the entries are only kept on VLAN 0, FDB get, flush and dump will not
perceive them on non-0 VLANs. And deleting the VLAN 0 entry affects
forwarding on all VLANs.
This patchset is loosely based on a privately circulated patch by Nikolay
Aleksandrov.
I knew this sounded familiar. I actually did try to upstream the original
patch[1] way back in 2015, and it was rejected; at the time, that led to the
vlan rhashtable code. :-)
By the way, the original idea and change predate me and were by Wilson Kok; I
just polished them and took over the patch while at Cumulus.
Now this is presented in a much shinier manner, as a new option with
selftests, which is great.
I think we can take the new option this time around; it will be very helpful
for some setups, as explained.
The code looks good to me, I appreciate how well split it is.
For the series:
Acked-by: Nikolay Aleksandrov <[email protected]>
Thanks,
Nik
[1] https://lore.kernel.org/netdev/[email protected]/
The patchset progresses as follows:
- Patch #1 introduces a bridge option to enable the above feature. Then,
  patches #2 to #5 gradually patch the bridge to do the right thing when
  the option is enabled. Finally, patch #6 adds the UAPI knob and the code
  for when the feature is enabled or disabled.
- Patches #7, #8 and #9 contain fixes and improvements to selftest
  libraries.
- Patch #10 contains a new selftest.
The corresponding iproute2 support is at:
https://github.com/pmachata/iproute2/commits/fdb_local_vlan_0/
Petr Machata (10):
net: bridge: Introduce BROPT_FDB_LOCAL_VLAN_0
net: bridge: BROPT_FDB_LOCAL_VLAN_0: Look up FDB on VLAN 0 on miss
net: bridge: BROPT_FDB_LOCAL_VLAN_0: On port changeaddr, skip per-VLAN FDBs
net: bridge: BROPT_FDB_LOCAL_VLAN_0: On bridge changeaddr, skip per-VLAN FDBs
net: bridge: BROPT_FDB_LOCAL_VLAN_0: Skip local FDBs on VLAN creation
net: bridge: Introduce UAPI for BR_BOOLOPT_FDB_LOCAL_VLAN_0
selftests: defer: Allow spaces in arguments of deferred commands
selftests: defer: Introduce DEFER_PAUSE_ON_FAIL
selftests: net: lib.sh: Don't defer failed commands
selftests: forwarding: Add test for BR_BOOLOPT_FDB_LOCAL_VLAN_0
include/uapi/linux/if_bridge.h | 3 +
net/bridge/br.c | 22 ++
net/bridge/br_fdb.c | 114 +++++-
net/bridge/br_input.c | 8 +
net/bridge/br_private.h | 3 +
net/bridge/br_vlan.c | 10 +-
.../testing/selftests/net/forwarding/Makefile | 1 +
.../net/forwarding/bridge_fdb_local_vlan_0.sh | 374 ++++++++++++++++++
tools/testing/selftests/net/lib.sh | 32 +-
tools/testing/selftests/net/lib/sh/defer.sh | 20 +-
10 files changed, 559 insertions(+), 28 deletions(-)
create mode 100755 tools/testing/selftests/net/forwarding/bridge_fdb_local_vlan_0.sh