Hoi,
Thanks for your time Ondrej, and apologies Maria for mistyping your
name; Mrs IPng Networks is called Marina, so that name sometimes just
rolls off the keyboard :)
On 19.02.2026 18:04, Ondrej Zajicek wrote:
On Wed, Feb 18, 2026 at 09:59:05PM +0100, Pim van Pelt wrote:
Hoi,
Thanks for taking a look, Marina and Ondrej, I appreciate it!
On 18.02.2026 17:50, Ondrej Zajicek wrote:
As others noted, the relevant branch is 'oz-evpn'; the older 'evpn'
branch fell victim to my needlessly strict adherence to the "do not
rebase public branches" rule. The patches in 'oz-evpn' are not only
rebased on a newer BIRD version, but also have fixes squashed into them,
and there is newer development. I just pushed a rebase to 2.18 there.
Please look at this branch first. Also note there are some minor changes
to the EVPN protocol configuration syntax.
I have ported my vppevpn protocol implementation to be based on oz-evpn,
and the system is functional here as well. Yaay!
I only had one small issue. In oz-evpn, the 'evpn' protocol stays in
'startup' until the vxlan0 interface becomes ready. However, in my use
case VXLAN is not performed by the kernel but by VPP, so there is no
'vxlan0' interface. I need only 'vni' and 'router address' (and the
remote VTEP) to construct the dataplane configuration. To allow the evpn
protocol to transition to PS_UP, I decided to fire an event that
announces the IMET if router_addr and VNI are set, and skips waiting for
the interface.
Hmm, you have NULL interface in the encap->tunnel_dev? Or some fake interface
created by if_get_by_name()? Or some dummy/irrelevant interface (loopback)?
I do specify an 'encapsulation vxlan { tunnel device "vxlan0"; };'. It
satisfies Bird2 by having an interface name; the interface just doesn't
exist in the kernel. In branch 'evpn' this was fine; in branch 'oz-evpn'
it needs me to cheat a bit, because we wait for the device to be oper-up
and enslaved to the bridge. If I skip that part, everything works fine
without any kernel interaction. See [1] below for my cheat.
The interface is here not just to get/check router_addr and VNI, but
primarily to construct next hops for routes in the bridge table:
evpn_receive_mac() / evpn_receive_imet():
.nh.iface = encap->tunnel_dev,
These are necessary not just for the kernel dataplane (to specify the
tunnel-implementing iface), but also formally just to have a non-NULL
nh.iface, which we generally assume in BIRD for RTD_UNICAST next hops.
So how do these routes look in your setup?
Once I convince bird not to wait for encap->tunnel_dev to be oper-up
and to have a bridge master, the 'evpn' protocol starts, and the next
hop looks quite normal.
From 'evpntab':
evpn imet 8298:200 0 192.168.10.2 [vpp0_2 2026-02-19 from 2001:678:d78:200::2] * (100) [i]
Type: BGP univ
BGP.origin: IGP
BGP.as_path:
BGP.next_hop: 192.168.10.2
BGP.local_pref: 100
BGP.ext_community: (rt, 8298, 20040) (generic, 0x30c0000, 0x8)
BGP.pmsi_tunnel: ingress-replication 192.168.10.2 mpls 20040
evpn mac 8298:200 0 fe:54:00:f0:11:23 * unicast [vpp0_2 2026-02-19 from 2001:678:d78:200::2] * (100/5) [i]
via 192.168.10.10 on e0 mpls 20040
Type: BGP univ
BGP.origin: IGP
BGP.as_path:
BGP.next_hop: 192.168.10.2
BGP.local_pref: 100
BGP.ext_community: (rt, 8298, 20040) (generic, 0x30c0000, 0x8)
BGP.mpls_label_stack: 20040
Equivalent routes from 'etab':
00:00:00:00:00:00 vlan 200 mpls 20040 unicast [evpn2 2026-02-19] * (80)
via 192.168.10.2 on vxlan0 mpls 20040
Type: EVPN univ
mpls_label: 20040
fe:54:00:f0:11:23 vlan 200 mpls 20040 unicast [evpn2 2026-02-19] * (80)
via 192.168.10.2 on vxlan0 mpls 20040
Type: EVPN univ
mpls_label: 20040
Note that the next hops of VXLAN-tunneled routes in the bridge table are
just makeshift for now, esp. the usage of nh.gw for encap-dst-ip and
nh->label[0] for encap-vni; these should get their own attributes (once
we redesign next hops to have proper attributes).
The information I need for my use case is the next hop '192.168.10.2'
and mpls_label '20040' from etab, plus the IMETs from evpntab (because
with P2MP there will be multiple IMETs and etab only carries one of
them). I've also implemented 'vid' (200 above), but it carries no
meaning for VPP because the bridge-domain can be configured separately
to allow untagged, single-tagged or double-tagged frames on the PE
interfaces. If new attributes (like the vxlan nexthop or vxlan vni you
suggest below) were to appear, it would be easy for me to switch to
using them instead.
I am often uncertain how much the BIRD representation of routes should
match the Linux API representation (esp. for idiosyncratic details like
here, where the Linux API assumes nominal tunnel interfaces as next-hop
interfaces for lightweight tunnels), but I usually try to keep it
consistent to limit the impedance mismatch. That may cause problems when
other backends with different conventions are used, as in your case.
I think assuming by default a Linux 'bridge' with its tunneling
functionality is perfectly fine, although I'd prefer that it not become
the /only/ valid way:
1) I'm not sure that works well on other platforms (e.g. FreeBSD,
Windows, MacOS),
2) or on embedded platforms (e.g. Broadcom or Marvell chips),
3) or on VPP :-)
Requiring a Linux bridge and a kernel interface prohibits non-Linux EVPN
scenarios. May I suggest that these things are kept optional even if
they are the default? They could be turned off, for example by
configuring a dummy interface dummy0, setting a config toggle 'nowait'
to skip waiting for it to be oper-up/enslaved, and not requiring the
'bridge' protocol.
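A hypothetical sketch of what that could look like (the 'nowait' toggle
does not exist in any branch; the syntax is purely illustrative):

```
protocol evpn {
    eth { table etab; };
    evpn { import all; export all; };
    rd 8298:200;
    import target (rt, 8298, 20040);
    export target (rt, 8298, 20040);
    encapsulation vxlan {
        # hypothetical: skip the oper-up/enslaved checks on the device
        tunnel device "dummy0" nowait;
        router address 192.168.10.0;
    };
    vni 20040;
    vid 200;
};
```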
Btw, I planned to explicitly configure the bridge device for the EVPN
protocol (it is now configured implicitly through tunnel_dev->master).
The idea is that as a VRF device (in Linux) defines an L3 VRF, a bridge
device defines a MAC-VRF. And as L3 protocols are associated with a
specific L3 VRF, L2 protocols should be associated with a specific
MAC-VRF.
It would be good if the 'evpn' protocol can continue to be used
standalone, in particular without being conflated with 'bridge'. In my
view, one should be able to inspect evpntab and etab to construct other
integrations without needing to consult kernel devices. At the moment
'evpn' (entirely) and 'oz-evpn' (somewhat less) are elegant precisely
because they do the complete signalling and capture evpntab and etab
using exclusively one 'evpn' and one 'bgp' protocol together with the
'evpn table' and 'eth table'. This allows me to create a custom
'vppevpn' protocol that subscribes to those tables. See the attached
config file (bird-example.conf) for an idea of where I'm headed.
Do you have a (kernel-level) bridge device in your setup? (I do not mean
using the BIRD bridge protocol.)
VPP does not use any kernel bridge or vxlan device; it operates entirely
as a userspace dataplane. In my case, Bird directly programs the VPP
dataplane. The main flow of a four-router EVPN mesh looks like this;
imagine each of these log lines being the result of an API call to VPP
directly over a unix domain socket:
Feb 19 22:25:50 vpp0-3 bird[1214613]: Enabling protocol bd200
Feb 19 22:25:50 vpp0-3 bird[1214613]: bd200: created bridge-domain
bd=200 with tag='bird_bd200'
Feb 19 22:25:50 vpp0-3 bird[1214613]: bd200: created vxlan-tunnel
sw_if_index=12 src=[192.168.10.3] dst=[192.168.10.0] vni=20040
Feb 19 22:25:50 vpp0-3 bird[1214613]: bd200: added sw_if_index=12 to
bd=200 shg=1
Feb 19 22:25:50 vpp0-3 bird[1214613]: bd200: added remote
mac=52:54:00:f0:10:10 vid=200 to bd=200 via vtep=[192.168.10.0]
vni=20040 sw_if_index=12
Feb 19 22:25:50 vpp0-3 bird[1214613]: bd200: created vxlan-tunnel
sw_if_index=16 src=[192.168.10.3] dst=[192.168.10.1] vni=20040
Feb 19 22:25:50 vpp0-3 bird[1214613]: bd200: added sw_if_index=16 to
bd=200 shg=1
Feb 19 22:25:50 vpp0-3 bird[1214613]: bd200: added remote
mac=52:54:00:f0:10:11 vid=200 to bd=200 via vtep=[192.168.10.1]
vni=20040 sw_if_index=16
Feb 19 22:25:50 vpp0-3 bird[1214613]: bd200: created vxlan-tunnel
sw_if_index=11 src=[192.168.10.3] dst=[192.168.10.2] vni=20040
Feb 19 22:25:50 vpp0-3 bird[1214613]: bd200: added sw_if_index=11 to
bd=200 shg=1
Feb 19 22:25:50 vpp0-3 bird[1214613]: bd200: added remote
mac=52:54:00:f0:10:12 vid=200 to bd=200 via vtep=[192.168.10.2]
vni=20040 sw_if_index=11
Feb 19 22:25:50 vpp0-3 bird[1214613]: bd200: added remote
mac=fe:54:00:f0:11:23 vid=200 to bd=200 via vtep=[192.168.10.2]
vni=20040 sw_if_index=11
Feb 19 22:25:50 vpp0-3 bird[1214613]: bd200: added remote
mac=fe:54:00:f0:11:03 vid=200 to bd=200 via vtep=[192.168.10.0]
vni=20040 sw_if_index=12
Feb 19 22:25:50 vpp0-3 bird[1214613]: bd200: added remote
mac=00:00:02:00:00:01 vid=200 to bd=200 via vtep=[192.168.10.0]
vni=20040 sw_if_index=12
Feb 19 22:25:50 vpp0-3 bird[1214613]: bd200: added remote
mac=fe:54:00:f0:11:13 vid=200 to bd=200 via vtep=[192.168.10.1]
vni=20040 sw_if_index=16
Feb 19 22:25:50 vpp0-3 bird[1214613]: bd200: added imet bd=200 vid=200
vtep=[192.168.10.2] vni=20040
Feb 19 22:25:50 vpp0-3 bird[1214613]: bd200: added imet bd=200 vid=200
vtep=[192.168.10.1] vni=20040
Feb 19 22:25:50 vpp0-3 bird[1214613]: bd200: added imet bd=200 vid=200
vtep=[192.168.10.0] vni=20040
Feb 19 22:25:51 vpp0-3 bird[1214613]: bd200: learned
mac=52:54:00:f0:10:13 vid=200 on bd=200
Feb 19 22:25:51 vpp0-3 bird[1214613]: bd200: learned
mac=fe:54:00:f0:11:33 vid=200 on bd=200
I am happy to share the 'vppevpn' protocol with others as well, as an
example of a third-party integration. I do not expect it to be
upstreamed into Bird2, unless there are community requests for it.
Ondrej, do let me know if you'd like to take a sneak peek at my code
(it's in a private repo for now, as it's not ready for wider review yet,
but it is mostly functional).
(3) Setting the BGP Next Hop clears the MPLS label stack, and filters
cannot set it.
When the BGP Next Hop is changed by an export filter, we lose the MPLS
label stack. There is no way to add an MPLS label stack in filters (at
least none that I could find), so we cannot use 'next hop address X' to
determine the Type-2 MAC VXLAN endpoint. Note: IMET updates do not use
the BGP Next Hop, but rather a PMSI attribute which already carries the
'router address'.
Resetting the MPLS label when changing the next hop is intentional, as
MPLS labels are (in general) specific to the receiving router.
There is a gw_mpls attribute (and an undocumented/semantically broken
gw_mpls_stack) that could be accessed in filters.
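Something like this hypothetical, untested sketch, assuming gw_mpls is
both readable and writable in filters:

```
# Hypothetical: preserve the VNI-carrying label across a next-hop
# rewrite, since setting bgp_next_hop clears the MPLS label stack.
filter keep_label_rewrite_nh {
    int saved_label;
    saved_label = gw_mpls;        # remember the label
    bgp_next_hop = 192.0.2.1;     # this clears the label stack
    gw_mpls = saved_label;        # restore it afterwards
    accept;
}
```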
I am not sure what your use case for changing it with filters is; can
you describe it more? What about setting 'router address' in the EVPN
proto?
With the oz-evpn branch as-is, setting 'router address' in the evpn
proto will:
1) copy that to the PMSI attribute: good;
2) not do anything for MAC announcements; they will have BGP.next_hop
set to the session address.
If the previous patch in (2) is accepted, then 'router address' will be
used as BGP.next_hop, which avoids the need to change it with filters as
in (3).
Oh, I see. You are right, this should work automatically for both
IMET/PMSI and MAC.
I do not like using regular/immediate next hops here in the EVPN table,
as they do not fit well semantically and require a formal device. But it
seems to me that a reasonable alternative would be to just attach
BGP_NEXT_HOP in the EVPN protocol, similarly to how BGP_PMSI_TUNNEL is
attached. Will do that.
Any comments?
If you were to attach a specific attribute like vxlan_nexthop or
vxlan_vni to the etab table entry, I would simply read that and use it
instead of the BGP next hop. That's what already happens today for IMET,
as it has the BGP.pmsi_tunnel attribute with the needed
'ingress-replication 2001:678:d78:200::2 mpls 10040' information. How do
other vendors (say Arista, Cisco, Nokia, FRRouting) handle the Type-2
next hop? My understanding is that they use the BGP next hop for it (in
other words, the same as Bird does today).
Note that the immediate next hops in the EVPN table for routes received
through BGP are here just as an artefact of the BGP_NEXT_HOP
resolvability check; they should not be there either.
Not sure I understand what you mean - don't we have this problem also
for kernel-based VXLAN? If we create a vxlan0 interface in a bridge and
set an fdb entry onto it, we also need to know which VXLAN next hop to
use. The way I read 'evpn' and 'oz-evpn', we use the BGP next hop for
that purpose. However, if what you're saying is that you'd want to
remove the BGP Next Hop and instead have an EVPN VXLAN Next Hop
attribute to populate the 'etab' gateway field, that would work just as
well for me. I kind of wonder why you'd go to the trouble of obfuscating
the BGP Next Hop. Don't other vendors use the same thing (send the VXLAN
packet to the address learned via the BGP Next Hop in Type-2
announcements)?
If neither patch is applied, the following config:
protocol evpn {
...
encapsulation vxlan { router address 192.0.2.1; };
}
protocol bgp {
evpn { import all; export all; };
local 2001:db8::1 as 65512;
neighbor 2001:db8::2 as 65512;
}
will yield IMET pointing at 192.0.2.1 but MAC pointing at 2001:db8::1.
If I want MAC pointing at 192.0.2.1 as well, I would need either (2, my
preference) or a filter with (3).
If there exists a device out there which uses different addressing for
IMET and MAC (note: I don't know of any, but perhaps they exist), then
(3) would come in handy.
While I agree that it should work automatically by just setting router
address in protocol evpn, I think this setup should work even without
patches:
protocol evpn {
...
encapsulation vxlan { router address 192.0.2.1; };
}
protocol bgp {
evpn { import all; export all; next hop address 192.0.2.1; };
local 2001:db8::1 as 65512;
neighbor 2001:db8::2 as 65512;
}
I don't think this works for MAC; for IMET it works because that has a
custom PMSI BGP attribute which is set to encap0->router_addr. Setting
the next hop in this way will clear the MPLS label stack, so we'd end up
with:
fe:54:00:f0:11:02 vlan 100 mpls 0 unicast [evpn1 2026-02-19] * (80)
via 192.0.2.1 on vxlan0 mpls 0
and we'd lose the VNI.
groet,
Pim
[1] skipping the wait for tunnel_dev to become operational (note: a
forward declaration of evpn_no_iface_startup is also needed above
evpn_start, omitted here for brevity):
@@ -1059,11 +1070,37 @@ evpn_start(struct proto *P)
P->mpls_map->vrf_iface = P->vrf;
*/
+ /* If router address and VNI are fully configured, no need to wait for
+ * the tunnel device to come up (e.g., when VPP manages VXLAN tunnels).
+ * Schedule an immediate event to transition to PS_UP. */
+ struct evpn_encap *encap0 = evpn_get_encap(p);
+ if (!ipa_zero(encap0->router_addr) && (p->vni != U32_UNDEF))
+ {
+ event *e = ev_new_init(p->p.pool, evpn_no_iface_startup, p);
+ ev_schedule(e);
+ }
+
/* Wait for VXLAN interfaces to be up */
return PS_START;
}
+static void
+evpn_no_iface_startup(void *data)
+{
+ struct evpn_proto *p = data;
+
+ if (p->p.proto_state != PS_START)
+ return;
+
+ proto_notify_state(&p->p, PS_UP);
+
+ evpn_announce_imet(p, EVPN_ROOT_VLAN(p), 1);
+
+ WALK_LIST_(struct evpn_vlan, v, p->vlans)
+ evpn_announce_imet(p, v, 1);
+}
--
Pim van Pelt <[email protected]>
PBVP1-RIPE
https://ipng.ch/
## Manual configuration for vpp1-0
eth table etab;
eth table etab100;
eth table etab200;
evpn table evpntab;
protocol static {
eth { table etab; };
route eth 00:00:01:00:00:01 vlan 100 prohibit;
route eth 00:00:02:00:00:01 vlan 200 prohibit;
}
protocol evpn {
debug all;
eth { table etab; };
evpn { import all; export all; };
rd 8298:100;
import target (rt, 8298, 10040);
export target (rt, 8298, 10040);
encapsulation vxlan {
tunnel device "vxlan0";
router address 2001:678:d78:200::;
};
vni 10040;
vid 100;
};
protocol evpn {
debug all;
eth { table etab; };
evpn { import all; export all; };
rd 8298:200;
import target (rt, 8298, 20040);
export target (rt, 8298, 20040);
encapsulation vxlan {
tunnel device "vxlan0";
router address 192.168.10.0;
};
vni 20040;
vid 200;
};
filter bgp_evpn_out {
# if (rt, 8298, 20040) ~ bgp_ext_community then { bgp_next_hop = 2001:678:d78:200::; }
accept;
}
template bgp T_BGP_EVPN {
evpn { import all; export filter bgp_evpn_out; };
local 2001:678:d78:200:: as 65512;
}
protocol bgp vpp0_1 from T_BGP_EVPN { neighbor 2001:678:d78:200::1 as 65512; }
protocol bgp vpp0_2 from T_BGP_EVPN { neighbor 2001:678:d78:200::2 as 65512; }
protocol bgp vpp0_3 from T_BGP_EVPN { neighbor 2001:678:d78:200::3 as 65512; }
protocol vppevpn bd100 {
debug all;
eth { table etab; import all; export all; };
vxlan ipv6 src 2001:678:d78:200::;
vxlan ipv4 src 192.168.10.0;
vxlan src port 4789;
vxlan dst port 4789;
bridge domain 100;
scan time 5;
vid 100;
vni 10040;
};
protocol vppevpn bd200 {
debug all;
eth { table etab; import all; export all; };
vxlan ipv6 src 2001:678:d78:200::;
vxlan ipv4 src 192.168.10.0;
vxlan src port 4789;
vxlan dst port 4789;
bridge domain 200;
bridge mac age 10;
scan time 5;
vid 200;
vni 20040;
};