i was bored at home a few weeks back, so i had a go at scratching
an itch i've had for a while now: writing a quick and dirty
ethernet switch. the itch got worse recently when stsp@ asked
about some weird packet behaviour that may or may not have been
caused by bridge(4). trying to follow the code and how it interacts
with the stack was... challenging.

since then it has become less quick and dirty, and i've polished it
enough that i think it should be considered for the tree. however,
it's not a rewrite of bridge(4); there are some significant semantic
differences that need to be explained.

the new driver is called veb(4), short for Virtual Ethernet Bridge. it
also contains a companion driver called vport(4), which i'll explain on
the way.

veb(4), like bridge(4), is a software implementation of an ethernet
switch. it is also represented as a virtual clonable interface that
you create at runtime, and you then add other ethernet interfaces
to it as ports. these ethernet interfaces then act like ports on a
switch. packets received on a port interface are input to the
switch, which then decides which other port interface to send each
packet out of based on the destination ethernet address.
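configuration looks much like bridge(4). a sketch (interface names
here are just examples):

```
ifconfig veb0 create            # create the virtual switch
ifconfig veb0 add em0 add em1   # em0 and em1 now act as switch ports
ifconfig veb0 up
```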

the most fundamental difference between bridge(4) and veb(4) is that
veb(4) takes over ports completely and only uses them for l2. packets
coming into a veb member go into the switching code, and then pop
out another interface. that's it. this is different to bridge(4),
which kind of treats each member interface as two ports, one going
to the wire and one going to the network stack via that port. this
is where a lot of my confusion about bridge(4) comes from, both in
terms of the code and when i'm trying to actually use it. this
difference is where most of the simplification in veb comes from,
and is fundamental to how it works.

because veb is only a layer 2 switch, by default it does not interact
with the layer 3 handling in the kernel at all. this includes both
the ip/mpls stacks and pf. probably the most visible consequence of
this is that if you add the interface you're currently using to
reach the host, veb(4) will take those packets away from the stack
and you'll be disconnected.

to have pf look at packets going in and out of interfaces on a
veb(4), you have to enable the link1 flag.
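ie, a sketch:

```
ifconfig veb0 link1   # pf now inspects packets on veb0's member ports
```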

if you want the layer 3 stacks in the kernel to participate on a
veb(4), you have to explicitly create and add vport(4) interfaces.
vport(4) is special, and veb(4) handles it specially. one half of
the special handling is that veb(4) tries to disable l3 handling
on ports, but it doesn't do that on vport(4) ports. this allows
you to treat vport interfaces like normal ethernet interfaces,
except instead of being plugged into a physical switch, they're
plugged into the virtual switch.
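a sketch of giving the local stack a presence on the veb (the
address here is just an example):

```
ifconfig vport0 create
ifconfig veb0 add vport0
ifconfig vport0 inet 192.0.2.1/24 up  # the stack now does l3 via vport0
```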

veb does not run pf on vport interfaces because pf is already run
as packets enter and leave the network stack. the stack runs pf on
vport interfaces regardless of whether link1 is set on the veb
interface or not.

a weird consequence of this is that pf on vport interfaces runs in
the opposite direction of pf on other veb ports. packets going from
veb out to an interface have pf run with PF_OUT, but if the packet
is going from veb to a vport, it will be run with PF_IN by the
stack.
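eg, as a pf.conf sketch: ssh traffic arriving at the local stack
over the veb matches with "in" on the vport, even though the veb
just sent the packet out of that port:

```
# the stack sees veb-to-vport packets as inbound on vport0
pass in on vport0 proto tcp to port 22
```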

the reason that pf is disabled on normal port interfaces by default
is to minimise the complications in pf state tracking that would
otherwise happen. with a simple ruleset and pf enabled on normal
ports, a state would get created when a packet enters the member
port. then, if the packet was destined for a vport, it would match
that same state in the same direction it was created on the normal
port.

having vports as explicit interfaces that you have to create and add
to the veb allows for a couple of interesting use cases. firstly,
you can use veb as a nexus between different rdomains by attaching
multiple vports, each in its own rdomain, to the same veb. i think
i've figured out all the dragons in the code to support that. as
always, care must be taken with how and when pf gets run on those
different interfaces.
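a sketch of that rdomain nexus (the rdomain numbers are just
examples):

```
ifconfig vport0 create
ifconfig vport1 create
ifconfig vport1 rdomain 1            # vport1 lives in rdomain 1
ifconfig veb0 add vport0 add vport1  # both rdomains share one l2 segment
```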

the second is that you can use veb as a "bump in the wire",
applying policy or monitoring to traffic going over the veb with
confidence that it won't leak into the stack of the local system
unless you explicitly configure it to do so.
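a bump-in-the-wire sketch: two physical ports, pf enabled via
link1, and no vport, so nothing can reach the local stack:

```
ifconfig veb0 create
ifconfig veb0 add em0 add em1 link1 up
```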

another part of the itch this diff was trying to scratch was factoring
out the bridging (not bridge(4)) code i have in bpe and nvgre. there's
now some common code in if_etherbridge that is used by bpe, nvgre, and
veb(4), and handles the actual learning and port lookups used by all
those drivers.

i am also looking at using that same code for vxlan(4), but i'm holding
off on that because i'd probably want to rework that code to use udp
sockets at the same time.

some of the recent polishing has been to implement the "protected"
domains (pvlan-like) and filter rules features that bridge(4) has.
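assuming veb(4) reuses bridge(4)'s ifconfig syntax for these (i'm
sketching from the bridge(4) side of things, so treat it as
illustrative):

```
ifconfig veb0 protect em0 1                        # put em0 in protected domain 1
ifconfig veb0 rule block in on em0 src 0:1:2:3:4:5 # filter rule example
```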

lastly, there are some things bridge(4) does that veb(4) does not do.
the main ones i can think of are the ipsec interception that bridge(4)
does, spanning tree support, the ethernet address table management (eg,
static entries or deleting specific entries), and some port flag handling.
apart from stp, none of it is particularly hard; it's just hard to stay
motivated to do any more of this out of tree.

oh, veb(4) should be a lot faster than bridge(4) too. and mpsafe. and
able to be run concurrently. hrvoje popovski has tested some versions of
these diffs and has the following numbers so far:

> 3550m4 - slower box
> forwarding - 560 Kpps
> bridge - 400 Kpps
> veb - 850 Kpps
> tpmr - 920 Kpps
> 
> r620 - faster box
> forwarding - 1 Mpps
> bridge - 680 Kpps
> veb - 1.5 Mpps
> tpmr - 1.75 Mpps

ignoring the performance differences between bridge(4) and veb(4),
i am interested in thoughts on the semantic differences between
them. if anyone wants some insight into why bridge(4) is the way
it is, you can read the Transparent Network Security Policy Enforcement
paper by angelos and jason.

ive been using this code at home for half a week now, and it's been very
boring, which is unlike my first attempts at using bridge(4) for
the same work. i am obviously biased though.

Index: conf/GENERIC
===================================================================
RCS file: /cvs/src/sys/conf/GENERIC,v
retrieving revision 1.273
diff -u -p -r1.273 GENERIC
--- conf/GENERIC        30 Sep 2020 14:51:17 -0000      1.273
+++ conf/GENERIC        10 Feb 2021 12:06:23 -0000
@@ -82,11 +82,13 @@ pseudo-device       msts    1       # MSTS line discipl
 pseudo-device  endrun  1       # EndRun line discipline
 pseudo-device  vnd     4       # vnode disk devices
 pseudo-device  ksyms   1       # kernel symbols device
+pseudo-device  kstat           # kernel statistics
 #pseudo-device dt              # Dynamic Tracer
 
 # clonable devices
 pseudo-device  bpfilter        # packet filter
 pseudo-device  bridge          # network bridging support
+pseudo-device  veb             # virtual Ethernet bridge
 pseudo-device  carp            # CARP protocol support
 pseudo-device  etherip         # EtherIP (RFC 3378)
 pseudo-device  gif             # IPv[46] over IPv[46] tunnel (RFC1933)
Index: conf/files
===================================================================
RCS file: /cvs/src/sys/conf/files,v
retrieving revision 1.693
diff -u -p -r1.693 files
--- conf/files  28 Jan 2021 14:53:20 -0000      1.693
+++ conf/files  10 Feb 2021 12:06:23 -0000
@@ -13,6 +13,7 @@ define        audio {}
 define scsi {}
 define atascsi {}
 define ifmedia
+define etherbridge
 define mii {[phy = -1]}
 define midibus {}
 define radiobus {}
@@ -555,11 +556,12 @@ pseudo-device bpfilter: ifnet
 pseudo-device enc: ifnet
 pseudo-device etherip: ifnet, ether, ifmedia
 pseudo-device bridge: ifnet, ether
+pseudo-device veb: ifnet, ether, etherbridge
 pseudo-device vlan: ifnet, ether
 pseudo-device carp: ifnet, ether
 pseudo-device sppp: ifnet
 pseudo-device gif: ifnet
-pseudo-device gre: ifnet
+pseudo-device gre: ifnet, ether, etherbridge
 pseudo-device crypto: ifnet
 pseudo-device trunk: ifnet, ether, ifmedia
 pseudo-device aggr: ifnet, ether, ifmedia
@@ -567,7 +569,7 @@ pseudo-device tpmr: ifnet, ether, ifmedi
 pseudo-device mpe: ifnet, mpls
 pseudo-device mpw: ifnet, mpls, ether
 pseudo-device mpip: ifnet, mpls
-pseudo-device bpe: ifnet, ether, ifmedia
+pseudo-device bpe: ifnet, ether, ifmedia, etherbridge
 pseudo-device vether: ifnet, ether
 pseudo-device pppx: ifnet
 pseudo-device vxlan: ifnet, ether, ifmedia
@@ -812,6 +814,8 @@ file net/if_tun.c                   tun                     
needs-count
 file net/if_bridge.c                   bridge                  needs-count
 file net/bridgectl.c                   bridge
 file net/bridgestp.c                   bridge
+file net/if_etherbridge.c              etherbridge
+file net/if_veb.c                      veb
 file net/if_vlan.c                     vlan                    needs-count
 file net/if_switch.c                   switch                  needs-count
 file net/switchctl.c                   switch
@@ -840,7 +844,7 @@ file net/if_wg.c                    wg
 file net/wg_noise.c                    wg
 file net/wg_cookie.c                   wg
 file net/bfd.c                         bfd
-file net/toeplitz.c                    stoeplitz               needs-flag
+file net/toeplitz.c                    stoeplitz | etherbridge needs-flag
 file net80211/ieee80211.c              wlan
 file net80211/ieee80211_amrr.c         wlan
 file net80211/ieee80211_crypto.c       wlan
Index: net/if_bpe.c
===================================================================
RCS file: /cvs/src/sys/net/if_bpe.c,v
retrieving revision 1.15
diff -u -p -r1.15 if_bpe.c
--- net/if_bpe.c        19 Jan 2021 07:30:19 -0000      1.15
+++ net/if_bpe.c        10 Feb 2021 12:06:23 -0000
@@ -27,6 +27,7 @@
 #include <sys/timeout.h>
 #include <sys/pool.h>
 #include <sys/tree.h>
+#include <sys/smr.h>
 
 #include <net/if.h>
 #include <net/if_var.h>
@@ -40,7 +41,7 @@
 
 /* for bridge stuff */
 #include <net/if_bridge.h>
-
+#include <net/if_etherbridge.h>
 
 #if NBPFILTER > 0
 #include <net/bpf.h>
@@ -74,42 +75,17 @@ static inline int bpe_cmp(const struct b
 RBT_PROTOTYPE(bpe_tree, bpe_key, k_entry, bpe_cmp);
 RBT_GENERATE(bpe_tree, bpe_key, k_entry, bpe_cmp);
 
-struct bpe_entry {
-       struct ether_addr       be_c_da; /* customer address - must be first */
-       struct ether_addr       be_b_da; /* bridge address */
-       unsigned int            be_type;
-#define BPE_ENTRY_DYNAMIC              0
-#define BPE_ENTRY_STATIC               1
-       struct refcnt           be_refs;
-       time_t                  be_age;
-
-       RBT_ENTRY(bpe_entry)    be_entry;
-};
-
-RBT_HEAD(bpe_map, bpe_entry);
-
-static inline int bpe_entry_cmp(const struct bpe_entry *,
-    const struct bpe_entry *);
-
-RBT_PROTOTYPE(bpe_map, bpe_entry, be_entry, bpe_entry_cmp);
-RBT_GENERATE(bpe_map, bpe_entry, be_entry, bpe_entry_cmp);
-
 struct bpe_softc {
        struct bpe_key          sc_key; /* must be first */
        struct arpcom           sc_ac;
        int                     sc_txhprio;
        int                     sc_rxhprio;
-       uint8_t                 sc_group[ETHER_ADDR_LEN];
+       struct ether_addr       sc_group;
 
        struct task             sc_ltask;
        struct task             sc_dtask;
 
-       struct bpe_map          sc_bridge_map;
-       struct rwlock           sc_bridge_lock;
-       unsigned int            sc_bridge_num;
-       unsigned int            sc_bridge_max;
-       int                     sc_bridge_tmo; /* seconds */
-       struct timeout          sc_bridge_age;
+       struct etherbridge      sc_eb;
 };
 
 void           bpeattach(int);
@@ -132,16 +108,26 @@ static void       bpe_link_hook(void *);
 static void    bpe_link_state(struct bpe_softc *, u_char, uint64_t);
 static void    bpe_detach_hook(void *);
 
-static void    bpe_input_map(struct bpe_softc *,
-                   const uint8_t *, const uint8_t *);
-static void    bpe_bridge_age(void *);
-
 static struct if_clone bpe_cloner =
     IF_CLONE_INITIALIZER("bpe", bpe_clone_create, bpe_clone_destroy);
 
+static int      bpe_eb_port_eq(void *, void *, void *);
+static void    *bpe_eb_port_take(void *, void *);
+static void     bpe_eb_port_rele(void *, void *);
+static size_t   bpe_eb_port_ifname(void *, char *, size_t, void *);
+static void     bpe_eb_port_sa(void *, struct sockaddr_storage *, void *);
+
+static const struct etherbridge_ops bpe_etherbridge_ops = {
+       bpe_eb_port_eq,
+       bpe_eb_port_take,
+       bpe_eb_port_rele,
+       bpe_eb_port_ifname,
+       bpe_eb_port_sa,
+};
+
 static struct bpe_tree bpe_interfaces = RBT_INITIALIZER();
 static struct rwlock bpe_lock = RWLOCK_INITIALIZER("bpeifs");
-static struct pool bpe_entry_pool;
+static struct pool bpe_endpoint_pool;
 
 void
 bpeattach(int count)
@@ -154,18 +140,27 @@ bpe_clone_create(struct if_clone *ifc, i
 {
        struct bpe_softc *sc;
        struct ifnet *ifp;
+       int error;
 
-       if (bpe_entry_pool.pr_size == 0) {
-               pool_init(&bpe_entry_pool, sizeof(struct bpe_entry), 0,
+       if (bpe_endpoint_pool.pr_size == 0) {
+               pool_init(&bpe_endpoint_pool, sizeof(struct ether_addr), 0,
                    IPL_NONE, 0, "bpepl", NULL);
        }
 
        sc = malloc(sizeof(*sc), M_DEVBUF, M_WAITOK|M_ZERO);
+
        ifp = &sc->sc_ac.ac_if;
 
        snprintf(ifp->if_xname, sizeof(ifp->if_xname), "%s%d",
            ifc->ifc_name, unit);
 
+       error = etherbridge_init(&sc->sc_eb, ifp->if_xname,
+           &bpe_etherbridge_ops, sc);
+       if (error == -1) {
+               free(sc, M_DEVBUF, sizeof(*sc));
+               return (error);
+       }
+
        sc->sc_key.k_if = 0;
        sc->sc_key.k_isid = 0;
        bpe_set_group(sc, 0);
@@ -176,13 +171,6 @@ bpe_clone_create(struct if_clone *ifc, i
        task_set(&sc->sc_ltask, bpe_link_hook, sc);
        task_set(&sc->sc_dtask, bpe_detach_hook, sc);
 
-       rw_init(&sc->sc_bridge_lock, "bpebr");
-       RBT_INIT(bpe_map, &sc->sc_bridge_map);
-       sc->sc_bridge_num = 0;
-       sc->sc_bridge_max = 100; /* XXX */
-       sc->sc_bridge_tmo = 240;
-       timeout_set_proc(&sc->sc_bridge_age, bpe_bridge_age, sc);
-
        ifp->if_softc = sc;
        ifp->if_hardmtu = ETHER_MAX_HARDMTU_LEN;
        ifp->if_ioctl = bpe_ioctl;
@@ -211,25 +199,9 @@ bpe_clone_destroy(struct ifnet *ifp)
        ether_ifdetach(ifp);
        if_detach(ifp);
 
-       free(sc, M_DEVBUF, sizeof(*sc));
-
-       return (0);
-}
-
-static inline int
-bpe_entry_valid(struct bpe_softc *sc, const struct bpe_entry *be)
-{
-       time_t diff;
-
-       if (be == NULL)
-               return (0);
-
-       if (be->be_type == BPE_ENTRY_STATIC)
-               return (1);
+       etherbridge_destroy(&sc->sc_eb);
 
-       diff = getuptime() - be->be_age;
-       if (diff < sc->sc_bridge_tmo)
-               return (1);
+       free(sc, M_DEVBUF, sizeof(*sc));
 
        return (0);
 }
@@ -287,23 +259,21 @@ bpe_start(struct ifnet *ifp)
                beh = mtod(m, struct ether_header *);
 
                if (ETHER_IS_BROADCAST(ceh->ether_dhost)) {
-                       memcpy(beh->ether_dhost, sc->sc_group,
+                       memcpy(beh->ether_dhost, &sc->sc_group,
                            sizeof(beh->ether_dhost));
                } else {
-                       struct bpe_entry *be;
+                       struct ether_addr *endpoint;
 
-                       rw_enter_read(&sc->sc_bridge_lock);
-                       be = RBT_FIND(bpe_map, &sc->sc_bridge_map,
-                           (struct bpe_entry *)ceh->ether_dhost);
-                       if (bpe_entry_valid(sc, be)) {
-                               memcpy(beh->ether_dhost, &be->be_b_da,
-                                   sizeof(beh->ether_dhost));
-                       } else {
+                       smr_read_enter();
+                       endpoint = etherbridge_resolve(&sc->sc_eb,
+                           (struct ether_addr *)ceh->ether_dhost);
+                       if (endpoint == NULL) {
                                /* "flood" to unknown hosts */
-                               memcpy(beh->ether_dhost, sc->sc_group,
-                                   sizeof(beh->ether_dhost));
+                               endpoint = &sc->sc_group;
                        }
-                       rw_exit_read(&sc->sc_bridge_lock);
+                       memcpy(beh->ether_dhost, endpoint,
+                           sizeof(beh->ether_dhost));
+                       smr_read_leave();
                }
 
                memcpy(beh->ether_shost, ((struct arpcom *)ifp0)->ac_enaddr,
@@ -326,121 +296,6 @@ done:
        if_put(ifp0);
 }
 
-static void
-bpe_bridge_age(void *arg)
-{
-       struct bpe_softc *sc = arg;
-       struct bpe_entry *be, *nbe;
-       time_t diff;
-
-       timeout_add_sec(&sc->sc_bridge_age, BPE_BRIDGE_AGE_TMO);
-
-       rw_enter_write(&sc->sc_bridge_lock);
-       RBT_FOREACH_SAFE(be, bpe_map, &sc->sc_bridge_map, nbe) {
-               if (be->be_type != BPE_ENTRY_DYNAMIC)
-                       continue;
-
-               diff = getuptime() - be->be_age;
-               if (diff < sc->sc_bridge_tmo)
-                       continue;
-
-               sc->sc_bridge_num--;
-               RBT_REMOVE(bpe_map, &sc->sc_bridge_map, be);
-               if (refcnt_rele(&be->be_refs))
-                       pool_put(&bpe_entry_pool, be);
-       }
-       rw_exit_write(&sc->sc_bridge_lock);
-}
-
-static int
-bpe_rtfind(struct bpe_softc *sc, struct ifbaconf *baconf)
-{
-       struct ifnet *ifp = &sc->sc_ac.ac_if;
-       struct bpe_entry *be;
-       struct ifbareq bareq;
-       caddr_t uaddr, end;
-       int error;
-       time_t age;
-       struct sockaddr_dl *sdl;
-
-       if (baconf->ifbac_len == 0) {
-               /* single read is atomic */
-               baconf->ifbac_len = sc->sc_bridge_num * sizeof(bareq);
-               return (0);
-       }
-
-       uaddr = baconf->ifbac_buf;
-       end = uaddr + baconf->ifbac_len;
-
-       rw_enter_read(&sc->sc_bridge_lock);
-       RBT_FOREACH(be, bpe_map, &sc->sc_bridge_map) {
-               if (uaddr >= end)
-                       break;
-
-               memcpy(bareq.ifba_name, ifp->if_xname,
-                   sizeof(bareq.ifba_name));
-               memcpy(bareq.ifba_ifsname, ifp->if_xname,
-                   sizeof(bareq.ifba_ifsname));
-               memcpy(&bareq.ifba_dst, &be->be_c_da,
-                   sizeof(bareq.ifba_dst));
-
-               memset(&bareq.ifba_dstsa, 0, sizeof(bareq.ifba_dstsa));
-
-               bzero(&bareq.ifba_dstsa, sizeof(bareq.ifba_dstsa));
-               sdl = (struct sockaddr_dl *)&bareq.ifba_dstsa;
-               sdl->sdl_len = sizeof(sdl);
-               sdl->sdl_family = AF_LINK;
-               sdl->sdl_index = 0;
-               sdl->sdl_type = IFT_ETHER;
-               sdl->sdl_nlen = 0;
-               sdl->sdl_alen = sizeof(be->be_b_da);
-               CTASSERT(sizeof(sdl->sdl_data) >= sizeof(be->be_b_da));
-               memcpy(sdl->sdl_data, &be->be_b_da, sizeof(be->be_b_da));
-
-               switch (be->be_type) {
-               case BPE_ENTRY_DYNAMIC:
-                       age = getuptime() - be->be_age;
-                       bareq.ifba_age = MIN(age, 0xff);
-                       bareq.ifba_flags = IFBAF_DYNAMIC;
-                       break;
-               case BPE_ENTRY_STATIC:
-                       bareq.ifba_age = 0;
-                       bareq.ifba_flags = IFBAF_STATIC;
-                       break;
-               }
-
-               error = copyout(&bareq, uaddr, sizeof(bareq));
-               if (error != 0) {
-                       rw_exit_read(&sc->sc_bridge_lock);
-                       return (error);
-               }
-
-               uaddr += sizeof(bareq);
-       }
-       baconf->ifbac_len = sc->sc_bridge_num * sizeof(bareq);
-       rw_exit_read(&sc->sc_bridge_lock);
-
-       return (0);
-}
-
-static void
-bpe_flush_map(struct bpe_softc *sc, uint32_t flags)
-{
-       struct bpe_entry *be, *nbe;
-
-       rw_enter_write(&sc->sc_bridge_lock);
-       RBT_FOREACH_SAFE(be, bpe_map, &sc->sc_bridge_map, nbe) {
-               if (flags == IFBF_FLUSHDYN &&
-                   be->be_type != BPE_ENTRY_DYNAMIC)
-                       continue;
-
-               RBT_REMOVE(bpe_map, &sc->sc_bridge_map, be);
-               if (refcnt_rele(&be->be_refs))
-                       pool_put(&bpe_entry_pool, be);
-       }
-       rw_exit_write(&sc->sc_bridge_lock);
-}
-
 static int
 bpe_ioctl(struct ifnet *ifp, u_long cmd, caddr_t data)
 {
@@ -510,16 +365,10 @@ bpe_ioctl(struct ifnet *ifp, u_long cmd,
                if (error != 0)
                        break;
 
-               if (bparam->ifbrp_csize < 1) {
-                       error = EINVAL;
-                       break;
-               }
-
-               /* commit */
-               sc->sc_bridge_max = bparam->ifbrp_csize;
+               error = etherbridge_set_max(&sc->sc_eb, bparam);
                break;
        case SIOCBRDGGCACHE:
-               bparam->ifbrp_csize = sc->sc_bridge_max;
+               error = etherbridge_get_max(&sc->sc_eb, bparam);
                break;
 
        case SIOCBRDGSTO:
@@ -527,26 +376,22 @@ bpe_ioctl(struct ifnet *ifp, u_long cmd,
                if (error != 0)
                        break;
 
-               if (bparam->ifbrp_ctime < 8 ||
-                   bparam->ifbrp_ctime > 3600) {
-                       error = EINVAL;
-                       break;
-               }
-               sc->sc_bridge_tmo = bparam->ifbrp_ctime;
+               error = etherbridge_set_tmo(&sc->sc_eb, bparam);
                break;
        case SIOCBRDGGTO:
-               bparam->ifbrp_ctime = sc->sc_bridge_tmo;
+               error = etherbridge_get_tmo(&sc->sc_eb, bparam);
                break;
 
        case SIOCBRDGRTS:
-               error = bpe_rtfind(sc, (struct ifbaconf *)data);
+               error = etherbridge_rtfind(&sc->sc_eb,
+                   (struct ifbaconf *)data);
                break;
        case SIOCBRDGFLUSH:
                error = suser(curproc);
                if (error != 0)
                        break;
 
-               bpe_flush_map(sc,
+               etherbridge_flush(&sc->sc_eb,
                    ((struct ifbreq *)data)->ifbr_ifsflags);
                break;
 
@@ -580,16 +425,22 @@ bpe_up(struct bpe_softc *sc)
        struct ifnet *ifp = &sc->sc_ac.ac_if;
        struct ifnet *ifp0;
        struct bpe_softc *osc;
-       int error = 0;
+       int error;
        u_int hardmtu;
        u_int hlen = sizeof(struct ether_header) + sizeof(uint32_t);
 
        KASSERT(!ISSET(ifp->if_flags, IFF_RUNNING));
        NET_ASSERT_LOCKED();
 
+       error = etherbridge_up(&sc->sc_eb);
+       if (error != 0)
+               return (error);
+
        ifp0 = if_get(sc->sc_key.k_if);
-       if (ifp0 == NULL)
-               return (ENXIO);
+       if (ifp0 == NULL) {
+               error = ENXIO;
+               goto down;
+       }
 
        /* check again if bpe will work on top of the parent */
        if (ifp0->if_type != IFT_ETHER) {
@@ -643,8 +494,6 @@ bpe_up(struct bpe_softc *sc)
 
        if_put(ifp0);
 
-       timeout_add_sec(&sc->sc_bridge_age, BPE_BRIDGE_AGE_TMO);
-
        return (0);
 
 remove:
@@ -656,6 +505,8 @@ scrub:
        ifp->if_hardmtu = 0xffff;
 put:
        if_put(ifp0);
+down:
+       etherbridge_down(&sc->sc_eb);
 
        return (error);
 }
@@ -685,6 +536,8 @@ bpe_down(struct bpe_softc *sc)
        CLR(ifp->if_flags, IFF_SIMPLEX);
        ifp->if_hardmtu = 0xffff;
 
+       etherbridge_down(&sc->sc_eb);
+
        return (0);
 }
 
@@ -702,7 +555,7 @@ bpe_multi(struct bpe_softc *sc, struct i
        CTASSERT(sizeof(sa->sa_data) >= sizeof(sc->sc_group));
 
        sa->sa_family = AF_UNSPEC;
-       memcpy(sa->sa_data, sc->sc_group, sizeof(sc->sc_group));
+       memcpy(sa->sa_data, &sc->sc_group, sizeof(sc->sc_group));
 
        return ((*ifp0->if_ioctl)(ifp0, cmd, (caddr_t)&ifr));
 }
@@ -710,7 +563,7 @@ bpe_multi(struct bpe_softc *sc, struct i
 static void
 bpe_set_group(struct bpe_softc *sc, uint32_t isid)
 {
-       uint8_t *group = sc->sc_group;
+       uint8_t *group = sc->sc_group.ether_addr_octet;
 
        group[0] = 0x01;
        group[1] = 0x1e;
@@ -740,7 +593,7 @@ bpe_set_vnetid(struct bpe_softc *sc, con
        /* commit */
        sc->sc_key.k_isid = isid;
        bpe_set_group(sc, isid);
-       bpe_flush_map(sc, IFBF_FLUSHALL);
+       etherbridge_flush(&sc->sc_eb, IFBF_FLUSHALL);
 
        return (0);
 }
@@ -771,7 +624,7 @@ bpe_set_parent(struct bpe_softc *sc, con
 
        /* commit */
        sc->sc_key.k_if = ifp0->if_index;
-       bpe_flush_map(sc, IFBF_FLUSHALL);
+       etherbridge_flush(&sc->sc_eb, IFBF_FLUSHALL);
 
 put:
        if_put(ifp0);
@@ -804,7 +657,7 @@ bpe_del_parent(struct bpe_softc *sc)
 
        /* commit */
        sc->sc_key.k_if = 0;
-       bpe_flush_map(sc, IFBF_FLUSHALL);
+       etherbridge_flush(&sc->sc_eb, IFBF_FLUSHALL);
 
        return (0);
 }
@@ -822,75 +675,6 @@ bpe_find(struct ifnet *ifp0, uint32_t is
        return (sc);
 }
 
-static void
-bpe_input_map(struct bpe_softc *sc, const uint8_t *ba, const uint8_t *ca)
-{
-       struct bpe_entry *be;
-       int new = 0;
-
-       if (ETHER_IS_MULTICAST(ca))
-               return;
-
-       /* remember where it came from */
-       rw_enter_read(&sc->sc_bridge_lock);
-       be = RBT_FIND(bpe_map, &sc->sc_bridge_map, (struct bpe_entry *)ca);
-       if (be == NULL)
-               new = 1;
-       else {
-               be->be_age = getuptime(); /* only a little bit racy */
-
-               if (be->be_type != BPE_ENTRY_DYNAMIC ||
-                   ETHER_IS_EQ(ba, &be->be_b_da))
-                       be = NULL;
-               else
-                       refcnt_take(&be->be_refs);
-       }
-       rw_exit_read(&sc->sc_bridge_lock);
-
-       if (new) {
-               struct bpe_entry *obe;
-               unsigned int num;
-
-               be = pool_get(&bpe_entry_pool, PR_NOWAIT);
-               if (be == NULL) {
-                       /* oh well */
-                       return;
-               }
-
-               memcpy(&be->be_c_da, ca, sizeof(be->be_c_da));
-               memcpy(&be->be_b_da, ba, sizeof(be->be_b_da));
-               be->be_type = BPE_ENTRY_DYNAMIC;
-               refcnt_init(&be->be_refs);
-               be->be_age = getuptime();
-
-               rw_enter_write(&sc->sc_bridge_lock);
-               num = sc->sc_bridge_num;
-               if (++num > sc->sc_bridge_max)
-                       obe = be;
-               else {
-                       /* try and give the ref to the map */
-                       obe = RBT_INSERT(bpe_map, &sc->sc_bridge_map, be);
-                       if (obe == NULL) {
-                               /* count the insert */
-                               sc->sc_bridge_num = num;
-                       }
-               }
-               rw_exit_write(&sc->sc_bridge_lock);
-
-               if (obe != NULL)
-                       pool_put(&bpe_entry_pool, obe);
-       } else if (be != NULL) {
-               rw_enter_write(&sc->sc_bridge_lock);
-               memcpy(&be->be_b_da, ba, sizeof(be->be_b_da));
-               rw_exit_write(&sc->sc_bridge_lock);
-
-               if (refcnt_rele(&be->be_refs)) {
-                       /* ioctl may have deleted the entry */
-                       pool_put(&bpe_entry_pool, be);
-               }
-       }
-}
-
 void
 bpe_input(struct ifnet *ifp0, struct mbuf *m)
 {
@@ -928,7 +712,8 @@ bpe_input(struct ifnet *ifp0, struct mbu
 
        ceh = (struct ether_header *)(itagp + 1);
 
-       bpe_input_map(sc, beh->ether_shost, ceh->ether_shost);
+       etherbridge_map(&sc->sc_eb, ceh->ether_shost,
+           (struct ether_addr *)beh->ether_shost);
 
        m_adj(m, sizeof(*beh) + sizeof(*itagp));
 
@@ -1035,12 +820,62 @@ bpe_cmp(const struct bpe_key *a, const s
                return (1);
        if (a->k_isid < b->k_isid)
                return (-1);
-
+ 
        return (0);
 }
 
-static inline int
-bpe_entry_cmp(const struct bpe_entry *a, const struct bpe_entry *b)
+static int
+bpe_eb_port_eq(void *arg, void *a, void *b)
+{
+       struct ether_addr *ea = a, *eb = b;
+
+       return (memcmp(ea, eb, sizeof(*ea)) == 0);
+}
+
+static void *
+bpe_eb_port_take(void *arg, void *port)
+{
+       struct ether_addr *ea = port;
+       struct ether_addr *endpoint;
+
+       endpoint = pool_get(&bpe_endpoint_pool, PR_NOWAIT);
+       if (endpoint == NULL)
+               return (NULL);
+
+       memcpy(endpoint, ea, sizeof(*endpoint));
+
+       return (endpoint);
+}
+
+static void
+bpe_eb_port_rele(void *arg, void *port)
+{
+       struct ether_addr *endpoint = port;
+
+       pool_put(&bpe_endpoint_pool, endpoint);
+}
+
+static size_t
+bpe_eb_port_ifname(void *arg, char *dst, size_t len, void *port)
 {
-       return memcmp(&a->be_c_da, &b->be_c_da, sizeof(a->be_c_da));
+       struct bpe_softc *sc = arg;
+
+       return (strlcpy(dst, sc->sc_ac.ac_if.if_xname, len));
+}
+
+static void
+bpe_eb_port_sa(void *arg, struct sockaddr_storage *ss, void *port)
+{
+       struct ether_addr *endpoint = port;
+       struct sockaddr_dl *sdl;
+
+       sdl = (struct sockaddr_dl *)ss;
+       sdl->sdl_len = sizeof(sdl);
+       sdl->sdl_family = AF_LINK;
+       sdl->sdl_index = 0;
+       sdl->sdl_type = IFT_ETHER;
+       sdl->sdl_nlen = 0;
+       sdl->sdl_alen = sizeof(*endpoint);
+       CTASSERT(sizeof(sdl->sdl_data) >= sizeof(*endpoint));
+       memcpy(sdl->sdl_data, endpoint, sizeof(*endpoint));
 }
Index: net/if_etherbridge.c
===================================================================
RCS file: net/if_etherbridge.c
diff -N net/if_etherbridge.c
--- /dev/null   1 Jan 1970 00:00:00 -0000
+++ net/if_etherbridge.c        10 Feb 2021 12:06:23 -0000
@@ -0,0 +1,584 @@
+/*     $OpenBSD$ */
+
+/*
+ * Copyright (c) 2018, 2021 David Gwynne <d...@openbsd.org>
+ *
+ * Permission to use, copy, modify, and distribute this software for any
+ * purpose with or without fee is hereby granted, provided that the above
+ * copyright notice and this permission notice appear in all copies.
+ *
+ * THE SOFTWARE IS PROVIDED "AS IS" AND THE AUTHOR DISCLAIMS ALL WARRANTIES
+ * WITH REGARD TO THIS SOFTWARE INCLUDING ALL IMPLIED WARRANTIES OF
+ * MERCHANTABILITY AND FITNESS. IN NO EVENT SHALL THE AUTHOR BE LIABLE FOR
+ * ANY SPECIAL, DIRECT, INDIRECT, OR CONSEQUENTIAL DAMAGES OR ANY DAMAGES
+ * WHATSOEVER RESULTING FROM LOSS OF USE, DATA OR PROFITS, WHETHER IN AN
+ * ACTION OF CONTRACT, NEGLIGENCE OR OTHER TORTIOUS ACTION, ARISING OUT OF
+ * OR IN CONNECTION WITH THE USE OR PERFORMANCE OF THIS SOFTWARE.
+ */
+
+#include "bpfilter.h"
+
+#include <sys/param.h>
+#include <sys/systm.h>
+#include <sys/kernel.h>
+#include <sys/mbuf.h>
+#include <sys/socket.h>
+#include <sys/ioctl.h>
+#include <sys/timeout.h>
+#include <sys/pool.h>
+#include <sys/tree.h>
+
+#include <net/if.h>
+#include <net/if_var.h>
+#include <net/if_dl.h>
+#include <net/if_media.h>
+#include <net/if_types.h>
+#include <net/rtable.h>
+#include <net/toeplitz.h>
+
+#include <netinet/in.h>
+#include <netinet/if_ether.h>
+
+/* for bridge stuff */
+#include <net/if_bridge.h>
+
+#include <net/if_etherbridge.h>
+
+static inline void     ebe_take(struct eb_entry *);
+static inline void     ebe_rele(struct eb_entry *);
+static void            ebe_free(void *);
+
+static void            etherbridge_age(void *);
+
+RBT_PROTOTYPE(eb_tree, eb_entry, ebe_tentry, ebt_cmp);
+
+static struct pool     eb_entry_pool;
+
+static inline int
+eb_port_eq(struct etherbridge *eb, void *a, void *b)
+{
+       return ((*eb->eb_ops->eb_op_port_eq)(eb->eb_cookie, a, b));
+}
+
+static inline void *
+eb_port_take(struct etherbridge *eb, void *port)
+{
+       return ((*eb->eb_ops->eb_op_port_take)(eb->eb_cookie, port));
+}
+
+static inline void
+eb_port_rele(struct etherbridge *eb, void *port)
+{
+       return ((*eb->eb_ops->eb_op_port_rele)(eb->eb_cookie, port));
+}
+
+static inline size_t
+eb_port_ifname(struct etherbridge *eb, char *dst, size_t len, void *port)
+{
+       return ((*eb->eb_ops->eb_op_port_ifname)(eb->eb_cookie, dst, len,
+           port));
+}
+
+static inline void
+eb_port_sa(struct etherbridge *eb, struct sockaddr_storage *ss, void *port)
+{
+       (*eb->eb_ops->eb_op_port_sa)(eb->eb_cookie, ss, port);
+}
+
+int
+etherbridge_init(struct etherbridge *eb, const char *name,
+    const struct etherbridge_ops *ops, void *cookie)
+{
+       size_t i;
+
+       if (eb_entry_pool.pr_size == 0) {
+               pool_init(&eb_entry_pool, sizeof(struct eb_entry),
+                   0, IPL_SOFTNET, 0, "ebepl", NULL);
+       }
+
+       eb->eb_table = mallocarray(ETHERBRIDGE_TABLE_SIZE,
+           sizeof(*eb->eb_table), M_DEVBUF, M_WAITOK|M_CANFAIL);
+       if (eb->eb_table == NULL)
+               return (ENOMEM);
+
+       eb->eb_name = name;
+       eb->eb_ops = ops;
+       eb->eb_cookie = cookie;
+
+       mtx_init(&eb->eb_lock, IPL_SOFTNET);
+       RBT_INIT(eb_tree, &eb->eb_tree);
+
+       eb->eb_num = 0;
+       eb->eb_max = 100; /* XXX */
+       eb->eb_max_age = 8;
+       timeout_set(&eb->eb_tmo_age, etherbridge_age, eb);
+
+       for (i = 0; i < ETHERBRIDGE_TABLE_SIZE; i++) {
+               struct eb_list *ebl = &eb->eb_table[i];
+               SMR_TAILQ_INIT(ebl);
+       }
+
+       return (0);
+}
+
+int
+etherbridge_up(struct etherbridge *eb)
+{
+       etherbridge_age(eb);
+       return (0);
+}
+
+int
+etherbridge_down(struct etherbridge *eb)
+{
+       smr_barrier();
+
+       return (0);
+}
+
+void
+etherbridge_destroy(struct etherbridge *eb)
+{
+       struct eb_entry *ebe, *nebe;
+
+       /* XXX assume that nothing will be calling etherbridge_map now */
+
+       timeout_del_barrier(&eb->eb_tmo_age);
+
+       free(eb->eb_table, M_DEVBUF,
+           ETHERBRIDGE_TABLE_SIZE * sizeof(*eb->eb_table));
+
+       RBT_FOREACH_SAFE(ebe, eb_tree, &eb->eb_tree, nebe) {
+               RBT_REMOVE(eb_tree, &eb->eb_tree, ebe);
+               ebe_free(ebe);
+       }
+}
+
+static struct eb_list *
+etherbridge_list(struct etherbridge *eb, const struct ether_addr *ea)
+{
+       uint16_t hash = stoeplitz_eaddr(ea->ether_addr_octet);
+       hash &= ETHERBRIDGE_TABLE_MASK;
+       return (&eb->eb_table[hash]);
+}
+
+static struct eb_entry *
+ebl_find(struct eb_list *ebl, const struct ether_addr *ea)
+{
+       struct eb_entry *ebe;
+
+       SMR_TAILQ_FOREACH(ebe, ebl, ebe_lentry) {
+               if (ETHER_IS_EQ(ea, &ebe->ebe_addr))
+                       return (ebe);
+       }
+
+       return (NULL);
+}
+
+static inline void
+ebl_insert(struct eb_list *ebl, struct eb_entry *ebe)
+{
+       SMR_TAILQ_INSERT_TAIL_LOCKED(ebl, ebe, ebe_lentry);
+}
+
+static inline void
+ebl_remove(struct eb_list *ebl, struct eb_entry *ebe)
+{
+       SMR_TAILQ_REMOVE_LOCKED(ebl, ebe, ebe_lentry);
+}
+
+static inline int
+ebt_cmp(const struct eb_entry *aebe, const struct eb_entry *bebe)
+{
+       return (memcmp(&aebe->ebe_addr, &bebe->ebe_addr,
+           sizeof(aebe->ebe_addr)));
+}
+
+RBT_GENERATE(eb_tree, eb_entry, ebe_tentry, ebt_cmp);
+
+static inline struct eb_entry *
+ebt_insert(struct etherbridge *eb, struct eb_entry *ebe)
+{
+       return (RBT_INSERT(eb_tree, &eb->eb_tree, ebe));
+}
+
+static inline void
+ebt_replace(struct etherbridge *eb, struct eb_entry *oebe,
+    struct eb_entry *nebe)
+{
+       struct eb_entry *rvebe;
+
+       RBT_REMOVE(eb_tree, &eb->eb_tree, oebe);
+       rvebe = RBT_INSERT(eb_tree, &eb->eb_tree, nebe);
+       KASSERTMSG(rvebe == NULL, "ebt_replace eb %p nebe %p rvebe %p",
+           eb, nebe, rvebe);
+}
+
+static inline void
+ebt_remove(struct etherbridge *eb, struct eb_entry *ebe)
+{
+       RBT_REMOVE(eb_tree, &eb->eb_tree, ebe);
+}
+
+static inline void
+ebe_take(struct eb_entry *ebe)
+{
+       refcnt_take(&ebe->ebe_refs);
+}
+
+static void
+ebe_rele(struct eb_entry *ebe)
+{
+       if (refcnt_rele(&ebe->ebe_refs))
+               smr_call(&ebe->ebe_smr_entry, ebe_free, ebe);
+}
+
+static void
+ebe_free(void *arg)
+{
+       struct eb_entry *ebe = arg;
+       struct etherbridge *eb = ebe->ebe_etherbridge;
+
+       eb_port_rele(eb, ebe->ebe_port);
+       pool_put(&eb_entry_pool, ebe);
+}
+
+void *
+etherbridge_resolve(struct etherbridge *eb, const struct ether_addr *ea)
+{
+       struct eb_list *ebl = etherbridge_list(eb, ea);
+       struct eb_entry *ebe;
+
+       SMR_ASSERT_CRITICAL();
+
+       ebe = ebl_find(ebl, ea);
+       if (ebe != NULL) {
+               if (ebe->ebe_type == EBE_DYNAMIC) {
+                       int diff = getuptime() - ebe->ebe_age;
+                       if (diff > eb->eb_max_age)
+                               return (NULL);
+               }
+
+               return (ebe->ebe_port);
+       }
+
+       return (NULL);
+}
+
+void
+etherbridge_map(struct etherbridge *eb, void *port,
+    const struct ether_addr *ea)
+{
+       struct eb_list *ebl;
+       struct eb_entry *oebe, *nebe;
+       unsigned int num;
+       void *nport;
+       int new = 0;
+
+       if (ETHER_IS_MULTICAST(ea->ether_addr_octet) ||
+           ETHER_IS_EQ(ea->ether_addr_octet, etheranyaddr))
+               return;
+
+       ebl = etherbridge_list(eb, ea);
+
+       smr_read_enter();
+       oebe = ebl_find(ebl, ea);
+       if (oebe == NULL)
+               new = 1;
+       else {
+               oebe->ebe_age = getuptime();
+
+               /* does this entry need to be replaced? */
+               if (oebe->ebe_type == EBE_DYNAMIC &&
+                   !eb_port_eq(eb, oebe->ebe_port, port)) {
+                       new = 1;
+                       ebe_take(oebe);
+               } else
+                       oebe = NULL;
+       }
+       smr_read_leave();
+
+       if (!new)
+               return;
+
+       nport = eb_port_take(eb, port);
+       if (nport == NULL) {
+               /* XXX should we remove the old one and flood? */
+               return;
+       }
+
+       nebe = pool_get(&eb_entry_pool, PR_NOWAIT);
+       if (nebe == NULL) {
+               /* XXX should we remove the old one and flood? */
+               eb_port_rele(eb, nport);
+               return;
+       }
+
+       smr_init(&nebe->ebe_smr_entry);
+       refcnt_init(&nebe->ebe_refs);
+       nebe->ebe_etherbridge = eb;
+
+       nebe->ebe_addr = *ea;
+       nebe->ebe_port = nport;
+       nebe->ebe_type = EBE_DYNAMIC;
+       nebe->ebe_age = getuptime();
+
+       mtx_enter(&eb->eb_lock);
+       num = eb->eb_num + (oebe == NULL);
+       if (num <= eb->eb_max && ebt_insert(eb, nebe) == oebe) {
+               /* we won, do the update */
+               ebl_insert(ebl, nebe);
+
+               if (oebe != NULL) {
+                       ebl_remove(ebl, oebe);
+                       ebt_replace(eb, oebe, nebe);
+
+                       /* take the table reference away */
+                       if (refcnt_rele(&oebe->ebe_refs)) {
+                               panic("%s: eb %p oebe %p refcnt",
+                                   __func__, eb, oebe);
+                       }
+               }
+
+               nebe = NULL;
+               eb->eb_num = num;
+       }
+       mtx_leave(&eb->eb_lock);
+
+       if (nebe != NULL) {
+               /*
+                * the new entry didn't make it into the
+                * table, so it can be freed directly.
+                */
+               ebe_free(nebe);
+       }
+
+       if (oebe != NULL) {
+               /*
+                * the old entry could be referenced in
+                * multiple places, including an smr read
+                * section, so release it properly.
+                */
+               ebe_rele(oebe);
+       }
+}
+
+static void
+etherbridge_age(void *arg)
+{
+       struct etherbridge *eb = arg;
+       struct eb_entry *ebe, *nebe;
+       struct eb_queue ebq = TAILQ_HEAD_INITIALIZER(ebq);
+       int diff;
+       unsigned int now = getuptime();
+       size_t i;
+
+       timeout_add_sec(&eb->eb_tmo_age, 100);
+
+       for (i = 0; i < ETHERBRIDGE_TABLE_SIZE; i++) {
+               struct eb_list *ebl = &eb->eb_table[i];
+#if 0
+               if (SMR_TAILQ_EMPTY(ebl))
+                       continue;
+#endif
+
+               mtx_enter(&eb->eb_lock); /* don't block map too much */
+               SMR_TAILQ_FOREACH_SAFE_LOCKED(ebe, ebl, ebe_lentry, nebe) {
+                       if (ebe->ebe_type != EBE_DYNAMIC)
+                               continue;
+
+                       diff = now - ebe->ebe_age;
+                       if (diff < eb->eb_max_age)
+                               continue;
+
+                       ebl_remove(ebl, ebe);
+                       ebt_remove(eb, ebe);
+                       eb->eb_num--;
+
+                       /* we own the table's ref now */
+
+                       TAILQ_INSERT_TAIL(&ebq, ebe, ebe_qentry);
+               }
+               mtx_leave(&eb->eb_lock);
+       }
+
+       TAILQ_FOREACH_SAFE(ebe, &ebq, ebe_qentry, nebe) {
+               TAILQ_REMOVE(&ebq, ebe, ebe_qentry);
+               ebe_rele(ebe);
+       }
+}
+
+void
+etherbridge_detach_port(struct etherbridge *eb, void *port)
+{
+       struct eb_entry *ebe, *nebe;
+       struct eb_queue ebq = TAILQ_HEAD_INITIALIZER(ebq);
+       size_t i;
+
+       for (i = 0; i < ETHERBRIDGE_TABLE_SIZE; i++) {
+               struct eb_list *ebl = &eb->eb_table[i];
+
+               mtx_enter(&eb->eb_lock); /* don't block map too much */
+               SMR_TAILQ_FOREACH_SAFE_LOCKED(ebe, ebl, ebe_lentry, nebe) {
+                       if (!eb_port_eq(eb, ebe->ebe_port, port))
+                               continue;
+
+                       ebl_remove(ebl, ebe);
+                       ebt_remove(eb, ebe);
+                       eb->eb_num--;
+
+                       /* we own the table's ref now */
+
+                       TAILQ_INSERT_TAIL(&ebq, ebe, ebe_qentry);
+               }
+               mtx_leave(&eb->eb_lock);
+       }
+
+       smr_barrier(); /* try to do it once for all the entries */
+
+       TAILQ_FOREACH_SAFE(ebe, &ebq, ebe_qentry, nebe) {
+               TAILQ_REMOVE(&ebq, ebe, ebe_qentry);
+               if (refcnt_rele(&ebe->ebe_refs))
+                       ebe_free(ebe);
+       }
+}
+
+void
+etherbridge_flush(struct etherbridge *eb, uint32_t flags)
+{
+       struct eb_entry *ebe, *nebe;
+       struct eb_queue ebq = TAILQ_HEAD_INITIALIZER(ebq);
+       size_t i;
+
+       for (i = 0; i < ETHERBRIDGE_TABLE_SIZE; i++) {
+               struct eb_list *ebl = &eb->eb_table[i];
+
+               mtx_enter(&eb->eb_lock); /* don't block map too much */
+               SMR_TAILQ_FOREACH_SAFE_LOCKED(ebe, ebl, ebe_lentry, nebe) {
+                       if (flags == IFBF_FLUSHDYN &&
+                           ebe->ebe_type != EBE_DYNAMIC)
+                               continue;
+
+                       ebl_remove(ebl, ebe);
+                       ebt_remove(eb, ebe);
+                       eb->eb_num--;
+
+                       /* we own the table's ref now */
+
+                       TAILQ_INSERT_TAIL(&ebq, ebe, ebe_qentry);
+               }
+               mtx_leave(&eb->eb_lock);
+       }
+
+       smr_barrier(); /* try to do it once for all the entries */
+
+       TAILQ_FOREACH_SAFE(ebe, &ebq, ebe_qentry, nebe) {
+               TAILQ_REMOVE(&ebq, ebe, ebe_qentry);
+               if (refcnt_rele(&ebe->ebe_refs))
+                       ebe_free(ebe);
+       }
+}
+
+int
+etherbridge_rtfind(struct etherbridge *eb, struct ifbaconf *baconf)
+{
+       struct eb_entry *ebe;
+       struct ifbareq bareq;
+       caddr_t buf;
+       size_t len, nlen;
+       time_t age, now = getuptime();
+       int error;
+
+       if (baconf->ifbac_len == 0) {
+               /* single read is atomic */
+               baconf->ifbac_len = eb->eb_num * sizeof(bareq);
+               return (0);
+       }
+
+       buf = malloc(baconf->ifbac_len, M_TEMP, M_WAITOK|M_CANFAIL);
+       if (buf == NULL)
+               return (ENOMEM);
+       len = 0;
+
+       mtx_enter(&eb->eb_lock);
+       RBT_FOREACH(ebe, eb_tree, &eb->eb_tree) {
+               nlen = len + sizeof(bareq);
+               if (nlen > baconf->ifbac_len) {
+                       break;
+               }
+
+               strlcpy(bareq.ifba_name, eb->eb_name,
+                   sizeof(bareq.ifba_name));
+               eb_port_ifname(eb,
+                   bareq.ifba_ifsname, sizeof(bareq.ifba_ifsname),
+                   ebe->ebe_port);
+               memcpy(&bareq.ifba_dst, &ebe->ebe_addr,
+                   sizeof(bareq.ifba_dst));
+
+               memset(&bareq.ifba_dstsa, 0, sizeof(bareq.ifba_dstsa));
+               eb_port_sa(eb, &bareq.ifba_dstsa, ebe->ebe_port);
+
+               switch (ebe->ebe_type) {
+               case EBE_DYNAMIC:
+                       age = now - ebe->ebe_age;
+                       bareq.ifba_age = MIN(age, 0xff);
+                       bareq.ifba_flags = IFBAF_DYNAMIC;
+                       break;
+               case EBE_STATIC:
+                       bareq.ifba_age = 0;
+                       bareq.ifba_flags = IFBAF_STATIC;
+                       break;
+               }
+
+               memcpy(buf + len, &bareq, sizeof(bareq));
+               len = nlen;
+       }
+       nlen = baconf->ifbac_len;
+       baconf->ifbac_len = eb->eb_num * sizeof(bareq);
+       mtx_leave(&eb->eb_lock);
+
+       error = copyout(buf, baconf->ifbac_buf, len);
+       free(buf, M_TEMP, nlen);
+
+       return (error);
+}
+
+int
+etherbridge_set_max(struct etherbridge *eb, struct ifbrparam *bparam)
+{
+       if (bparam->ifbrp_csize < 1)
+               return (EINVAL);
+
+       /* commit */
+       eb->eb_max = bparam->ifbrp_csize;
+
+       return (0);
+}
+
+int
+etherbridge_get_max(struct etherbridge *eb, struct ifbrparam *bparam)
+{
+       bparam->ifbrp_csize = eb->eb_max;
+
+       return (0);
+}
+
+int
+etherbridge_set_tmo(struct etherbridge *eb, struct ifbrparam *bparam)
+{
+       if (bparam->ifbrp_ctime < 8 || bparam->ifbrp_ctime > 3600)
+               return (EINVAL);
+
+       /* commit */
+       eb->eb_max_age = bparam->ifbrp_ctime;
+
+       return (0);
+}
+
+int
+etherbridge_get_tmo(struct etherbridge *eb, struct ifbrparam *bparam)
+{
+       bparam->ifbrp_ctime = eb->eb_max_age;
+
+       return (0);
+}
Index: net/if_etherbridge.h
===================================================================
RCS file: net/if_etherbridge.h
diff -N net/if_etherbridge.h
--- /dev/null   1 Jan 1970 00:00:00 -0000
+++ net/if_etherbridge.h        10 Feb 2021 12:06:23 -0000
@@ -0,0 +1,103 @@
+/*     $OpenBSD$ */
+
+/*
+ * Copyright (c) 2018, 2021 David Gwynne <d...@openbsd.org>
+ *
+ * Permission to use, copy, modify, and distribute this software for any
+ * purpose with or without fee is hereby granted, provided that the above
+ * copyright notice and this permission notice appear in all copies.
+ *
+ * THE SOFTWARE IS PROVIDED "AS IS" AND THE AUTHOR DISCLAIMS ALL WARRANTIES
+ * WITH REGARD TO THIS SOFTWARE INCLUDING ALL IMPLIED WARRANTIES OF
+ * MERCHANTABILITY AND FITNESS. IN NO EVENT SHALL THE AUTHOR BE LIABLE FOR
+ * ANY SPECIAL, DIRECT, INDIRECT, OR CONSEQUENTIAL DAMAGES OR ANY DAMAGES
+ * WHATSOEVER RESULTING FROM LOSS OF USE, DATA OR PROFITS, WHETHER IN AN
+ * ACTION OF CONTRACT, NEGLIGENCE OR OTHER TORTIOUS ACTION, ARISING OUT OF
+ * OR IN CONNECTION WITH THE USE OR PERFORMANCE OF THIS SOFTWARE.
+ */
+
+#ifndef _NET_ETHERBRIDGE_H_
+#define _NET_ETHERBRIDGE_H_
+
+#define ETHERBRIDGE_TABLE_BITS         8
+#define ETHERBRIDGE_TABLE_SIZE         (1U << ETHERBRIDGE_TABLE_BITS)
+#define ETHERBRIDGE_TABLE_MASK         (ETHERBRIDGE_TABLE_SIZE - 1)
+
+struct etherbridge_ops {
+       int      (*eb_op_port_eq)(void *, void *, void *);
+       void    *(*eb_op_port_take)(void *, void *);
+       void     (*eb_op_port_rele)(void *, void *);
+       size_t   (*eb_op_port_ifname)(void *, char *, size_t, void *);
+       void     (*eb_op_port_sa)(void *, struct sockaddr_storage *, void *);
+};
+
+struct etherbridge;
+
+struct eb_entry {
+       SMR_TAILQ_ENTRY(eb_entry)        ebe_lentry;
+       union {
+               RBT_ENTRY(eb_entry)      _ebe_tentry;
+               TAILQ_ENTRY(eb_entry)    _ebe_qentry;
+       }                                _ebe_entries;
+#define ebe_tentry     _ebe_entries._ebe_tentry
+#define ebe_qentry     _ebe_entries._ebe_qentry
+
+       struct ether_addr                ebe_addr;
+       void                            *ebe_port;
+       unsigned int                     ebe_type;
+#define EBE_DYNAMIC                            0x0
+#define EBE_STATIC                             0x1
+#define EBE_DEAD                               0xdead
+       time_t                           ebe_age;
+
+       struct etherbridge              *ebe_etherbridge;
+       struct refcnt                    ebe_refs;
+       struct smr_entry                 ebe_smr_entry;
+};
+
+SMR_TAILQ_HEAD(eb_list, eb_entry);
+RBT_HEAD(eb_tree, eb_entry);
+TAILQ_HEAD(eb_queue, eb_entry);
+
+struct etherbridge {
+       const char                      *eb_name;
+       const struct etherbridge_ops    *eb_ops;
+       void                            *eb_cookie;
+
+       struct mutex                     eb_lock;
+       unsigned int                     eb_num;
+       unsigned int                     eb_max;
+       int                              eb_max_age; /* seconds */
+       struct timeout                   eb_tmo_age;
+
+       struct eb_list                  *eb_table;
+       struct eb_tree                   eb_tree;
+
+};
+
+int     etherbridge_init(struct etherbridge *, const char *,
+            const struct etherbridge_ops *, void *);
+int     etherbridge_up(struct etherbridge *);
+int     etherbridge_down(struct etherbridge *);
+void    etherbridge_destroy(struct etherbridge *);
+
+void    etherbridge_map(struct etherbridge *, void *,
+           const struct ether_addr *);
+void   *etherbridge_resolve(struct etherbridge *, const struct ether_addr *);
+void    etherbridge_detach_port(struct etherbridge *, void *);
+
+/* ioctl support */
+int     etherbridge_set_max(struct etherbridge *, struct ifbrparam *);
+int     etherbridge_get_max(struct etherbridge *, struct ifbrparam *);
+int     etherbridge_set_tmo(struct etherbridge *, struct ifbrparam *);
+int     etherbridge_get_tmo(struct etherbridge *, struct ifbrparam *);
+int     etherbridge_rtfind(struct etherbridge *, struct ifbaconf *);
+void    etherbridge_flush(struct etherbridge *, uint32_t);
+
+static inline unsigned int
+etherbridge_num(const struct etherbridge *eb)
+{
+       return (eb->eb_num);
+}
+
+#endif /* _NET_ETHERBRIDGE_H_ */
Index: net/if_gre.c
===================================================================
RCS file: /cvs/src/sys/net/if_gre.c,v
retrieving revision 1.164
diff -u -p -r1.164 if_gre.c
--- net/if_gre.c        19 Jan 2021 07:31:47 -0000      1.164
+++ net/if_gre.c        10 Feb 2021 12:06:23 -0000
@@ -99,6 +99,7 @@
 /* for nvgre bridge shizz */
 #include <sys/socket.h>
 #include <net/if_bridge.h>
+#include <net/if_etherbridge.h>
 
 /*
  * packet formats
@@ -395,27 +396,6 @@ struct egre_tree egre_tree = RBT_INITIAL
  * Network Virtualisation Using Generic Routing Encapsulation (NVGRE)
  */
 
-#define NVGRE_AGE_TMO          100     /* seconds */
-
-struct nvgre_entry {
-       RB_ENTRY(nvgre_entry)    nv_entry;
-       struct ether_addr        nv_dst;
-       uint8_t                  nv_type;
-#define NVGRE_ENTRY_DYNAMIC            0
-#define NVGRE_ENTRY_STATIC             1
-       union gre_addr           nv_gateway;
-       struct refcnt            nv_refs;
-       int                      nv_age;
-};
-
-RBT_HEAD(nvgre_map, nvgre_entry);
-
-static inline int
-               nvgre_entry_cmp(const struct nvgre_entry *,
-                   const struct nvgre_entry *);
-
-RBT_PROTOTYPE(nvgre_map, nvgre_entry, nv_entry, nvgre_entry_cmp);
-
 struct nvgre_softc {
        struct gre_tunnel        sc_tunnel; /* must be first */
        unsigned int             sc_ifp0;
@@ -432,12 +412,7 @@ struct nvgre_softc {
        struct task              sc_ltask;
        struct task              sc_dtask;
 
-       struct rwlock            sc_ether_lock;
-       struct nvgre_map         sc_ether_map;
-       unsigned int             sc_ether_num;
-       unsigned int             sc_ether_max;
-       int                      sc_ether_tmo;
-       struct timeout           sc_ether_age;
+       struct etherbridge       sc_eb;
 };
 
 RBT_HEAD(nvgre_ucast_tree, nvgre_softc);
@@ -474,16 +449,24 @@ static int        nvgre_input(const struct gre_
                    uint8_t);
 static void    nvgre_send(void *);
 
-static int     nvgre_rtfind(struct nvgre_softc *, struct ifbaconf *);
-static void    nvgre_flush_map(struct nvgre_softc *);
-static void    nvgre_input_map(struct nvgre_softc *,
-                   const struct gre_tunnel *, const struct ether_header *);
-static void    nvgre_age(void *);
+static int      nvgre_eb_port_eq(void *, void *, void *);
+static void    *nvgre_eb_port_take(void *, void *);
+static void     nvgre_eb_port_rele(void *, void *);
+static size_t   nvgre_eb_port_ifname(void *, char *, size_t, void *);
+static void     nvgre_eb_port_sa(void *, struct sockaddr_storage *, void *);
+
+static const struct etherbridge_ops nvgre_etherbridge_ops = {
+       nvgre_eb_port_eq,
+       nvgre_eb_port_take,
+       nvgre_eb_port_rele,
+       nvgre_eb_port_ifname,
+       nvgre_eb_port_sa,
+};
 
 struct if_clone nvgre_cloner =
     IF_CLONE_INITIALIZER("nvgre", nvgre_clone_create, nvgre_clone_destroy);
 
-struct pool nvgre_pool;
+struct pool nvgre_endpoint_pool;
 
 /* protected by NET_LOCK */
 struct nvgre_ucast_tree nvgre_ucast_tree = RBT_INITIALIZER();
@@ -759,10 +742,11 @@ nvgre_clone_create(struct if_clone *ifc,
        struct nvgre_softc *sc;
        struct ifnet *ifp;
        struct gre_tunnel *tunnel;
+       int error;
 
-       if (nvgre_pool.pr_size == 0) {
-               pool_init(&nvgre_pool, sizeof(struct nvgre_entry), 0,
-                   IPL_SOFTNET, 0, "nvgren", NULL);
+       if (nvgre_endpoint_pool.pr_size == 0) {
+               pool_init(&nvgre_endpoint_pool, sizeof(union gre_addr),
+                   0, IPL_SOFTNET, 0, "nvgreep", NULL);
        }
 
        sc = malloc(sizeof(*sc), M_DEVBUF, M_WAITOK|M_ZERO);
@@ -771,6 +755,13 @@ nvgre_clone_create(struct if_clone *ifc,
        snprintf(ifp->if_xname, sizeof(ifp->if_xname), "%s%d",
            ifc->ifc_name, unit);
 
+       error = etherbridge_init(&sc->sc_eb, ifp->if_xname,
+           &nvgre_etherbridge_ops, sc);
+       if (error != 0) {
+               free(sc, M_DEVBUF, sizeof(*sc));
+               return (error);
+       }
+
        ifp->if_softc = sc;
        ifp->if_hardmtu = ETHER_MAX_HARDMTU_LEN;
        ifp->if_ioctl = nvgre_ioctl;
@@ -793,13 +784,6 @@ nvgre_clone_create(struct if_clone *ifc,
        task_set(&sc->sc_ltask, nvgre_link_change, sc);
        task_set(&sc->sc_dtask, nvgre_detach, sc);
 
-       rw_init(&sc->sc_ether_lock, "nvgrelk");
-       RBT_INIT(nvgre_map, &sc->sc_ether_map);
-       sc->sc_ether_num = 0;
-       sc->sc_ether_max = 100;
-       sc->sc_ether_tmo = 240 * hz;
-       timeout_set_proc(&sc->sc_ether_age, nvgre_age, sc); /* ugh */
-
        ifmedia_init(&sc->sc_media, 0, egre_media_change, egre_media_status);
        ifmedia_add(&sc->sc_media, IFM_ETHER | IFM_AUTO, 0, NULL);
        ifmedia_set(&sc->sc_media, IFM_ETHER | IFM_AUTO);
@@ -821,6 +805,8 @@ nvgre_clone_destroy(struct ifnet *ifp)
                nvgre_down(sc);
        NET_UNLOCK();
 
+       etherbridge_destroy(&sc->sc_eb);
+
        ifmedia_delete_instance(&sc->sc_media, IFM_INST_ANY);
        ether_ifdetach(ifp);
        if_detach(ifp);
@@ -1344,183 +1330,6 @@ egre_input(const struct gre_tunnel *key,
        return (0);
 }
 
-static int
-nvgre_rtfind(struct nvgre_softc *sc, struct ifbaconf *baconf)
-{
-       struct ifnet *ifp = &sc->sc_ac.ac_if;
-       struct nvgre_entry *nv;
-       struct ifbareq bareq;
-       caddr_t uaddr, end;
-       int error;
-       int age;
-
-       if (baconf->ifbac_len == 0) {
-               /* single read is atomic */
-               baconf->ifbac_len = sc->sc_ether_num * sizeof(bareq);
-               return (0);
-       }
-
-       uaddr = baconf->ifbac_buf;
-       end = uaddr + baconf->ifbac_len;
-
-       rw_enter_read(&sc->sc_ether_lock);
-       RBT_FOREACH(nv, nvgre_map, &sc->sc_ether_map) {
-               if (uaddr >= end)
-                       break;
-
-               memcpy(bareq.ifba_name, ifp->if_xname,
-                   sizeof(bareq.ifba_name));
-               memcpy(bareq.ifba_ifsname, ifp->if_xname,
-                   sizeof(bareq.ifba_ifsname));
-               memcpy(&bareq.ifba_dst, &nv->nv_dst,
-                   sizeof(bareq.ifba_dst));
-
-               memset(&bareq.ifba_dstsa, 0, sizeof(bareq.ifba_dstsa));
-               switch (sc->sc_tunnel.t_af) {
-               case AF_INET: {
-                       struct sockaddr_in *sin;
-
-                       sin = (struct sockaddr_in *)&bareq.ifba_dstsa;
-                       sin->sin_len = sizeof(*sin);
-                       sin->sin_family = AF_INET;
-                       sin->sin_addr = nv->nv_gateway.in4;
-
-                       break;
-               }
-#ifdef INET6
-               case AF_INET6: {
-                       struct sockaddr_in6 *sin6;
-
-                       sin6 = (struct sockaddr_in6 *)&bareq.ifba_dstsa;
-                       sin6->sin6_len = sizeof(*sin6);
-                       sin6->sin6_family = AF_INET6;
-                       sin6->sin6_addr = nv->nv_gateway.in6;
-
-                       break;
-               }
-#endif /* INET6 */
-               default:
-                       unhandled_af(sc->sc_tunnel.t_af);
-               }
-
-               switch (nv->nv_type) {
-               case NVGRE_ENTRY_DYNAMIC:
-                       age = (ticks - nv->nv_age) / hz;
-                       bareq.ifba_age = MIN(age, 0xff);
-                       bareq.ifba_flags = IFBAF_DYNAMIC;
-                       break;
-               case NVGRE_ENTRY_STATIC:
-                       bareq.ifba_age = 0;
-                       bareq.ifba_flags = IFBAF_STATIC;
-                       break;
-               }
-
-               error = copyout(&bareq, uaddr, sizeof(bareq));
-               if (error != 0) {
-                       rw_exit_read(&sc->sc_ether_lock);
-                       return (error);
-               }
-
-               uaddr += sizeof(bareq);
-       }
-       baconf->ifbac_len = sc->sc_ether_num * sizeof(bareq);
-       rw_exit_read(&sc->sc_ether_lock);
-
-       return (0);
-}
-
-static void
-nvgre_flush_map(struct nvgre_softc *sc)
-{
-       struct nvgre_map map;
-       struct nvgre_entry *nv, *nnv;
-
-       rw_enter_write(&sc->sc_ether_lock);
-       map = sc->sc_ether_map;
-       RBT_INIT(nvgre_map, &sc->sc_ether_map);
-       sc->sc_ether_num = 0;
-       rw_exit_write(&sc->sc_ether_lock);
-
-       RBT_FOREACH_SAFE(nv, nvgre_map, &map, nnv) {
-               RBT_REMOVE(nvgre_map, &map, nv);
-               if (refcnt_rele(&nv->nv_refs))
-                       pool_put(&nvgre_pool, nv);
-       }
-}
-
-static void
-nvgre_input_map(struct nvgre_softc *sc, const struct gre_tunnel *key,
-    const struct ether_header *eh)
-{
-       struct nvgre_entry *nv, nkey;
-       int new = 0;
-
-       if (ETHER_IS_BROADCAST(eh->ether_shost) ||
-           ETHER_IS_MULTICAST(eh->ether_shost))
-               return;
-
-       memcpy(&nkey.nv_dst, eh->ether_shost, ETHER_ADDR_LEN);
-
-       /* remember where it came from */
-       rw_enter_read(&sc->sc_ether_lock);
-       nv = RBT_FIND(nvgre_map, &sc->sc_ether_map, &nkey);
-       if (nv == NULL)
-               new = 1;
-       else {
-               nv->nv_age = ticks;
-
-               if (nv->nv_type != NVGRE_ENTRY_DYNAMIC ||
-                   gre_ip_cmp(key->t_af, &key->t_dst, &nv->nv_gateway) == 0)
-                       nv = NULL;
-               else
-                       refcnt_take(&nv->nv_refs);
-       }
-       rw_exit_read(&sc->sc_ether_lock);
-
-       if (new) {
-               struct nvgre_entry *onv;
-               unsigned int num;
-
-               nv = pool_get(&nvgre_pool, PR_NOWAIT);
-               if (nv == NULL) {
-                       /* oh well */
-                       return;
-               }
-
-               memcpy(&nv->nv_dst, eh->ether_shost, ETHER_ADDR_LEN);
-               nv->nv_type = NVGRE_ENTRY_DYNAMIC;
-               nv->nv_gateway = key->t_dst;
-               refcnt_init(&nv->nv_refs);
-               nv->nv_age = ticks;
-
-               rw_enter_write(&sc->sc_ether_lock);
-               num = sc->sc_ether_num;
-               if (++num > sc->sc_ether_max)
-                       onv = nv;
-               else {
-                       /* try to give the ref to the map */
-                       onv = RBT_INSERT(nvgre_map, &sc->sc_ether_map, nv);
-                       if (onv == NULL) {
-                               /* count the successful insert */
-                               sc->sc_ether_num = num;
-                       }
-               }
-               rw_exit_write(&sc->sc_ether_lock);
-
-               if (onv != NULL)
-                       pool_put(&nvgre_pool, nv);
-       } else if (nv != NULL) {
-               rw_enter_write(&sc->sc_ether_lock);
-               nv->nv_gateway = key->t_dst;
-               rw_exit_write(&sc->sc_ether_lock);
-
-               if (refcnt_rele(&nv->nv_refs)) {
-                       /* ioctl may have deleted the entry */
-                       pool_put(&nvgre_pool, nv);
-               }
-       }
-}
-
 static inline struct nvgre_softc *
 nvgre_mcast_find(const struct gre_tunnel *key, unsigned int if0idx)
 {
@@ -1562,6 +1371,7 @@ nvgre_input(const struct gre_tunnel *key
     uint8_t otos)
 {
        struct nvgre_softc *sc;
+       struct ether_header *eh;
 
        if (ISSET(m->m_flags, M_MCAST|M_BCAST))
                sc = nvgre_mcast_find(key, m->m_pkthdr.ph_ifidx);
@@ -1576,7 +1386,9 @@ nvgre_input(const struct gre_tunnel *key
        if (m == NULL)
                return (0);
 
-       nvgre_input_map(sc, key, mtod(m, struct ether_header *));
+       eh = mtod(m, struct ether_header *);
+       etherbridge_map(&sc->sc_eb, (void *)&key->t_dst,
+           (struct ether_addr *)eh->ether_shost);
 
        SET(m->m_pkthdr.csum_flags, M_FLOWID);
        m->m_pkthdr.ph_flowid = bemtoh32(&key->t_key) & ~GRE_KEY_ENTROPY;
@@ -2768,7 +2580,7 @@ nvgre_ioctl(struct ifnet *ifp, u_long cm
                }
                error = gre_set_tunnel(tunnel, (struct if_laddrreq *)data, 0);
                if (error == 0)
-                       nvgre_flush_map(sc);
+                       etherbridge_flush(&sc->sc_eb, IFBF_FLUSHALL);
                break;
        case SIOCGLIFPHYADDR:
                error = gre_get_tunnel(tunnel, (struct if_laddrreq *)data);
@@ -2780,7 +2592,7 @@ nvgre_ioctl(struct ifnet *ifp, u_long cm
                }
                error = gre_del_tunnel(tunnel);
                if (error == 0)
-                       nvgre_flush_map(sc);
+                       etherbridge_flush(&sc->sc_eb, IFBF_FLUSHALL);
                break;
 
        case SIOCSIFPARENT:
@@ -2790,7 +2602,7 @@ nvgre_ioctl(struct ifnet *ifp, u_long cm
                }
                error = nvgre_set_parent(sc, parent->ifp_parent);
                if (error == 0)
-                       nvgre_flush_map(sc);
+                       etherbridge_flush(&sc->sc_eb, IFBF_FLUSHALL);
                break;
        case SIOCGIFPARENT:
                ifp0 = if_get(sc->sc_ifp0);
@@ -2809,7 +2621,7 @@ nvgre_ioctl(struct ifnet *ifp, u_long cm
                }
                /* commit */
                sc->sc_ifp0 = 0;
-               nvgre_flush_map(sc);
+               etherbridge_flush(&sc->sc_eb, IFBF_FLUSHALL);
                break;
 
        case SIOCSVNETID:
@@ -2825,7 +2637,7 @@ nvgre_ioctl(struct ifnet *ifp, u_long cm
 
                /* commit */
                tunnel->t_key = htonl(ifr->ifr_vnetid << GRE_KEY_ENTROPY_SHIFT);
-               nvgre_flush_map(sc);
+               etherbridge_flush(&sc->sc_eb, IFBF_FLUSHALL);
                break;
        case SIOCGVNETID:
                error = gre_get_vnetid(tunnel, ifr);
@@ -2839,7 +2651,7 @@ nvgre_ioctl(struct ifnet *ifp, u_long cm
                        break;
                }
                tunnel->t_rtableid = ifr->ifr_rdomainid;
-               nvgre_flush_map(sc);
+               etherbridge_flush(&sc->sc_eb, IFBF_FLUSHALL);
                break;
        case SIOCGLIFPHYRTABLE:
                ifr->ifr_rdomainid = tunnel->t_rtableid;
@@ -2890,35 +2702,26 @@ nvgre_ioctl(struct ifnet *ifp, u_long cm
                break;
 
        case SIOCBRDGSCACHE:
-               if (bparam->ifbrp_csize < 1) {
-                       error = EINVAL;
-                       break;
-               }
-
-               /* commit */
-               sc->sc_ether_max = bparam->ifbrp_csize;
+               error = etherbridge_set_max(&sc->sc_eb, bparam);
                break;
        case SIOCBRDGGCACHE:
-               bparam->ifbrp_csize = sc->sc_ether_max;
+               error = etherbridge_get_max(&sc->sc_eb, bparam);
                break;
 
        case SIOCBRDGSTO:
-               if (bparam->ifbrp_ctime < 0 ||
-                   bparam->ifbrp_ctime > INT_MAX / hz) {
-                       error = EINVAL;
-                       break;
-               }
-               sc->sc_ether_tmo = bparam->ifbrp_ctime * hz;
+               error = etherbridge_set_tmo(&sc->sc_eb, bparam);
                break;
        case SIOCBRDGGTO:
-               bparam->ifbrp_ctime = sc->sc_ether_tmo / hz;
+               error = etherbridge_get_tmo(&sc->sc_eb, bparam);
                break;
 
        case SIOCBRDGRTS:
-               error = nvgre_rtfind(sc, (struct ifbaconf *)data);
+               error = etherbridge_rtfind(&sc->sc_eb,
+                   (struct ifbaconf *)data);
                break;
        case SIOCBRDGFLUSH:
-               nvgre_flush_map(sc);
+               etherbridge_flush(&sc->sc_eb,
+                   ((struct ifbreq *)data)->ifbr_ifsflags);
                break;
 
        case SIOCADDMULTI:
@@ -3667,8 +3470,6 @@ nvgre_up(struct nvgre_softc *sc)
        sc->sc_inm = inm;
        SET(sc->sc_ac.ac_if.if_flags, IFF_RUNNING);
 
-       timeout_add_sec(&sc->sc_ether_age, NVGRE_AGE_TMO);
-
        return (0);
 
 remove_ucast:
@@ -3693,7 +3494,6 @@ nvgre_down(struct nvgre_softc *sc)
        CLR(ifp->if_flags, IFF_RUNNING);
 
        NET_UNLOCK();
-       timeout_del_barrier(&sc->sc_ether_age);
        ifq_barrier(&ifp->if_snd);
        if (!task_del(softnet, &sc->sc_send_task))
                taskq_barrier(softnet);
@@ -3770,60 +3570,11 @@ nvgre_set_parent(struct nvgre_softc *sc,
 }
 
 static void
-nvgre_age(void *arg)
-{
-       struct nvgre_softc *sc = arg;
-       struct nvgre_entry *nv, *nnv;
-       int tmo = sc->sc_ether_tmo * 2;
-       int diff;
-
-       if (!ISSET(sc->sc_ac.ac_if.if_flags, IFF_RUNNING))
-               return;
-
-       rw_enter_write(&sc->sc_ether_lock); /* XXX */
-       RBT_FOREACH_SAFE(nv, nvgre_map, &sc->sc_ether_map, nnv) {
-               if (nv->nv_type != NVGRE_ENTRY_DYNAMIC)
-                       continue;
-
-               diff = ticks - nv->nv_age;
-               if (diff < tmo)
-                       continue;
-
-               sc->sc_ether_num--;
-               RBT_REMOVE(nvgre_map, &sc->sc_ether_map, nv);
-               if (refcnt_rele(&nv->nv_refs))
-                       pool_put(&nvgre_pool, nv);
-       }
-       rw_exit_write(&sc->sc_ether_lock);
-
-       timeout_add_sec(&sc->sc_ether_age, NVGRE_AGE_TMO);
-}
-
-static inline int
-nvgre_entry_valid(struct nvgre_softc *sc, const struct nvgre_entry *nv)
-{
-       int diff;
-
-       if (nv == NULL)
-               return (0);
-
-       if (nv->nv_type == NVGRE_ENTRY_STATIC)
-               return (1);
-
-       diff = ticks - nv->nv_age;
-       if (diff < sc->sc_ether_tmo)
-               return (1);
-
-       return (0);
-}
-
-static void
 nvgre_start(struct ifnet *ifp)
 {
        struct nvgre_softc *sc = ifp->if_softc;
        const struct gre_tunnel *tunnel = &sc->sc_tunnel;
        union gre_addr gateway;
-       struct nvgre_entry *nv, key;
        struct mbuf_list ml = MBUF_LIST_INITIALIZER();
        struct ether_header *eh;
        struct mbuf *m, *m0;
@@ -3847,18 +3598,17 @@ nvgre_start(struct ifnet *ifp)
                if (ETHER_IS_BROADCAST(eh->ether_dhost))
                        gateway = tunnel->t_dst;
                else {
-                       memcpy(&key.nv_dst, eh->ether_dhost,
-                           sizeof(key.nv_dst));
+                       const union gre_addr *endpoint;
 
-                       rw_enter_read(&sc->sc_ether_lock);
-                       nv = RBT_FIND(nvgre_map, &sc->sc_ether_map, &key);
-                       if (nvgre_entry_valid(sc, nv))
-                               gateway = nv->nv_gateway;
-                       else {
+                       smr_read_enter();
+                       endpoint = etherbridge_resolve(&sc->sc_eb,
+                           (struct ether_addr *)eh->ether_dhost);
+                       if (endpoint == NULL) {
                                /* "flood" to unknown hosts */
-                               gateway = tunnel->t_dst;
+                               endpoint = &tunnel->t_dst;
                        }
-                       rw_exit_read(&sc->sc_ether_lock);
+                       gateway = *endpoint;
+                       smr_read_leave();
                }
 
                /* force prepend mbuf because of alignment problems */
@@ -4346,14 +4096,6 @@ egre_cmp(const struct egre_softc *a, con
 
 RBT_GENERATE(egre_tree, egre_softc, sc_entry, egre_cmp);
 
-static inline int
-nvgre_entry_cmp(const struct nvgre_entry *a, const struct nvgre_entry *b)
-{
-       return (memcmp(&a->nv_dst, &b->nv_dst, sizeof(a->nv_dst)));
-}
-
-RBT_GENERATE(nvgre_map, nvgre_entry, nv_entry, nvgre_entry_cmp);
-
 static int
 nvgre_cmp_tunnel(const struct gre_tunnel *a, const struct gre_tunnel *b)
 {
@@ -4473,3 +4215,73 @@ eoip_cmp(const struct eoip_softc *ea, co
 }
 
 RBT_GENERATE(eoip_tree, eoip_softc, sc_entry, eoip_cmp);
+
+static int
+nvgre_eb_port_eq(void *arg, void *a, void *b)
+{
+       struct nvgre_softc *sc = arg;
+
+       return (gre_ip_cmp(sc->sc_tunnel.t_af, a, b) == 0);
+}
+
+static void *
+nvgre_eb_port_take(void *arg, void *port)
+{
+       union gre_addr *ea = port;
+       union gre_addr *endpoint;
+
+       endpoint = pool_get(&nvgre_endpoint_pool, PR_NOWAIT);
+       if (endpoint == NULL)
+               return (NULL);
+
+       *endpoint = *ea;
+
+       return (endpoint);
+}
+
+static void
+nvgre_eb_port_rele(void *arg, void *port)
+{
+       union gre_addr *endpoint = port;
+
+       pool_put(&nvgre_endpoint_pool, endpoint);
+}
+
+static size_t
+nvgre_eb_port_ifname(void *arg, char *dst, size_t len, void *port)
+{
+       struct nvgre_softc *sc = arg;
+
+       return (strlcpy(dst, sc->sc_ac.ac_if.if_xname, len));
+}
+
+static void
+nvgre_eb_port_sa(void *arg, struct sockaddr_storage *ss, void *port)
+{
+       struct nvgre_softc *sc = arg;
+       union gre_addr *endpoint = port;
+
+       switch (sc->sc_tunnel.t_af) {
+       case AF_INET: {
+               struct sockaddr_in *sin = (struct sockaddr_in *)ss;
+
+               sin->sin_len = sizeof(*sin);
+               sin->sin_family = AF_INET;
+               sin->sin_addr = endpoint->in4;
+               break;
+       }
+#ifdef INET6
+       case AF_INET6: {
+               struct sockaddr_in6 *sin6 = (struct sockaddr_in6 *)ss;
+
+               sin6->sin6_len = sizeof(*sin6);
+               sin6->sin6_family = AF_INET6;
+               sin6->sin6_addr = endpoint->in6;
+
+               break;
+       }
+#endif /* INET6 */
+       default:
+               unhandled_af(sc->sc_tunnel.t_af);
+       }
+}
Index: net/if_veb.c
===================================================================
RCS file: net/if_veb.c
diff -N net/if_veb.c
--- /dev/null   1 Jan 1970 00:00:00 -0000
+++ net/if_veb.c        10 Feb 2021 12:06:23 -0000
@@ -0,0 +1,1747 @@
+/*     $OpenBSD$ */
+
+/*
+ * Copyright (c) 2021 David Gwynne <d...@openbsd.org>
+ *
+ * Permission to use, copy, modify, and distribute this software for any
+ * purpose with or without fee is hereby granted, provided that the above
+ * copyright notice and this permission notice appear in all copies.
+ *
+ * THE SOFTWARE IS PROVIDED "AS IS" AND THE AUTHOR DISCLAIMS ALL WARRANTIES
+ * WITH REGARD TO THIS SOFTWARE INCLUDING ALL IMPLIED WARRANTIES OF
+ * MERCHANTABILITY AND FITNESS. IN NO EVENT SHALL THE AUTHOR BE LIABLE FOR
+ * ANY SPECIAL, DIRECT, INDIRECT, OR CONSEQUENTIAL DAMAGES OR ANY DAMAGES
+ * WHATSOEVER RESULTING FROM LOSS OF USE, DATA OR PROFITS, WHETHER IN AN
+ * ACTION OF CONTRACT, NEGLIGENCE OR OTHER TORTIOUS ACTION, ARISING OUT OF
+ * OR IN CONNECTION WITH THE USE OR PERFORMANCE OF THIS SOFTWARE.
+ */
+
+#include "bpfilter.h"
+#include "pf.h"
+#include "vlan.h"
+
+#include <sys/param.h>
+#include <sys/kernel.h>
+#include <sys/malloc.h>
+#include <sys/mbuf.h>
+#include <sys/queue.h>
+#include <sys/socket.h>
+#include <sys/sockio.h>
+#include <sys/systm.h>
+#include <sys/syslog.h>
+#include <sys/rwlock.h>
+#include <sys/percpu.h>
+#include <sys/smr.h>
+#include <sys/task.h>
+#include <sys/pool.h>
+
+#include <net/if.h>
+#include <net/if_dl.h>
+#include <net/if_types.h>
+
+#include <netinet/in.h>
+#include <netinet/if_ether.h>
+
+#include <net/if_bridge.h>
+#include <net/if_etherbridge.h>
+
+#if NBPFILTER > 0
+#include <net/bpf.h>
+#endif
+
+#if NPF > 0
+#include <net/pfvar.h>
+#endif
+
+#if NVLAN > 0
+#include <net/if_vlan_var.h>
+#endif
+
+struct veb_rule {
+       TAILQ_ENTRY(veb_rule)           vr_entry;
+       SMR_TAILQ_ENTRY(veb_rule)       vr_lentry[2];
+
+       uint16_t                        vr_flags;
+#define VEB_R_F_IN                             (1U << 0)
+#define VEB_R_F_OUT                            (1U << 1)
+#define VEB_R_F_SRC                            (1U << 2)
+#define VEB_R_F_DST                            (1U << 3)
+
+#define VEB_R_F_ARP                            (1U << 4)
+#define VEB_R_F_RARP                           (1U << 5)
+#define VEB_R_F_SHA                            (1U << 6)
+#define VEB_R_F_SPA                            (1U << 7)
+#define VEB_R_F_THA                            (1U << 8)
+#define VEB_R_F_TPA                            (1U << 9)
+       uint16_t                         vr_arp_op;
+
+       struct ether_addr                vr_src;
+       struct ether_addr                vr_dst;
+       struct ether_addr                vr_arp_sha;
+       struct ether_addr                vr_arp_tha;
+       struct in_addr                   vr_arp_spa;
+       struct in_addr                   vr_arp_tpa;
+
+       unsigned int                     vr_action;
+#define VEB_R_MATCH                            0
+#define VEB_R_PASS                             1
+#define VEB_R_BLOCK                            2
+
+       int                              vr_pftag;
+};
+
+TAILQ_HEAD(veb_rules, veb_rule);
+SMR_TAILQ_HEAD(veb_rule_list, veb_rule);
+
+struct veb_softc;
+
+struct veb_port {
+       struct ifnet                    *p_ifp0;
+       struct refcnt                    p_refs;
+
+       int (*p_ioctl)(struct ifnet *, u_long, caddr_t);
+       int (*p_output)(struct ifnet *, struct mbuf *, struct sockaddr *,
+           struct rtentry *);
+
+       struct task                      p_ltask;
+       struct task                      p_dtask;
+
+       struct veb_softc                *p_veb;
+
+       struct ether_brport              p_brport;
+
+       unsigned int                     p_link_state;
+       unsigned int                     p_span;
+       unsigned int                     p_bif_flags;
+       uint32_t                         p_protected;
+
+       struct veb_rules                 p_vrl;
+       unsigned int                     p_nvrl;
+       struct veb_rule_list             p_vr_list[2];
+#define VEB_RULE_LIST_OUT                      0
+#define VEB_RULE_LIST_IN                       1
+
+       SMR_TAILQ_ENTRY(veb_port)        p_entry;
+};
+
+struct veb_ports {
+       SMR_TAILQ_HEAD(, veb_port)       l_list;
+       unsigned int                     l_count;
+};
+
+struct veb_softc {
+       struct ifnet                     sc_if;
+       unsigned int                     sc_dead;
+
+       struct etherbridge               sc_eb;
+
+       struct rwlock                    sc_rule_lock;
+       struct veb_ports                 sc_ports;
+       struct veb_ports                 sc_spans;
+};
+
+#define DPRINTF(_sc, fmt...)    do { \
+       if (ISSET((_sc)->sc_if.if_flags, IFF_DEBUG)) \
+               printf(fmt); \
+} while (0)
+
+static int     veb_clone_create(struct if_clone *, int);
+static int     veb_clone_destroy(struct ifnet *);
+
+static int     veb_ioctl(struct ifnet *, u_long, caddr_t);
+static void    veb_input(struct ifnet *, struct mbuf *);
+static int     veb_enqueue(struct ifnet *, struct mbuf *);
+static int     veb_output(struct ifnet *, struct mbuf *, struct sockaddr *,
+                   struct rtentry *);
+static void    veb_start(struct ifqueue *);
+
+static int     veb_up(struct veb_softc *);
+static int     veb_down(struct veb_softc *);
+static int     veb_iff(struct veb_softc *);
+
+static void    veb_p_linkch(void *);
+static void    veb_p_detach(void *);
+static int     veb_p_ioctl(struct ifnet *, u_long, caddr_t);
+static int     veb_p_output(struct ifnet *, struct mbuf *,
+                   struct sockaddr *, struct rtentry *);
+
+static void    veb_p_dtor(struct veb_softc *, struct veb_port *,
+                   const char *);
+static int     veb_add_port(struct veb_softc *,
+                   const struct ifbreq *, unsigned int);
+static int     veb_del_port(struct veb_softc *,
+                   const struct ifbreq *, unsigned int);
+static int     veb_port_list(struct veb_softc *, struct ifbifconf *);
+static int     veb_port_set_protected(struct veb_softc *,
+                   const struct ifbreq *);
+
+static int     veb_rule_add(struct veb_softc *, const struct ifbrlreq *);
+static int     veb_rule_list_flush(struct veb_softc *,
+                   const struct ifbrlreq *);
+static void    veb_rule_list_free(struct veb_rule *);
+static int     veb_rule_list_get(struct veb_softc *, struct ifbrlconf *);
+
+static int      veb_eb_port_cmp(void *, void *, void *);
+static void    *veb_eb_port_take(void *, void *);
+static void     veb_eb_port_rele(void *, void *);
+static size_t   veb_eb_port_ifname(void *, char *, size_t, void *);
+static void     veb_eb_port_sa(void *, struct sockaddr_storage *, void *);
+
+static const struct etherbridge_ops veb_etherbridge_ops = {
+       veb_eb_port_cmp,
+       veb_eb_port_take,
+       veb_eb_port_rele,
+       veb_eb_port_ifname,
+       veb_eb_port_sa,
+};
+
+static struct if_clone veb_cloner =
+    IF_CLONE_INITIALIZER("veb", veb_clone_create, veb_clone_destroy);
+
+static struct pool veb_rule_pool;
+
+static int     vport_clone_create(struct if_clone *, int);
+static int     vport_clone_destroy(struct ifnet *);
+
+struct vport_softc {
+       struct arpcom            sc_ac;
+       unsigned int             sc_dead;
+};
+
+static int     vport_ioctl(struct ifnet *, u_long, caddr_t);
+static int     vport_enqueue(struct ifnet *, struct mbuf *);
+static void    vport_start(struct ifqueue *);
+
+static int     vport_up(struct vport_softc *);
+static int     vport_down(struct vport_softc *);
+static int     vport_iff(struct vport_softc *);
+
+static struct if_clone vport_cloner =
+    IF_CLONE_INITIALIZER("vport", vport_clone_create, vport_clone_destroy);
+
+void
+vebattach(int count)
+{
+       if_clone_attach(&veb_cloner);
+       if_clone_attach(&vport_cloner);
+}
+
+static int
+veb_clone_create(struct if_clone *ifc, int unit)
+{
+       struct veb_softc *sc;
+       struct ifnet *ifp;
+       int error;
+
+       if (veb_rule_pool.pr_size == 0) {
+               pool_init(&veb_rule_pool, sizeof(struct veb_rule),
+                   0, IPL_SOFTNET, 0, "vebrpl", NULL);
+       }
+
+       sc = malloc(sizeof(*sc), M_DEVBUF, M_WAITOK|M_ZERO|M_CANFAIL);
+       if (sc == NULL)
+               return (ENOMEM);
+
+       rw_init(&sc->sc_rule_lock, "vebrlk");
+       SMR_TAILQ_INIT(&sc->sc_ports.l_list);
+       SMR_TAILQ_INIT(&sc->sc_spans.l_list);
+
+       ifp = &sc->sc_if;
+
+       snprintf(ifp->if_xname, sizeof(ifp->if_xname), "%s%d",
+           ifc->ifc_name, unit);
+
+       error = etherbridge_init(&sc->sc_eb, ifp->if_xname,
+           &veb_etherbridge_ops, sc);
+       if (error != 0) {
+               free(sc, M_DEVBUF, sizeof(*sc));
+               return (error);
+       }
+
+       ifp->if_softc = sc;
+       ifp->if_type = IFT_BRIDGE;
+       ifp->if_hdrlen = ETHER_HDR_LEN;
+       ifp->if_hardmtu = ETHER_MAX_HARDMTU_LEN;
+       ifp->if_ioctl = veb_ioctl;
+       ifp->if_input = veb_input;
+       /* ifp->if_rtrequest = veb_rtrequest; */
+       ifp->if_output = veb_output;
+       ifp->if_enqueue = veb_enqueue;
+       ifp->if_qstart = veb_start;
+       ifp->if_flags = IFF_BROADCAST | IFF_SIMPLEX | IFF_MULTICAST;
+       ifp->if_xflags = IFXF_CLONED | IFXF_MPSAFE;
+
+       if_counters_alloc(ifp);
+       if_attach(ifp);
+
+       if_alloc_sadl(ifp);
+
+#if NBPFILTER > 0
+       bpfattach(&ifp->if_bpf, ifp, DLT_EN10MB, ETHER_HDR_LEN);
+#endif
+
+       return (0);
+}
+
+static int
+veb_clone_destroy(struct ifnet *ifp)
+{
+       struct veb_softc *sc = ifp->if_softc;
+       struct veb_port *p, *np;
+
+       NET_LOCK();
+       sc->sc_dead = 1;
+
+       if (ISSET(ifp->if_flags, IFF_RUNNING))
+               veb_down(sc);
+       NET_UNLOCK();
+
+       if_detach(ifp);
+
+       NET_LOCK();
+       SMR_TAILQ_FOREACH_SAFE_LOCKED(p, &sc->sc_ports.l_list, p_entry, np)
+               veb_p_dtor(sc, p, "destroy");
+       SMR_TAILQ_FOREACH_SAFE_LOCKED(p, &sc->sc_spans.l_list, p_entry, np)
+               veb_p_dtor(sc, p, "destroy");
+       NET_UNLOCK();
+
+       etherbridge_destroy(&sc->sc_eb);
+
+       free(sc, M_DEVBUF, sizeof(*sc));
+
+       return (0);
+}
+
+static struct mbuf *
+veb_span_input(struct ifnet *ifp0, struct mbuf *m, void *brport)
+{
+       m_freem(m);
+       return (NULL);
+}
+
+static void
+veb_span(struct veb_softc *sc, struct mbuf *m0)
+{
+       struct veb_port *p;
+       struct ifnet *ifp0;
+       struct mbuf *m;
+
+       smr_read_enter();
+       SMR_TAILQ_FOREACH(p, &sc->sc_spans.l_list, p_entry) {
+               ifp0 = p->p_ifp0;
+               if (!ISSET(ifp0->if_flags, IFF_RUNNING))
+                       continue;
+
+               m = m_dup_pkt(m0, max_linkhdr + ETHER_ALIGN, M_NOWAIT);
+               if (m == NULL) {
+                       /* XXX count error */
+                       continue;
+               }
+
+               if_enqueue(ifp0, m); /* XXX count error */
+       }
+       smr_read_leave();
+}
+
+static int
+veb_vlan_filter(const struct mbuf *m)
+{
+       const struct ether_header *eh;
+
+       eh = mtod(m, struct ether_header *);
+       switch (ntohs(eh->ether_type)) {
+       case ETHERTYPE_VLAN:
+       case ETHERTYPE_QINQ:
+               return (1);
+       default:
+               break;
+       }
+
+       return (0);
+}
+
+static int
+veb_rule_arp_match(const struct veb_rule *vr, struct mbuf *m)
+{
+       struct ether_header *eh;
+       struct ether_arp ea;
+
+       eh = mtod(m, struct ether_header *);
+
+       if (eh->ether_type != htons(ETHERTYPE_ARP))
+               return (0);
+       if (m->m_pkthdr.len < sizeof(*eh) + sizeof(ea))
+               return (0);
+
+       m_copydata(m, sizeof(*eh), sizeof(ea), (caddr_t)&ea);
+
+       if (ea.arp_hrd != htons(ARPHRD_ETHER) ||
+           ea.arp_pro != htons(ETHERTYPE_IP) ||
+           ea.arp_hln != ETHER_ADDR_LEN ||
+           ea.arp_pln != sizeof(struct in_addr))
+               return (0);
+
+       if (ISSET(vr->vr_flags, VEB_R_F_ARP)) {
+               if (ea.arp_op != htons(ARPOP_REQUEST) &&
+                   ea.arp_op != htons(ARPOP_REPLY))
+                       return (0);
+       }
+       if (ISSET(vr->vr_flags, VEB_R_F_RARP)) {
+               if (ea.arp_op != htons(ARPOP_REVREQUEST) &&
+                   ea.arp_op != htons(ARPOP_REVREPLY))
+                       return (0);
+       }
+
+       if (vr->vr_arp_op != htons(0) && vr->vr_arp_op != ea.arp_op)
+               return (0);
+
+       if (ISSET(vr->vr_flags, VEB_R_F_SHA) &&
+           !ETHER_IS_EQ(&vr->vr_arp_sha, ea.arp_sha))
+               return (0);
+       if (ISSET(vr->vr_flags, VEB_R_F_THA) &&
+           !ETHER_IS_EQ(&vr->vr_arp_tha, ea.arp_tha))
+               return (0);
+       if (ISSET(vr->vr_flags, VEB_R_F_SPA) &&
+           memcmp(&vr->vr_arp_spa, ea.arp_spa, sizeof(vr->vr_arp_spa)) != 0)
+               return (0);
+       if (ISSET(vr->vr_flags, VEB_R_F_TPA) &&
+           memcmp(&vr->vr_arp_tpa, ea.arp_tpa, sizeof(vr->vr_arp_tpa)) != 0)
+               return (0);
+
+       return (1);
+}
+
+static int
+veb_rule_list_test(struct veb_rule *vr, int dir, struct mbuf *m)
+{
+       struct ether_header *eh = mtod(m, struct ether_header *);
+
+       SMR_ASSERT_CRITICAL();
+
+       do {
+               if (ISSET(vr->vr_flags, VEB_R_F_ARP|VEB_R_F_RARP) &&
+                   !veb_rule_arp_match(vr, m))
+                       continue;
+
+               if (ISSET(vr->vr_flags, VEB_R_F_SRC) &&
+                   !ETHER_IS_EQ(&vr->vr_src, eh->ether_shost))
+                       continue;
+               if (ISSET(vr->vr_flags, VEB_R_F_DST) &&
+                   !ETHER_IS_EQ(&vr->vr_dst, eh->ether_dhost))
+                       continue;
+
+               if (vr->vr_action == VEB_R_BLOCK)
+                       return (VEB_R_BLOCK);
+#if NPF > 0
+               pf_tag_packet(m, vr->vr_pftag, -1);
+#endif
+               if (vr->vr_action == VEB_R_PASS)
+                       return (VEB_R_PASS);
+       } while ((vr = SMR_TAILQ_NEXT(vr, vr_lentry[dir])) != NULL);
+
+       return (VEB_R_PASS);
+}
+
+static inline int
+veb_rule_filter(struct veb_port *p, int dir, struct mbuf *m)
+{
+       struct veb_rule *vr;
+
+       vr = SMR_TAILQ_FIRST(&p->p_vr_list[dir]);
+       if (vr == NULL)
+               return (0);
+
+       return (veb_rule_list_test(vr, dir, m) == VEB_R_BLOCK);
+}
+
+#if NPF > 0
+static struct mbuf *
+veb_pf(struct ifnet *ifp0, int dir, struct mbuf *m)
+{
+       struct ether_header *eh, copy;
+       sa_family_t af = AF_UNSPEC;
+
+       /*
+        * pf runs on vport interfaces when they enter or leave the
+        * l3 stack, so don't confuse things (even more) by running
+        * pf again here. note that because of this exception the
+        * pf direction on vport interfaces is reversed compared to
+        * other veb ports.
+        */
+       if (ifp0->if_enqueue == vport_enqueue)
+               return (m);
+
+       eh = mtod(m, struct ether_header *);
+       switch (ntohs(eh->ether_type)) {
+       case ETHERTYPE_IP:
+               af = AF_INET;
+               break;
+       case ETHERTYPE_IPV6:
+               af = AF_INET6;
+               break;
+       default:
+               return (m);
+       }
+
+       copy = *eh;
+       m_adj(m, sizeof(*eh));
+
+       if (pf_test(af, dir, ifp0, &m) != PF_PASS) {
+               m_freem(m);
+               return (NULL);
+       }
+       if (m == NULL)
+               return (NULL);
+
+       m = m_prepend(m, sizeof(*eh), M_DONTWAIT);
+       if (m == NULL)
+               return (NULL);
+
+       /* checksum? */
+
+       eh = mtod(m, struct ether_header *);
+       *eh = copy;
+
+       return (m);
+}
+#endif /* NPF > 0 */
+
+static void
+veb_broadcast(struct veb_softc *sc, struct veb_port *rp, struct mbuf *m0)
+{
+       struct ifnet *ifp = &sc->sc_if;
+       struct veb_port *tp;
+       struct ifnet *ifp0;
+       struct mbuf *m;
+
+#if NPF > 0
+       /*
+        * we couldn't find a specific port to send this packet to,
+        * but pf should still have a chance to apply policy to it.
+        * let pf look at it, but use the veb interface as a proxy.
+        */
+       if (ISSET(ifp->if_flags, IFF_LINK1) &&
+           (m = veb_pf(ifp, PF_OUT, m0)) == NULL)
+               return;
+#endif
+
+       counters_pkt(ifp->if_counters, ifc_opackets, ifc_obytes,
+           m0->m_pkthdr.len);
+
+       smr_read_enter();
+       SMR_TAILQ_FOREACH(tp, &sc->sc_ports.l_list, p_entry) {
+               if (rp == tp || (rp->p_protected & tp->p_protected)) {
+                       /*
+                        * don't let Ethernet packets hairpin or
+                        * move between ports in the same protected
+                        * domain(s).
+                        */
+                       continue;
+               }
+
+               ifp0 = tp->p_ifp0;
+               if (!ISSET(ifp0->if_flags, IFF_RUNNING)) {
+                       /* don't waste time */
+                       continue;
+               }
+
+               if (!ISSET(tp->p_bif_flags, IFBIF_DISCOVER) &&
+                   !ISSET(m0->m_flags, M_BCAST | M_MCAST)) {
+                       /* don't flood unknown unicast */
+                       continue;
+               }
+
+               if (veb_rule_filter(tp, VEB_RULE_LIST_OUT, m0))
+                       continue;
+
+               m = m_dup_pkt(m0, max_linkhdr + ETHER_ALIGN, M_NOWAIT);
+               if (m == NULL) {
+                       /* XXX count error? */
+                       continue;
+               }
+
+               if_enqueue(ifp0, m); /* XXX count error? */
+       }
+       smr_read_leave();
+
+       m_freem(m0);
+}
+
+static struct mbuf *
+veb_transmit(struct veb_softc *sc, struct veb_port *rp, struct veb_port *tp,
+    struct mbuf *m)
+{
+       struct ifnet *ifp = &sc->sc_if;
+       struct ifnet *ifp0;
+
+       if (tp == NULL)
+               return (m);
+
+       if (rp == tp || (rp->p_protected & tp->p_protected)) {
+               /*
+                * don't let Ethernet packets hairpin or move between
+                * ports in the same protected domain(s).
+                */
+               goto drop;
+       }
+
+       if (veb_rule_filter(tp, VEB_RULE_LIST_OUT, m))
+               goto drop;
+
+       ifp0 = tp->p_ifp0;
+
+#if NPF > 0
+       if (ISSET(ifp->if_flags, IFF_LINK1) &&
+           (m = veb_pf(ifp0, PF_OUT, m)) == NULL)
+               return (NULL);
+#endif
+
+       counters_pkt(ifp->if_counters, ifc_opackets, ifc_obytes,
+           m->m_pkthdr.len);
+
+       if_enqueue(ifp0, m); /* XXX count error? */
+
+       return (NULL);
+drop:
+       m_freem(m);
+       return (NULL);
+}
+
+static struct mbuf *
+veb_port_input(struct ifnet *ifp0, struct mbuf *m, void *brport)
+{
+       struct veb_port *p = brport;
+       struct veb_softc *sc = p->p_veb;
+       struct ifnet *ifp = &sc->sc_if;
+       struct ether_header *eh;
+#if NBPFILTER > 0
+       caddr_t if_bpf;
+#endif
+
+       if (ISSET(m->m_flags, M_PROTO1)) {
+               CLR(m->m_flags, M_PROTO1);
+               return (m);
+       }
+
+       if (!ISSET(ifp->if_flags, IFF_RUNNING))
+               return (m);
+
+#if NVLAN > 0
+       /*
+        * If the underlying interface removed the VLAN header itself,
+        * add it back.
+        */
+       if (ISSET(m->m_flags, M_VLANTAG)) {
+               m = vlan_inject(m, ETHERTYPE_VLAN, m->m_pkthdr.ether_vtag);
+               if (m == NULL) {
+                       counters_inc(ifp->if_counters, ifc_ierrors);
+                       goto drop;
+               }
+       }
+#endif
+
+       counters_pkt(ifp->if_counters, ifc_ipackets, ifc_ibytes,
+           m->m_pkthdr.len);
+
+       /* force packets into the one routing domain for pf */
+       m->m_pkthdr.ph_rtableid = ifp->if_rdomain;
+
+#if NBPFILTER > 0
+       if_bpf = READ_ONCE(ifp->if_bpf);
+       if (if_bpf != NULL) {
+               if (bpf_mtap_ether(if_bpf, m, 0) != 0)
+                       goto drop;
+       }
+#endif
+
+       veb_span(sc, m);
+
+       if (!ISSET(ifp->if_flags, IFF_LINK2) &&
+           veb_vlan_filter(m))
+               goto drop;
+
+       if (veb_rule_filter(p, VEB_RULE_LIST_IN, m))
+               goto drop;
+
+#if NPF > 0
+       if (ISSET(ifp->if_flags, IFF_LINK1) &&
+           (m = veb_pf(ifp0, PF_IN, m)) == NULL)
+               return (NULL);
+#endif
+
+       eh = mtod(m, struct ether_header *);
+
+       if (ISSET(p->p_bif_flags, IFBIF_LEARNING)) {
+               etherbridge_map(&sc->sc_eb, p,
+                   (struct ether_addr *)eh->ether_shost);
+       }
+
+       CLR(m->m_flags, M_BCAST|M_MCAST);
+       SET(m->m_flags, M_PROTO1);
+
+       if (!ETHER_IS_MULTICAST(eh->ether_dhost)) {
+               struct veb_port *tp = NULL;
+
+               smr_read_enter();
+               tp = etherbridge_resolve(&sc->sc_eb,
+                   (struct ether_addr *)eh->ether_dhost);
+               m = veb_transmit(sc, p, tp, m);
+               smr_read_leave();
+
+               if (m == NULL)
+                       return (NULL);
+
+               /* unknown unicast address */
+       } else {
+               SET(m->m_flags,
+                   ETHER_IS_BROADCAST(eh->ether_dhost) ? M_BCAST : M_MCAST);
+       }
+
+       veb_broadcast(sc, p, m);
+       return (NULL);
+
+drop:
+       m_freem(m);
+       return (NULL);
+}
+
+static void
+veb_input(struct ifnet *ifp, struct mbuf *m)
+{
+       m_freem(m);
+}
+
+static int
+veb_output(struct ifnet *ifp, struct mbuf *m, struct sockaddr *dst,
+    struct rtentry *rt)
+{
+       m_freem(m);
+       return (ENODEV);
+}
+
+static int
+veb_enqueue(struct ifnet *ifp, struct mbuf *m)
+{
+       m_freem(m);
+       return (ENODEV);
+}
+
+static void
+veb_start(struct ifqueue *ifq)
+{
+       ifq_purge(ifq);
+}
+
+static int
+veb_ioctl(struct ifnet *ifp, u_long cmd, caddr_t data)
+{
+       struct veb_softc *sc = ifp->if_softc;
+       struct ifbrparam *bparam = (struct ifbrparam *)data;
+       int error = 0;
+
+       if (sc->sc_dead)
+               return (ENXIO);
+
+       switch (cmd) {
+       case SIOCSIFFLAGS:
+               if (ISSET(ifp->if_flags, IFF_UP)) {
+                       if (!ISSET(ifp->if_flags, IFF_RUNNING))
+                               error = veb_up(sc);
+               } else {
+                       if (ISSET(ifp->if_flags, IFF_RUNNING))
+                               error = veb_down(sc);
+               }
+               break;
+
+       case SIOCBRDGADD:
+               error = suser(curproc);
+               if (error != 0)
+                       break;
+
+               error = veb_add_port(sc, (struct ifbreq *)data, 0);
+               break;
+       case SIOCBRDGADDS:
+               error = suser(curproc);
+               if (error != 0)
+                       break;
+
+               error = veb_add_port(sc, (struct ifbreq *)data, 1);
+               break;
+       case SIOCBRDGDEL:
+               error = suser(curproc);
+               if (error != 0)
+                       break;
+
+               error = veb_del_port(sc, (struct ifbreq *)data, 0);
+               break;
+       case SIOCBRDGDELS:
+               error = suser(curproc);
+               if (error != 0)
+                       break;
+
+               error = veb_del_port(sc, (struct ifbreq *)data, 1);
+               break;
+
+       case SIOCBRDGSCACHE:
+               error = suser(curproc);
+               if (error != 0)
+                       break;
+
+               error = etherbridge_set_max(&sc->sc_eb, bparam);
+               break;
+       case SIOCBRDGGCACHE:
+               error = etherbridge_get_max(&sc->sc_eb, bparam);
+               break;
+
+       case SIOCBRDGSTO:
+               error = suser(curproc);
+               if (error != 0)
+                       break;
+
+               error = etherbridge_set_tmo(&sc->sc_eb, bparam);
+               break;
+       case SIOCBRDGGTO:
+               error = etherbridge_get_tmo(&sc->sc_eb, bparam);
+               break;
+
+       case SIOCBRDGRTS:
+               error = etherbridge_rtfind(&sc->sc_eb, (struct ifbaconf *)data);
+               break;
+       case SIOCBRDGIFS:
+               error = veb_port_list(sc, (struct ifbifconf *)data);
+               break;
+
+       case SIOCBRDGSIFPROT:
+               error = veb_port_set_protected(sc, (struct ifbreq *)data);
+               break;
+
+       case SIOCBRDGARL:
+               error = veb_rule_add(sc, (struct ifbrlreq *)data);
+               break;
+       case SIOCBRDGFRL:
+               error = veb_rule_list_flush(sc, (struct ifbrlreq *)data);
+               break;
+       case SIOCBRDGGRL:
+               error = veb_rule_list_get(sc, (struct ifbrlconf *)data);
+               break;
+
+       default:
+               error = ENOTTY;
+               break;
+       }
+
+       if (error == ENETRESET)
+               error = veb_iff(sc);
+
+       return (error);
+}
+
+static int
+veb_add_port(struct veb_softc *sc, const struct ifbreq *req, unsigned int span)
+{
+       struct ifnet *ifp = &sc->sc_if;
+       struct ifnet *ifp0;
+       struct veb_ports *port_list;
+       struct veb_port *p;
+       int error;
+
+       NET_ASSERT_LOCKED();
+
+       ifp0 = if_unit(req->ifbr_ifsname);
+       if (ifp0 == NULL)
+               return (EINVAL);
+
+       if (ifp0->if_type != IFT_ETHER) {
+               error = EPROTONOSUPPORT;
+               goto put;
+       }
+
+       if (ifp0 == ifp) {
+               error = EPROTONOSUPPORT;
+               goto put;
+       }
+
+       error = ether_brport_isset(ifp0);
+       if (error != 0)
+               goto put;
+
+       /* let's try */
+
+       p = malloc(sizeof(*p), M_DEVBUF, M_WAITOK|M_ZERO|M_CANFAIL);
+       if (p == NULL) {
+               error = ENOMEM;
+               goto put;
+       }
+
+       p->p_ifp0 = ifp0;
+       p->p_veb = sc;
+
+       refcnt_init(&p->p_refs);
+       TAILQ_INIT(&p->p_vrl);
+       SMR_TAILQ_INIT(&p->p_vr_list[0]);
+       SMR_TAILQ_INIT(&p->p_vr_list[1]);
+
+       p->p_ioctl = ifp0->if_ioctl;
+       p->p_output = ifp0->if_output;
+
+       if (span) {
+               port_list = &sc->sc_spans;
+
+               p->p_brport.eb_input = veb_span_input;
+               p->p_bif_flags = IFBIF_SPAN;
+       } else {
+               port_list = &sc->sc_ports;
+
+               error = ifpromisc(ifp0, 1);
+               if (error != 0)
+                       goto free;
+
+               p->p_bif_flags = IFBIF_LEARNING | IFBIF_DISCOVER;
+               p->p_brport.eb_input = veb_port_input;
+       }
+
+       /* this might have changed if we slept for malloc or ifpromisc */
+       error = ether_brport_isset(ifp0);
+       if (error != 0)
+               goto unpromisc;
+
+       task_set(&p->p_ltask, veb_p_linkch, p);
+       if_linkstatehook_add(ifp0, &p->p_ltask);
+
+       task_set(&p->p_dtask, veb_p_detach, p);
+       if_detachhook_add(ifp0, &p->p_dtask);
+
+       p->p_brport.eb_port = p;
+
+       /* commit */
+       SMR_TAILQ_INSERT_TAIL_LOCKED(&port_list->l_list, p, p_entry);
+       port_list->l_count++;
+
+       ether_brport_set(ifp0, &p->p_brport);
+       if (ifp0->if_enqueue != vport_enqueue) { /* vport is special */
+               ifp0->if_ioctl = veb_p_ioctl;
+               ifp0->if_output = veb_p_output;
+       }
+
+       veb_p_linkch(p);
+
+       return (0);
+
+unpromisc:
+       if (!span)
+               ifpromisc(ifp0, 0);
+free:
+       free(p, M_DEVBUF, sizeof(*p));
+put:
+       if_put(ifp0);
+       return (error);
+}
+
+static struct veb_port *
+veb_trunkport(struct veb_softc *sc, const char *name, unsigned int span)
+{
+       struct veb_ports *port_list;
+       struct veb_port *p;
+
+       port_list = span ? &sc->sc_spans : &sc->sc_ports;
+
+       SMR_TAILQ_FOREACH_LOCKED(p, &port_list->l_list, p_entry) {
+               if (strcmp(p->p_ifp0->if_xname, name) == 0)
+                       return (p);
+       }
+
+       return (NULL);
+}
+
+static int
+veb_del_port(struct veb_softc *sc, const struct ifbreq *req, unsigned int span)
+{
+       struct veb_port *p;
+
+       NET_ASSERT_LOCKED();
+       p = veb_trunkport(sc, req->ifbr_ifsname, span);
+       if (p == NULL)
+               return (EINVAL);
+
+       veb_p_dtor(sc, p, "del");
+
+       return (0);
+}
+
+static struct veb_port *
+veb_port_get(struct veb_softc *sc, const char *name)
+{
+       struct veb_port *p;
+
+       NET_ASSERT_LOCKED();
+
+       SMR_TAILQ_FOREACH_LOCKED(p, &sc->sc_ports.l_list, p_entry) {
+               struct ifnet *ifp0 = p->p_ifp0;
+               if (strncmp(ifp0->if_xname, name,
+                   sizeof(ifp0->if_xname)) == 0) {
+                       refcnt_take(&p->p_refs);
+                       break;
+               }
+       }
+
+       return (p);
+}
+
+static void
+veb_port_put(struct veb_softc *sc, struct veb_port *p)
+{
+       refcnt_rele_wake(&p->p_refs);
+}
+
+static int
+veb_port_set_protected(struct veb_softc *sc, const struct ifbreq *ifbr)
+{
+       struct veb_port *p;
+
+       p = veb_port_get(sc, ifbr->ifbr_ifsname);
+       if (p == NULL)
+               return (ESRCH);
+
+       p->p_protected = ifbr->ifbr_protected;
+       veb_port_put(sc, p);
+
+       return (0);
+}
+
+static int
+veb_rule_add(struct veb_softc *sc, const struct ifbrlreq *ifbr)
+{
+       const struct ifbrarpf *brla = &ifbr->ifbr_arpf;
+       struct veb_rule vr, *vrp;
+       struct veb_port *p;
+       int error;
+
+       memset(&vr, 0, sizeof(vr));
+
+       switch (ifbr->ifbr_action) {
+       case BRL_ACTION_BLOCK:
+               vr.vr_action = VEB_R_BLOCK;
+               break;
+       case BRL_ACTION_PASS:
+               vr.vr_action = VEB_R_PASS;
+               break;
+       /* XXX VEB_R_MATCH */
+       default:
+               return (EINVAL);
+       }
+
+       if (!ISSET(ifbr->ifbr_flags, BRL_FLAG_IN|BRL_FLAG_OUT))
+               return (EINVAL);
+       if (ISSET(ifbr->ifbr_flags, BRL_FLAG_IN))
+               SET(vr.vr_flags, VEB_R_F_IN);
+       if (ISSET(ifbr->ifbr_flags, BRL_FLAG_OUT))
+               SET(vr.vr_flags, VEB_R_F_OUT);
+
+       if (ISSET(ifbr->ifbr_flags, BRL_FLAG_SRCVALID)) {
+               SET(vr.vr_flags, VEB_R_F_SRC);
+               vr.vr_src = ifbr->ifbr_src;
+       }
+       if (ISSET(ifbr->ifbr_flags, BRL_FLAG_DSTVALID)) {
+               SET(vr.vr_flags, VEB_R_F_DST);
+               vr.vr_dst = ifbr->ifbr_dst;
+       }
+
+       /* ARP rule */
+       if (ISSET(brla->brla_flags, BRLA_ARP|BRLA_RARP)) {
+               if (ISSET(brla->brla_flags, BRLA_ARP))
+                       SET(vr.vr_flags, VEB_R_F_ARP);
+               if (ISSET(brla->brla_flags, BRLA_RARP))
+                       SET(vr.vr_flags, VEB_R_F_RARP);
+
+               if (ISSET(brla->brla_flags, BRLA_SHA)) {
+                       SET(vr.vr_flags, VEB_R_F_SHA);
+                       vr.vr_arp_sha = brla->brla_sha;
+               }
+               if (ISSET(brla->brla_flags, BRLA_THA)) {
+                       SET(vr.vr_flags, VEB_R_F_THA);
+                       vr.vr_arp_tha = brla->brla_tha;
+               }
+               if (ISSET(brla->brla_flags, BRLA_SPA)) {
+                       SET(vr.vr_flags, VEB_R_F_SPA);
+                       vr.vr_arp_spa = brla->brla_spa;
+               }
+               if (ISSET(brla->brla_flags, BRLA_TPA)) {
+                       SET(vr.vr_flags, VEB_R_F_TPA);
+                       vr.vr_arp_tpa = brla->brla_tpa;
+               }
+               vr.vr_arp_op = htons(brla->brla_op);
+       }
+
+       if (ifbr->ifbr_tagname[0] != '\0') {
+#if NPF > 0
+               vr.vr_pftag = pf_tagname2tag((char *)ifbr->ifbr_tagname, 1);
+               if (vr.vr_pftag == 0)
+                       return (ENOMEM);
+#else
+               return (EINVAL);
+#endif
+       }
+
+       p = veb_port_get(sc, ifbr->ifbr_ifsname);
+       if (p == NULL) {
+               error = ESRCH;
+               goto error;
+       }
+
+       vrp = pool_get(&veb_rule_pool, PR_WAITOK|PR_LIMITFAIL|PR_ZERO);
+       if (vrp == NULL) {
+               error = ENOMEM;
+               goto port_put;
+       }
+
+       *vrp = vr;
+
+       /* there's one big lock on a veb for all ports */
+       error = rw_enter(&sc->sc_rule_lock, RW_WRITE|RW_INTR);
+       if (error != 0)
+               goto rule_put;
+
+       TAILQ_INSERT_TAIL(&p->p_vrl, vrp, vr_entry);
+       p->p_nvrl++;
+       if (ISSET(vr.vr_flags, VEB_R_F_OUT)) {
+               SMR_TAILQ_INSERT_TAIL_LOCKED(&p->p_vr_list[0],
+                   vrp, vr_lentry[0]);
+       }
+       if (ISSET(vr.vr_flags, VEB_R_F_IN)) {
+               SMR_TAILQ_INSERT_TAIL_LOCKED(&p->p_vr_list[1],
+                   vrp, vr_lentry[1]);
+       }
+
+       rw_exit(&sc->sc_rule_lock);
+       veb_port_put(sc, p);
+
+       return (0);
+
+rule_put:
+       pool_put(&veb_rule_pool, vrp);
+port_put:
+       veb_port_put(sc, p);
+error:
+#if NPF > 0
+       pf_tag_unref(vr.vr_pftag);
+#endif
+       return (error);
+}
+
+static void
+veb_rule_list_free(struct veb_rule *nvr)
+{
+       struct veb_rule *vr;
+
+       while ((vr = nvr) != NULL) {
+               nvr = TAILQ_NEXT(vr, vr_entry);
+               pool_put(&veb_rule_pool, vr);
+       }
+}
+
+static int
+veb_rule_list_flush(struct veb_softc *sc, const struct ifbrlreq *ifbr)
+{
+       struct veb_port *p;
+       struct veb_rule *vr;
+       int error;
+
+       p = veb_port_get(sc, ifbr->ifbr_ifsname);
+       if (p == NULL)
+               return (ESRCH);
+
+       error = rw_enter(&sc->sc_rule_lock, RW_WRITE|RW_INTR);
+       if (error != 0) {
+               veb_port_put(sc, p);
+               return (error);
+       }
+
+       /* take all the rules away */
+       vr = TAILQ_FIRST(&p->p_vrl);
+
+       /* reset the lists and counts of rules */
+       TAILQ_INIT(&p->p_vrl);
+       p->p_nvrl = 0;
+       SMR_TAILQ_INIT(&p->p_vr_list[0]);
+       SMR_TAILQ_INIT(&p->p_vr_list[1]);
+
+       rw_exit(&sc->sc_rule_lock);
+       veb_port_put(sc, p);
+
+       smr_barrier();
+       veb_rule_list_free(vr);
+
+       return (0);
+}
+
+static void
+veb_rule2ifbr(struct ifbrlreq *ifbr, const struct veb_rule *vr)
+{
+       switch (vr->vr_action) {
+       case VEB_R_PASS:
+               ifbr->ifbr_action = BRL_ACTION_PASS;
+               break;
+       case VEB_R_BLOCK:
+               ifbr->ifbr_action = BRL_ACTION_BLOCK;
+               break;
+       }
+
+       if (ISSET(vr->vr_flags, VEB_R_F_IN))
+               SET(ifbr->ifbr_flags, BRL_FLAG_IN);
+       if (ISSET(vr->vr_flags, VEB_R_F_OUT))
+               SET(ifbr->ifbr_flags, BRL_FLAG_OUT);
+
+       if (ISSET(vr->vr_flags, VEB_R_F_SRC)) {
+               SET(ifbr->ifbr_flags, BRL_FLAG_SRCVALID);
+               ifbr->ifbr_src = vr->vr_src;
+       }
+       if (ISSET(vr->vr_flags, VEB_R_F_DST)) {
+               SET(ifbr->ifbr_flags, BRL_FLAG_DSTVALID);
+               ifbr->ifbr_dst = vr->vr_dst;
+       }
+
+       /* ARP rule */
+       if (ISSET(vr->vr_flags, VEB_R_F_ARP|VEB_R_F_RARP)) {
+               struct ifbrarpf *brla = &ifbr->ifbr_arpf;
+
+               if (ISSET(vr->vr_flags, VEB_R_F_ARP))
+                       SET(brla->brla_flags, BRLA_ARP);
+               if (ISSET(vr->vr_flags, VEB_R_F_RARP))
+                       SET(brla->brla_flags, BRLA_RARP);
+
+               if (ISSET(vr->vr_flags, VEB_R_F_SHA)) {
+                       SET(brla->brla_flags, BRLA_SHA);
+                       brla->brla_sha = vr->vr_arp_sha;
+               }
+               if (ISSET(vr->vr_flags, VEB_R_F_THA)) {
+                       SET(brla->brla_flags, BRLA_THA);
+                       brla->brla_tha = vr->vr_arp_tha;
+               }
+
+               if (ISSET(vr->vr_flags, VEB_R_F_SPA)) {
+                       SET(brla->brla_flags, BRLA_SPA);
+                       brla->brla_spa = vr->vr_arp_spa;
+               }
+               if (ISSET(vr->vr_flags, VEB_R_F_TPA)) {
+                       SET(brla->brla_flags, BRLA_TPA);
+                       brla->brla_tpa = vr->vr_arp_tpa;
+               }
+
+               brla->brla_op = ntohs(vr->vr_arp_op);
+       }
+
+#if NPF > 0
+       if (vr->vr_pftag != 0)
+               pf_tag2tagname(vr->vr_pftag, ifbr->ifbr_tagname);
+#endif
+}
+
+static int
+veb_rule_list_get(struct veb_softc *sc, struct ifbrlconf *ifbrl)
+{
+       struct veb_port *p;
+       struct veb_rule *vr;
+       struct ifbrlreq *ifbr, *ifbrs;
+       int error = 0;
+       size_t len;
+
+       p = veb_port_get(sc, ifbrl->ifbrl_ifsname);
+       if (p == NULL)
+               return (ESRCH);
+
+       len = p->p_nvrl; /* estimate */
+       if (ifbrl->ifbrl_len == 0 || len == 0) {
+               ifbrl->ifbrl_len = len * sizeof(*ifbrs);
+               goto port_put;
+       }
+
+       error = rw_enter(&sc->sc_rule_lock, RW_READ|RW_INTR);
+       if (error != 0)
+               goto port_put;
+
+       ifbrs = mallocarray(p->p_nvrl, sizeof(*ifbrs), M_TEMP,
+           M_WAITOK|M_CANFAIL|M_ZERO);
+       if (ifbrs == NULL) {
+               rw_exit(&sc->sc_rule_lock);
+               error = ENOMEM;
+               goto port_put;
+       }
+       len = p->p_nvrl * sizeof(*ifbrs);
+
+       ifbr = ifbrs;
+       TAILQ_FOREACH(vr, &p->p_vrl, vr_entry) {
+               strlcpy(ifbr->ifbr_name, sc->sc_if.if_xname,
+                   sizeof(ifbr->ifbr_name));
+               strlcpy(ifbr->ifbr_ifsname, p->p_ifp0->if_xname,
+                   sizeof(ifbr->ifbr_ifsname));
+               veb_rule2ifbr(ifbr, vr);
+
+               ifbr++;
+       }
+
+       rw_exit(&sc->sc_rule_lock);
+
+       error = copyout(ifbrs, ifbrl->ifbrl_buf, min(len, ifbrl->ifbrl_len));
+       if (error == 0)
+               ifbrl->ifbrl_len = len;
+       free(ifbrs, M_TEMP, len);
+
+port_put:
+       veb_port_put(sc, p);
+       return (error);
+}
+
+static int
+veb_port_list(struct veb_softc *sc, struct ifbifconf *bifc)
+{
+       struct ifnet *ifp = &sc->sc_if;
+       struct veb_port *p;
+       struct ifnet *ifp0;
+       struct ifbreq breq;
+       int n = 0, error = 0;
+
+       NET_ASSERT_LOCKED();
+
+       if (bifc->ifbic_len == 0) {
+               n = sc->sc_ports.l_count + sc->sc_spans.l_count;
+               goto done;
+       }
+
+       SMR_TAILQ_FOREACH_LOCKED(p, &sc->sc_ports.l_list, p_entry) {
+               if (bifc->ifbic_len < sizeof(breq))
+                       break;
+
+               memset(&breq, 0, sizeof(breq));
+
+               ifp0 = p->p_ifp0;
+
+               strlcpy(breq.ifbr_name, ifp->if_xname, IFNAMSIZ);
+               strlcpy(breq.ifbr_ifsname, ifp0->if_xname, IFNAMSIZ);
+
+               breq.ifbr_ifsflags = p->p_bif_flags;
+               breq.ifbr_portno = ifp0->if_index;
+               breq.ifbr_protected = p->p_protected;
+               if ((error = copyout(&breq, bifc->ifbic_req + n,
+                   sizeof(breq))) != 0)
+                       goto done;
+
+               bifc->ifbic_len -= sizeof(breq);
+               n++;
+       }
+
+       SMR_TAILQ_FOREACH_LOCKED(p, &sc->sc_spans.l_list, p_entry) {
+               if (bifc->ifbic_len < sizeof(breq))
+                       break;
+
+               memset(&breq, 0, sizeof(breq));
+
+               strlcpy(breq.ifbr_name, ifp->if_xname, IFNAMSIZ);
+               strlcpy(breq.ifbr_ifsname, p->p_ifp0->if_xname, IFNAMSIZ);
+
+               /* flag as span port so ifconfig(8)'s brconfig.c:bridge_list()
+                * stays quiet wrt. STP */
+               breq.ifbr_ifsflags = p->p_bif_flags;
+               if ((error = copyout(&breq, bifc->ifbic_req + n,
+                   sizeof(breq))) != 0)
+                       goto done;
+
+               bifc->ifbic_len -= sizeof(breq);
+               n++;
+       }
+
+done:
+       bifc->ifbic_len = n * sizeof(breq);
+       return (error);
+}
+
+static int
+veb_p_ioctl(struct ifnet *ifp0, u_long cmd, caddr_t data)
+{
+       const struct ether_brport *eb = ether_brport_get_locked(ifp0);
+       struct veb_port *p;
+       int error = 0;
+
+       KASSERTMSG(eb != NULL,
+           "%s: %s called without an ether_brport set",
+           ifp0->if_xname, __func__);
+       KASSERTMSG(eb->eb_input == veb_port_input,
+           "%s: %s called, but eb_input seems wrong (%p != veb_port_input())",
+           ifp0->if_xname, __func__, eb->eb_input);
+
+       p = eb->eb_port;
+
+       switch (cmd) {
+       case SIOCSIFADDR:
+               error = EBUSY;
+               break;
+
+       default:
+               error = (*p->p_ioctl)(ifp0, cmd, data);
+               break;
+       }
+
+       return (error);
+}
+
+static int
+veb_p_output(struct ifnet *ifp0, struct mbuf *m, struct sockaddr *dst,
+    struct rtentry *rt)
+{
+       int (*p_output)(struct ifnet *, struct mbuf *, struct sockaddr *,
+           struct rtentry *) = NULL;
+       const struct ether_brport *eb;
+
+       /* restrict transmission to bpf only */
+       if (m_tag_find(m, PACKET_TAG_DLT, NULL) == NULL) {
+               m_freem(m);
+               return (EBUSY);
+       }
+
+       smr_read_enter();
+       eb = ether_brport_get(ifp0);
+       if (eb != NULL && eb->eb_input == veb_port_input) {
+               struct veb_port *p = eb->eb_port;
+               p_output = p->p_output; /* code doesn't go away */
+       }
+       smr_read_leave();
+
+       if (p_output == NULL) {
+               m_freem(m);
+               return (ENXIO);
+       }
+
+       return ((*p_output)(ifp0, m, dst, rt));
+}
+
+static void
+veb_p_dtor(struct veb_softc *sc, struct veb_port *p, const char *op)
+{
+       struct ifnet *ifp = &sc->sc_if;
+       struct ifnet *ifp0 = p->p_ifp0;
+       struct veb_ports *port_list;
+
+       DPRINTF(sc, "%s %s: destroying port (%s)\n",
+           ifp->if_xname, ifp0->if_xname, op);
+
+       ifp0->if_ioctl = p->p_ioctl;
+       ifp0->if_output = p->p_output;
+
+       ether_brport_clr(ifp0);
+
+       if_detachhook_del(ifp0, &p->p_dtask);
+       if_linkstatehook_del(ifp0, &p->p_ltask);
+
+       if (ISSET(p->p_bif_flags, IFBIF_SPAN)) {
+               port_list = &sc->sc_spans;
+       } else {
+               if (ifpromisc(ifp0, 0) != 0) {
+                       log(LOG_WARNING, "%s %s: unable to disable promisc\n",
+                           ifp->if_xname, ifp0->if_xname);
+               }
+
+               etherbridge_detach_port(&sc->sc_eb, p);
+
+               port_list = &sc->sc_ports;
+       }
+       SMR_TAILQ_REMOVE_LOCKED(&port_list->l_list, p, p_entry);
+       port_list->l_count--;
+
+       smr_barrier();
+       refcnt_finalize(&p->p_refs, "vebpdtor");
+
+       veb_rule_list_free(TAILQ_FIRST(&p->p_vrl));
+
+       if_put(ifp0);
+       free(p, M_DEVBUF, sizeof(*p));
+}
+
+static void
+veb_p_detach(void *arg)
+{
+       struct veb_port *p = arg;
+       struct veb_softc *sc = p->p_veb;
+
+       NET_ASSERT_LOCKED();
+
+       veb_p_dtor(sc, p, "detach");
+}
+
+static int
+veb_p_active(struct veb_port *p)
+{
+       struct ifnet *ifp0 = p->p_ifp0;
+
+       return (ISSET(ifp0->if_flags, IFF_RUNNING) &&
+           LINK_STATE_IS_UP(ifp0->if_link_state));
+}
+
+static void
+veb_p_linkch(void *arg)
+{
+       struct veb_port *p = arg;
+       u_char link_state = LINK_STATE_FULL_DUPLEX;
+
+       NET_ASSERT_LOCKED();
+
+       if (!veb_p_active(p))
+               link_state = LINK_STATE_DOWN;
+
+       p->p_link_state = link_state;
+}
+
+static int
+veb_up(struct veb_softc *sc)
+{
+       struct ifnet *ifp = &sc->sc_if;
+       int error;
+
+       error = etherbridge_up(&sc->sc_eb);
+       if (error != 0)
+               return (error);
+
+       NET_ASSERT_LOCKED();
+       SET(ifp->if_flags, IFF_RUNNING);
+
+       return (0);
+}
+
+static int
+veb_iff(struct veb_softc *sc)
+{
+       return (0);
+}
+
+static int
+veb_down(struct veb_softc *sc)
+{
+       struct ifnet *ifp = &sc->sc_if;
+       int error;
+
+       error = etherbridge_down(&sc->sc_eb);
+       if (error != 0)
+               return (error);
+
+       NET_ASSERT_LOCKED();
+       CLR(ifp->if_flags, IFF_RUNNING);
+
+       return (0);
+}
+
+static int
+veb_eb_port_cmp(void *arg, void *a, void *b)
+{
+       struct veb_port *pa = a, *pb = b;
+       return (pa == pb);
+}
+
+static void *
+veb_eb_port_take(void *arg, void *port)
+{
+       struct veb_port *p = port;
+
+       refcnt_take(&p->p_refs);
+
+       return (p);
+}
+
+static void
+veb_eb_port_rele(void *arg, void *port)
+{
+       struct veb_port *p = port;
+
+       refcnt_rele_wake(&p->p_refs);
+}
+
+static size_t
+veb_eb_port_ifname(void *arg, char *dst, size_t len, void *port)
+{
+       struct veb_port *p = port;
+
+       return (strlcpy(dst, p->p_ifp0->if_xname, len));
+}
+
+static void
+veb_eb_port_sa(void *arg, struct sockaddr_storage *ss, void *port)
+{
+       ss->ss_family = AF_UNSPEC;
+}
+
+/*
+ * virtual ethernet bridge port
+ */
+
+static int
+vport_clone_create(struct if_clone *ifc, int unit)
+{
+       struct vport_softc *sc;
+       struct ifnet *ifp;
+
+       sc = malloc(sizeof(*sc), M_DEVBUF, M_WAITOK|M_ZERO|M_CANFAIL);
+       if (sc == NULL)
+               return (ENOMEM);
+
+       ifp = &sc->sc_ac.ac_if;
+
+       snprintf(ifp->if_xname, sizeof(ifp->if_xname), "%s%d",
+           ifc->ifc_name, unit);
+
+       ifp->if_softc = sc;
+       ifp->if_type = IFT_ETHER;
+       ifp->if_hardmtu = ETHER_MAX_HARDMTU_LEN;
+       ifp->if_ioctl = vport_ioctl;
+       ifp->if_enqueue = vport_enqueue;
+       ifp->if_qstart = vport_start;
+       ifp->if_flags = IFF_BROADCAST | IFF_SIMPLEX | IFF_MULTICAST;
+       ifp->if_xflags = IFXF_CLONED | IFXF_MPSAFE;
+       ether_fakeaddr(ifp);
+
+       if_counters_alloc(ifp);
+       if_attach(ifp);
+       ether_ifattach(ifp);
+
+       return (0);
+}
+
+static int
+vport_clone_destroy(struct ifnet *ifp)
+{
+       struct vport_softc *sc = ifp->if_softc;
+
+       NET_LOCK();
+       sc->sc_dead = 1;
+
+       if (ISSET(ifp->if_flags, IFF_RUNNING))
+               vport_down(sc);
+       NET_UNLOCK();
+
+       ether_ifdetach(ifp);
+       if_detach(ifp);
+
+       free(sc, M_DEVBUF, sizeof(*sc));
+
+       return (0);
+}
+
+static int
+vport_ioctl(struct ifnet *ifp, u_long cmd, caddr_t data)
+{
+       struct vport_softc *sc = ifp->if_softc;
+       int error = 0;
+
+       if (sc->sc_dead)
+               return (ENXIO);
+
+       switch (cmd) {
+       case SIOCSIFFLAGS:
+               if (ISSET(ifp->if_flags, IFF_UP)) {
+                       if (!ISSET(ifp->if_flags, IFF_RUNNING))
+                               error = vport_up(sc);
+               } else {
+                       if (ISSET(ifp->if_flags, IFF_RUNNING))
+                               error = vport_down(sc);
+               }
+               break;
+
+       case SIOCADDMULTI:
+       case SIOCDELMULTI:
+               break;
+
+       default:
+               error = ether_ioctl(ifp, &sc->sc_ac, cmd, data);
+               break;
+       }
+
+       if (error == ENETRESET)
+               error = vport_iff(sc);
+
+       return (error);
+}
+
+static int
+vport_up(struct vport_softc *sc)
+{
+       struct ifnet *ifp = &sc->sc_ac.ac_if;
+
+       NET_ASSERT_LOCKED();
+       SET(ifp->if_flags, IFF_RUNNING);
+
+       return (0);
+}
+
+static int
+vport_iff(struct vport_softc *sc)
+{
+       return (0);
+}
+
+static int
+vport_down(struct vport_softc *sc)
+{
+       struct ifnet *ifp = &sc->sc_ac.ac_if;
+
+       NET_ASSERT_LOCKED();
+       CLR(ifp->if_flags, IFF_RUNNING);
+
+       return (0);
+}
+
+static int
+vport_enqueue(struct ifnet *ifp, struct mbuf *m)
+{
+       struct arpcom *ac;
+       const struct ether_brport *eb;
+       int error = ENETDOWN;
+#if NBPFILTER > 0
+       caddr_t if_bpf;
+#endif
+
+#if NPF > 0
+       /*
+        * the packet is about to leave the l3 stack and go into
+        * the l2 switching space, or it's coming from a switch space
+        * into the network stack. either way, there's no relationship
+        * between pf states in those different places.
+        */
+       pf_pkt_addr_changed(m);
+#endif
+
+       if (ISSET(m->m_flags, M_PROTO1)) {
+               /* packet is coming from a bridge */
+               if_vinput(ifp, m);
+               return (0);
+       }
+
+       /* packet is going to the bridge */
+
+       ac = (struct arpcom *)ifp;
+
+       smr_read_enter();
+       eb = SMR_PTR_GET(&ac->ac_brport);
+       if (eb != NULL) {
+               counters_pkt(ifp->if_counters, ifc_opackets, ifc_obytes,
+                   m->m_pkthdr.len);
+
+#if NBPFILTER > 0
+               if_bpf = READ_ONCE(ifp->if_bpf);
+               if (if_bpf != NULL)
+                       bpf_mtap_ether(if_bpf, m, BPF_DIRECTION_OUT);
+#endif
+
+               m = (*eb->eb_input)(ifp, m, eb->eb_port);
+
+               error = 0;
+       }
+       smr_read_leave();
+
+       m_freem(m);
+
+       return (error);
+}
+
+static void
+vport_start(struct ifqueue *ifq)
+{
+       ifq_purge(ifq);
+}
Index: net/toeplitz.c
===================================================================
RCS file: /cvs/src/sys/net/toeplitz.c,v
retrieving revision 1.9
diff -u -p -r1.9 toeplitz.c
--- net/toeplitz.c      1 Sep 2020 19:18:26 -0000       1.9
+++ net/toeplitz.c      10 Feb 2021 12:06:23 -0000
@@ -187,6 +187,15 @@ stoeplitz_hash_ip6port(const struct stoe
 }
 #endif /* INET6 */
 
+uint16_t
+stoeplitz_hash_eaddr(const struct stoeplitz_cache *scache,
+    const uint8_t ea[static 6])
+{
+       const uint16_t *ea16 = (const uint16_t *)ea;
+
+       return (stoeplitz_hash_n16(scache, ea16[0] ^ ea16[1] ^ ea16[2]));
+}
+
 void
 stoeplitz_to_key(void *key, size_t klen)
 {
Index: net/toeplitz.h
===================================================================
RCS file: /cvs/src/sys/net/toeplitz.h,v
retrieving revision 1.3
diff -u -p -r1.3 toeplitz.h
--- net/toeplitz.h      19 Jun 2020 08:48:15 -0000      1.3
+++ net/toeplitz.h      10 Feb 2021 12:06:23 -0000
@@ -53,6 +53,9 @@ uint16_t      stoeplitz_hash_ip6port(const st
                    uint16_t, uint16_t);
 #endif
 
+uint16_t       stoeplitz_hash_eaddr(const struct stoeplitz_cache *,
+                   const uint8_t [static 6]);
+
 /* hash a uint16_t in network byte order */
 static __unused inline uint16_t
 stoeplitz_hash_n16(const struct stoeplitz_cache *scache, uint16_t n16)
@@ -116,5 +119,7 @@ extern const struct stoeplitz_cache *con
 #define stoeplitz_ip6port(_sa6, _da6, _sp, _dp) \
        stoeplitz_hash_ip6port(stoeplitz_cache, (_sa6), (_da6), (_sp), (_dp))
 #endif
+#define stoeplitz_eaddr(_ea) \
+       stoeplitz_hash_eaddr(stoeplitz_cache, (_ea))
 
 #endif /* _SYS_NET_TOEPLITZ_H_ */
