Saturday, November 3, 2018 8:19 AM, Slava Ovsiienko:
> Subject: [PATCH v5 00/13] net/mlx5: e-switch VXLAN encap/decap hardware
> offload
>
> This patchset adds the VXLAN encapsulation/decapsulation hardware offload
> feature for E-Switch.
>
> A typical use case of tunneling infrastructure is port representors in
> switchdev mode, with VXLAN traffic encapsulation performed on traffic
> coming *from* a representor and decapsulation on traffic going *to* that
> representor, in order to transparently assign a given VXLAN to VF traffic.
>
> Since these actions are supported at the E-Switch level, the "transfer"
> attribute must be set on such flow rules. They must also be combined with a
> port redirection action to make sense.
>
> Since only ingress is supported, encapsulation flow rules are normally applied
> on a physical port and emit traffic to a port representor.
> The opposite order is used for decapsulation.
>
> Like other mlx5 E-Switch flow rule actions, these are implemented
> through Linux's TC flower API. Since the Linux interface for VXLAN
> encap/decap involves virtual network devices (i.e. ip link add type
> vxlan [...]), the PMD dynamically spawns them as needed
> through Netlink calls. These implicitly created VXLAN devices are called
> VTEPs (Virtual Tunnel End Points).
>
> VXLAN interfaces are dynamically created for each local port of outer
> networks and then used as targets for TC "flower" filters in order to perform
> encapsulation. For decapsulation the VXLAN devices are created for each
> unique UDP port. These VXLAN interfaces are system-wide: only one
> device with a given UDP port can exist in the system (an attempt to create
> another device with the same local UDP port returns EEXIST), so the PMD should
> support a device database shared between PMD instances.
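To make the one-device-per-UDP-port constraint concrete, here is a minimal
sketch of the reuse-or-create decision for a decap VTEP. It is not taken from
the patchset: the function names, the "vxlan_<port>" naming scheme and the
idea of passing the already-used ports as a plain list are all assumptions
made for illustration. The command it prints is the iproute2 form quoted
further down in the cover letter, and nothing is executed.

```shell
#!/bin/sh
# Hypothetical naming scheme: one VTEP netdev per local UDP port.
vtep_name() {
    printf 'vxlan_%s\n' "$1"
}

# Decide whether a decap VTEP can be reused or must be created.
#   $1 = local UDP port
#   $2 = space-separated list of ports that already have a VTEP
#        (stands in for the device database; an assumption of this sketch)
# Prints "reuse <name>" or the ip command that would create the device.
vtep_plan() {
    port=$1
    for used in $2; do
        if [ "$used" = "$port" ]; then
            # A second "ip link add" on this port would fail with EEXIST,
            # so the existing device has to be shared.
            printf 'reuse %s\n' "$(vtep_name "$port")"
            return 0
        fi
    done
    printf 'ip link add %s type vxlan dstport %s external\n' \
        "$(vtep_name "$port")" "$port"
}
```

For example, vtep_plan 4789 "4789 4790" reports a reuse, while
vtep_plan 4789 "" prints the creation command.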
>
> Rules samples considerations:
>
> $PF - physical device, outer network
> $VF - representor for VF, outer/inner network
> $VXLAN - VTEP netdev name
> $PF_OUTER_IP - $PF IP (v4 or v6) within outer network
> $REMOTE_IP - remote peer IP (v4 or v6) within outer network
> $LOCAL_PORT - local UDP port
> $REMOTE_PORT - remote UDP port
>
> VXLAN VTEP creation with iproute2 (PMD does the same via Netlink):
>
> - for encapsulation:
>
>   ip link add $VXLAN type vxlan dstport $LOCAL_PORT external dev $PF
>   ip link set dev $VXLAN up
>   tc qdisc del dev $VXLAN ingress
>   tc qdisc add dev $VXLAN ingress
>
> $LOCAL_PORT for egress encapsulated traffic (note, this is not the source UDP
> port in the VXLAN header, it is just the UDP port assigned
> to the VTEP, with no practical usage) is selected automatically from the
> available UDP ports in the range 30000-60000.
>
> - for decapsulation:
>
>   ip link add $VXLAN type vxlan dstport $LOCAL_PORT external
>   ip link set dev $VXLAN up
>   tc qdisc del dev $VXLAN ingress
>   tc qdisc add dev $VXLAN ingress
>
> $LOCAL_PORT is the UDP port receiving the VXLAN traffic from outer networks.
>
> All ingress UDP traffic with the given UDP destination port from ALL existing
> netdevs is routed by the kernel to the $VXLAN net device. While applying the
> rule the kernel checks the IP parameter within the rule, determines the
> appropriate underlying PF and tries to set up the rule hardware offload.
>
> VXLAN encapsulation
>
> VXLAN encap rules are applied to the VF ingress traffic and have the VTEP as
> the actual redirection destination instead of the outer PF.
> The encapsulation rule should provide:
> - redirection action VF->PF
> - VF port ID
> - some inner network parameters (MACs)
> - the tunnel outer source IP (v4/v6), (IS A MUST)
> - the tunnel outer destination IP (v4/v6), (IS A MUST)
> - VNI - Virtual Network Identifier (IS A MUST)
>
> VXLAN encapsulation rule sample for tc utility:
>
>   tc filter add dev $VF protocol all parent ffff: flower skip_sw \
>     action tunnel_key set dst_port $REMOTE_PORT \
>     src_ip $PF_OUTER_IP dst_ip $REMOTE_IP id $VNI \
>     action mirred egress redirect dev $VXLAN
>
> VXLAN encapsulation rule sample for testpmd:
>
> - Setting up outer properties of VXLAN tunnel:
>
>   set vxlan ip-version ipv4 vni $VNI \
>     udp-src $IGNORED udp-dst $REMOTE_PORT \
>     ip-src $PF_OUTER_IP ip-dst $REMOTE_IP \
>     eth-src $IGNORED eth-dst $REMOTE_MAC
>
> - Creating a flow rule on port ID 4 performing VXLAN encapsulation
>   with the abovementioned properties and directing the resulting
>   traffic to port ID 0:
>
>   flow create 4 ingress transfer pattern eth src is $INNER_MAC / end
>     actions vxlan_encap / port_id id 0 / end
>
> No direct way was found to provide the kernel with all required
> encapsulation header parameters. The encapsulation VTEP is created
> attached to the outer interface and is assumed to be the default path for
> egress encapsulated traffic. The outer tunnel IP addresses are assigned to
> the interface using Netlink, and the implicit route is created like this:
>
>   ip addr add <src_ip> peer <dst_ip> dev <outer> scope link
>
> The peer address option provides the implicit route, and the scope link
> attribute reduces the risk of conflicts. At initialization time all local
> scope link addresses are flushed from the outer network device.
>
> The destination MAC address is provided via a permanent neigh rule:
>
>   ip neigh add dev <outer> lladdr <dst_mac> to <dst_ip> nud permanent
>
> At initialization time all neigh rules of permanent type are flushed from the
> outer network device.
>
> VXLAN decapsulation
>
> VXLAN decap rules are applied to the ingress traffic of the VTEP ($VXLAN)
> device instead of the PF.
> The decapsulation rule should provide:
> - redirection action PF->VF
> - VF port ID as redirection destination
> - $VXLAN device as ingress traffic source
> - the tunnel outer source IP (v4/v6), (optional)
> - the tunnel outer destination IP (v4/v6), (IS A MUST)
> - the tunnel local UDP port (IS A MUST, PMD looks for the appropriate VTEP
>   with the given local UDP port)
> - VNI - Virtual Network Identifier (IS A MUST)
>
> VXLAN decap rule sample for tc utility:
>
>   tc filter add dev $VXLAN protocol all parent ffff: flower skip_sw \
>     enc_src_ip $REMOTE_IP enc_dst_ip $PF_OUTER_IP enc_key_id $VNI \
>     enc_dst_port $LOCAL_PORT \
>     action tunnel_key unset action mirred egress redirect dev $VF
>
> VXLAN decap rule sample for testpmd:
>
> - Creating a flow on port ID 0 performing VXLAN decapsulation and directing
>   the result to port ID 4 with checking inner properties:
>
>   flow create 0 ingress transfer pattern /
>     ipv4 src is $REMOTE_IP dst is $PF_LOCAL_IP /
>     udp src is 9999 dst is $LOCAL_PORT / vxlan vni is $VNI /
>     eth src is 00:11:22:33:44:55 dst is $INNER_MAC / end
>     actions vxlan_decap / port_id id 4 / end
>
> The VXLAN encap/decap rule constraints (implied by current kernel support):
>
> - VXLAN decapsulation provided for PF->VF direction only
> - VXLAN encapsulation provided for VF->PF direction only
> - the current implementation supports only a non-shared database of VTEPs
>   (several instances of DPDK apps cannot use the same UDP port
>   simultaneously)
>
> Suggested-by: Adrien Mazarguil <adrien.mazarg...@6wind.com>
> Signed-off-by: Viacheslav Ovsiienko <viachesl...@mellanox.com>
>
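For readers who want to see the quoted steps side by side, here is a rough
dry-run sketch that only prints the iproute2/tc commands for one encap rule
(address with peer, permanent neighbour, flower filter) and one decap rule.
The function names and argument order are hypothetical, and the commands are
the ones from the cover letter with the variables substituted (enc_dst_port
is the tc flower key for the tunnel UDP destination port). The PMD itself
does the equivalent through Netlink, not through these tools.

```shell
#!/bin/sh
# Print (do not run) the commands backing one VXLAN encap rule.
# Args: vf vxlan outer src_ip dst_ip dst_mac vni dst_port (all caller-supplied)
encap_cmds() {
    vf=$1 vxlan=$2 outer=$3 src_ip=$4 dst_ip=$5 dst_mac=$6 vni=$7 dst_port=$8
    # Peer address gives the implicit route over the outer interface.
    printf 'ip addr add %s peer %s dev %s scope link\n' \
        "$src_ip" "$dst_ip" "$outer"
    # Permanent neighbour entry supplies the outer destination MAC.
    printf 'ip neigh add dev %s lladdr %s to %s nud permanent\n' \
        "$outer" "$dst_mac" "$dst_ip"
    # Flower rule on the VF: set the tunnel key, redirect to the VTEP.
    printf 'tc filter add dev %s protocol all parent ffff: flower skip_sw ' "$vf"
    printf 'action tunnel_key set dst_port %s src_ip %s dst_ip %s id %s ' \
        "$dst_port" "$src_ip" "$dst_ip" "$vni"
    printf 'action mirred egress redirect dev %s\n' "$vxlan"
}

# Print the flower rule backing one VXLAN decap rule.
# Args: vxlan vf remote_ip local_ip vni local_port
decap_cmd() {
    vxlan=$1 vf=$2 remote_ip=$3 local_ip=$4 vni=$5 local_port=$6
    # Match on the outer header at the VTEP, strip it, redirect to the VF.
    printf 'tc filter add dev %s protocol all parent ffff: flower skip_sw ' "$vxlan"
    printf 'enc_src_ip %s enc_dst_ip %s enc_key_id %s enc_dst_port %s ' \
        "$remote_ip" "$local_ip" "$vni" "$local_port"
    printf 'action tunnel_key unset action mirred egress redirect dev %s\n' "$vf"
}
```

Calling encap_cmds with a VF representor, VTEP name, outer netdev, the two
outer IPs, the peer MAC, the VNI and the remote UDP port prints three lines.
Actually running them needs root and an mlx5 device in switchdev mode, which
is why the sketch stops at printing.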
Well done. Applied to next-net-mlx, thanks.