From: Lorand Jakab <loja...@cisco.com> LISP is an experimental layer 3 tunneling protocol. This patch adds support for LISP tunneling. Since LISP encapsulated packets do not carry an Ethernet header, it is removed before encapsulation, and added with hardcoded source and destination MAC addresses after decapsulation. The harcoded MAC chosen for this purpose is the locally administered address 02:00:00:00:00:00. Flow actions can be used to rewrite this MAC for correct reception. As such, this patch is intended to be used for static network configurations, or with a LISP capable controller.
Signed-off-by: Lorand Jakab <loja...@cisco.com> Signed-off-by: Kyle Mestery <kmest...@cisco.com> --- Makefile.am | 1 + README-lisp | 68 ++++++++ datapath/Modules.mk | 1 + datapath/linux/.gitignore | 1 + datapath/tunnel.h | 1 + datapath/vport-lisp.c | 351 ++++++++++++++++++++++++++++++++++++++++++ datapath/vport.c | 1 + datapath/vport.h | 1 + include/linux/openvswitch.h | 1 + include/openflow/nicira-ext.h | 6 +- lib/netdev-vport.c | 4 + vswitchd/vswitch.xml | 9 ++ 12 files changed, 442 insertions(+), 3 deletions(-) create mode 100644 README-lisp create mode 100644 datapath/vport-lisp.c diff --git a/Makefile.am b/Makefile.am index 328b248..b6c13a3 100644 --- a/Makefile.am +++ b/Makefile.am @@ -56,6 +56,7 @@ EXTRA_DIST = \ OPENFLOW-1.1+ \ PORTING \ README-gcov \ + README-lisp \ REPORTING-BUGS \ SubmittingPatches \ WHY-OVS \ diff --git a/README-lisp b/README-lisp new file mode 100644 index 0000000..a757364 --- /dev/null +++ b/README-lisp @@ -0,0 +1,68 @@ +Using LISP tunneling +==================== + +LISP is a layer 3 tunneling mechanism, meaning that encapsulated packets do +not carry Ethernet headers, and ARP requests shouldn't be sent over the +tunnel. Because of this, there are some additional steps required for setting +up LISP tunnels in Open vSwitch, until support for L3 tunnels will improve. + +This guide assumes a point-to-point tunnel between two VMs connected to OVS +bridges on different hypervisors connected via IPv4. Of course, more than one +VM may be connected to any of the hypervisors, using the same LISP tunnel, and +a hypervisor may be connected to several hypervisors over different LISP +tunnels. + +There are several scenarios: + + 1) the VMs have IP addresses in the same subnet and the hypervisors are also + in a single subnet (although one different from the VM's); + 2) the VMs have IP addresses in the same subnet but the hypervisors are + separated by a router; + 3) the VMs are in different subnets. + +In cases 1) and 3) ARP resolution can work as normal: ARP traffic is +configured not to go through the LISP tunnel. For case 1) ARP is able to +reach the other VM, if both OVS instances default to MAC address learning. +Case 3) requires the hypervisor be configured as the default router for the +VMs. + +In case 2) the VMs expect ARP replies from each other, but this is not +possible over a layer 3 tunnel. One solution is to have static MAC address +entries preconfigured on the VMs (e.g., `arp -f /etc/ethers` on startup on +Unix based VMs), or have the hypervisor do proxy ARP. + +On the receiving side, the packet arrives without the original MAC header. +The LISP tunneling code attaches a header with harcoded source and destination +MAC addres 02:00:00:00:00:00. This address has all bits set to 0, except the +locally administered bit, in order to avoid potential collisions with existing +allocations. In order for packets to reach their intended destination, the +destination MAC address needs to be rewritten. This can be done using the +flow table. + +See below for an example setup, and the associated flow rules to enable LISP +tunneling. + + +---+ +---+ + |VM1| |VM2| + +---+ +---+ + | | + +--[tap0]--+ +--[tap0]---+ + | | | | + [lisp0] OVS1 [eth0]-----------------[eth0] OVS2 [lisp0] + | | | | + +----------+ +-----------+ + +On each hypervisor, interfaces tap0, eth0, and lisp0 are added to a single +bridge instance, and become numbered 1, 2, and 3 respectively: + + ovs-vsctl add-br br0 + ovs-vsctl add-port br0 tap0 + ovs-vsctl add-port br0 eth0 + ovs-vsctl add-port br0 lisp0 -- set Interface lisp0 type=lisp options:remote_ip=<OVSx_IP> + +Flows on br0 are configured as follows: + + priority=3,dl_dst=02:00:00:00:00:00,action=mod_dl_dst:<VMx_MAC>,NORMAL + priority=2,in_port=1,dl_type=0x0806,action=NORMAL + priority=1,in_port=1,dl_type=0x0800,vlan_tci=0,nw_src=<EID_prefix>,action=output:3 + priority=0,action=NORMAL diff --git a/datapath/Modules.mk b/datapath/Modules.mk index 281408b..d28e29b 100644 --- a/datapath/Modules.mk +++ b/datapath/Modules.mk @@ -19,6 +19,7 @@ openvswitch_sources = \ vport-capwap.c \ vport-gre.c \ vport-internal_dev.c \ + vport-lisp.c \ vport-netdev.c \ vport-patch.c \ vport-vxlan.c diff --git a/datapath/linux/.gitignore b/datapath/linux/.gitignore index 901b2a8..cf5416b 100644 --- a/datapath/linux/.gitignore +++ b/datapath/linux/.gitignore @@ -35,6 +35,7 @@ /vport-generic.c /vport-gre.c /vport-internal_dev.c +/vport-lisp.c /vport-netdev.c /vport-patch.c /vport-vxlan.c diff --git a/datapath/tunnel.h b/datapath/tunnel.h index 12e7f19..7c89b1c 100644 --- a/datapath/tunnel.h +++ b/datapath/tunnel.h @@ -43,6 +43,7 @@ #define TNL_T_PROTO_GRE64 1 #define TNL_T_PROTO_CAPWAP 2 #define TNL_T_PROTO_VXLAN 3 +#define TNL_T_PROTO_LISP 4 /* These flags are only needed when calling tnl_find_port(). */ #define TNL_T_KEY_EXACT (1 << 10) diff --git a/datapath/vport-lisp.c b/datapath/vport-lisp.c new file mode 100644 index 0000000..558e5aa --- /dev/null +++ b/datapath/vport-lisp.c @@ -0,0 +1,351 @@ +/* + * Copyright (c) 2011 Nicira, Inc. + * Copyright (c) 2013 Cisco Systems, Inc. + * + * This program is free software; you can redistribute it and/or + * modify it under the terms of version 2 of the GNU General Public + * License as published by the Free Software Foundation. + * + * This program is distributed in the hope that it will be useful, but + * WITHOUT ANY WARRANTY; without even the implied warranty of + * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU + * General Public License for more details. + * + * You should have received a copy of the GNU General Public License + * along with this program; if not, write to the Free Software + * Foundation, Inc., 51 Franklin Street, Fifth Floor, Boston, MA + * 02110-1301, USA + */ + +#define pr_fmt(fmt) KBUILD_MODNAME ": " fmt + +#include <linux/version.h> +#if LINUX_VERSION_CODE >= KERNEL_VERSION(2,6,26) + +#include <linux/in.h> +#include <linux/ip.h> +#include <linux/list.h> +#include <linux/net.h> +#include <linux/udp.h> + +#include <net/icmp.h> +#include <net/ip.h> +#include <net/udp.h> + +#include "datapath.h" +#include "tunnel.h" +#include "vport.h" + +#define LISP_DST_PORT 4341 /* Well known UDP port for LISP data packets. */ + +struct lisp_net { + struct socket *lisp_rcv_socket; + int n_tunnels; +}; +static struct lisp_net lisp_net; + + +/* + * LISP encapsulation header: + * + * +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ + * |N|L|E|V|I|flags| Nonce/Map-Version | + * +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ + * | Instance ID/Locator Status Bits | + * +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ + * + */ + +/** + * struct lisphdr - LISP header + * @nonce_present: Flag indicating the presence of a 24 bit nonce value. + * @lsb: Flag indicating the presence of Locator Status Bits (LSB). + * @echo_nonce: Flag indicating the use of the echo noncing mechanism. + * @map_version: Flag indicating the use of mapping versioning. + * @instance_id: Flag indicating the presence of a 24 bit Instance ID (IID). + * @rflags: 3 bits reserved for future flags. + * @nonce: 24 bit nonce value. + * @lsb_bits: 32 bit Locator Status Bits + */ +struct lisphdr { +#ifdef __LITTLE_ENDIAN_BITFIELD + __u8 rflags:3; + __u8 instance_id:1; + __u8 map_version:1; + __u8 echo_nonce:1; + __u8 lsb:1; + __u8 nonce_present:1; +#else + __u8 nonce_present:1; + __u8 lsb:1; + __u8 echo_nonce:1; + __u8 map_version:1; + __u8 instance_id:1; + __u8 rflags:3; +#endif + union { + __u8 nonce[3]; + __u8 map_version[3]; + } u1; + union { + __be32 lsb_bits; + __be32 iid; + } u2; +}; + +#define LISP_HLEN (sizeof(struct udphdr) + sizeof(struct lisphdr)) + +static inline int lisp_hdr_len(const struct tnl_mutable_config *mutable, + const struct ovs_key_ipv4_tunnel *tun_key) +{ + return LISP_HLEN; +} + +static inline struct lisphdr *lisp_hdr(const struct sk_buff *skb) +{ + return (struct lisphdr *)(udp_hdr(skb) + 1); +} + +/* Compute source port for outgoing packet. + * Currently we use the flow hash. + */ +static u16 get_src_port(struct sk_buff *skb) +{ + int low; + int high; + unsigned int range; + u32 hash = OVS_CB(skb)->flow->hash; + + inet_get_local_port_range(&low, &high); + range = (high - low) + 1; + return (((u64) hash * range) >> 32) + low; +} + +static struct sk_buff *lisp_pre_tunnel(const struct vport *vport, + const struct tnl_mutable_config *mutable, + struct sk_buff *skb) +{ + /* Pop off "inner" Ethernet header */ + skb_pull(skb, ETH_HLEN); + return skb; +} + +/* Returns the least-significant 32 bits of a __be64. */ +static __be32 be64_get_low32(__be64 x) +{ +#ifdef __BIG_ENDIAN + return (__force __be32)x; +#else + return (__force __be32)((__force u64)x >> 32); +#endif +} + +static struct sk_buff *lisp_build_header(const struct vport *vport, + const struct tnl_mutable_config *mutable, + struct dst_entry *dst, + struct sk_buff *skb, + int tunnel_hlen) +{ + struct udphdr *udph = udp_hdr(skb); + struct lisphdr *lisph = (struct lisphdr *)(udph + 1); + const struct ovs_key_ipv4_tunnel *tun_key = OVS_CB(skb)->tun_key; + __be64 out_key; + u32 flags; + + tnl_get_param(mutable, tun_key, &flags, &out_key); + + udph->dest = htons(LISP_DST_PORT); + udph->source = htons(get_src_port(skb)); + udph->check = 0; + udph->len = htons(skb->len - skb_transport_offset(skb)); + + lisph->nonce_present = 1; /* We add a nonce instead of map version */ + lisph->lsb = 0; /* No reason to set LSBs, just one RLOC */ + lisph->echo_nonce = 0; /* No echo noncing */ + lisph->map_version = 0; /* No mapping versioning, nonce instead */ + lisph->instance_id = 1; /* Store the tun_id as Instance ID */ + lisph->rflags = 1; /* Reserved flags, set to 0 */ + + lisph->u1.nonce[0] = net_random() & 0xFF; + lisph->u1.nonce[1] = net_random() & 0xFF; + lisph->u1.nonce[2] = net_random() & 0xFF; + + lisph->u2.iid = htonl(be64_get_low32(tun_key->tun_id)); + + /* + * Allow our local IP stack to fragment the outer packet even if the + * DF bit is set as a last resort. We also need to force selection of + * an IP ID here because Linux will otherwise leave it at 0 if the + * packet originally had DF set. + */ + skb->local_df = 1; + __ip_select_ident(ip_hdr(skb), dst, 0); + + return skb; +} + +/* Called with rcu_read_lock and BH disabled. */ +static int lisp_rcv(struct sock *sk, struct sk_buff *skb) +{ + struct vport *vport; + struct lisphdr *lisph; + const struct tnl_mutable_config *mutable; + struct iphdr *iph; + struct ovs_key_ipv4_tunnel tun_key; + __be64 key; + u32 tunnel_flags = 0; + struct ethhdr *ethh; + + if (unlikely(!pskb_may_pull(skb, LISP_HLEN))) + goto error; + + lisph = lisp_hdr(skb); + if (unlikely(lisph->instance_id != 1)) + goto error; + + __skb_pull(skb, LISP_HLEN); + skb_postpull_rcsum(skb, skb_transport_header(skb), LISP_HLEN); + + key = cpu_to_be64(ntohl(lisph->u2.iid)); + + iph = ip_hdr(skb); + vport = ovs_tnl_find_port(dev_net(skb->dev), iph->daddr, iph->saddr, + key, TNL_T_PROTO_LISP, &mutable); + if (unlikely(!vport)) { + icmp_send(skb, ICMP_DEST_UNREACH, ICMP_PORT_UNREACH, 0); + goto error; + } + + if (mutable->flags & TNL_F_IN_KEY_MATCH || !mutable->key.daddr) + tunnel_flags = OVS_TNL_F_KEY; + else + key = 0; + + /* Save outer tunnel values */ + tnl_tun_key_init(&tun_key, iph, key, tunnel_flags); + OVS_CB(skb)->tun_key = &tun_key; + + /* Add Ethernet header */ + skb_push(skb, ETH_HLEN); + + ethh = (struct ethhdr *)skb->data; + memset(ethh, 0, ETH_HLEN); + ethh->h_dest[0] = 0x02; + ethh->h_source[0] = 0x02; + ethh->h_proto = htons(ETH_P_IP); + + ovs_tnl_rcv(vport, skb); + goto out; + +error: + kfree_skb(skb); +out: + return 0; +} + +/* Arbitrary value. Irrelevant as long as it's not 0 since we set the handler. */ +#define UDP_ENCAP_LISP 7 +static int lisp_socket_init(struct net *net) +{ + int err; + struct sockaddr_in sin; + + if (lisp_net.n_tunnels) { + lisp_net.n_tunnels++; + return 0; + } + + err = sock_create_kern(AF_INET, SOCK_DGRAM, 0, + &lisp_net.lisp_rcv_socket); + if (err) + goto error; + + /* release net ref. */ + sk_change_net(lisp_net.lisp_rcv_socket->sk, net); + + sin.sin_family = AF_INET; + sin.sin_addr.s_addr = htonl(INADDR_ANY); + sin.sin_port = htons(LISP_DST_PORT); + + err = kernel_bind(lisp_net.lisp_rcv_socket, + (struct sockaddr *)&sin, + sizeof(struct sockaddr_in)); + if (err) + goto error_sock; + + udp_sk(lisp_net.lisp_rcv_socket->sk)->encap_type = UDP_ENCAP_LISP; + udp_sk(lisp_net.lisp_rcv_socket->sk)->encap_rcv = lisp_rcv; + + udp_encap_enable(); + lisp_net.n_tunnels++; + + return 0; + +error_sock: + sk_release_kernel(lisp_net.lisp_rcv_socket->sk); +error: + pr_warn("cannot register lisp protocol handler: %d\n", err); + return err; +} + +static const struct tnl_ops ovs_lisp_tnl_ops = { + .tunnel_type = TNL_T_PROTO_LISP, + .ipproto = IPPROTO_UDP, + .hdr_len = lisp_hdr_len, + .pre_tunnel = lisp_pre_tunnel, + .build_header = lisp_build_header, +}; + +static void release_socket(struct net *net) +{ + lisp_net.n_tunnels--; + if (lisp_net.n_tunnels) + return; + + sk_release_kernel(lisp_net.lisp_rcv_socket->sk); +} + +static void lisp_tnl_destroy(struct vport *vport) +{ + ovs_tnl_destroy(vport); + release_socket(ovs_dp_get_net(vport->dp)); +} + +static struct vport *lisp_tnl_create(const struct vport_parms *parms) +{ + int err; + struct vport *vport; + + err = lisp_socket_init(ovs_dp_get_net(parms->dp)); + if (err) + return ERR_PTR(err); + + vport = ovs_tnl_create(parms, &ovs_lisp_vport_ops, &ovs_lisp_tnl_ops); + if (IS_ERR(vport)) + release_socket(ovs_dp_get_net(parms->dp)); + + return vport; +} + +static int lisp_tnl_init(void) +{ + lisp_net.n_tunnels = 0; + return 0; +} + +const struct vport_ops ovs_lisp_vport_ops = { + .type = OVS_VPORT_TYPE_LISP, + .flags = VPORT_F_TUN_ID, + .init = lisp_tnl_init, + .create = lisp_tnl_create, + .destroy = lisp_tnl_destroy, + .set_addr = ovs_tnl_set_addr, + .get_name = ovs_tnl_get_name, + .get_addr = ovs_tnl_get_addr, + .get_options = ovs_tnl_get_options, + .set_options = ovs_tnl_set_options, + .send = ovs_tnl_send, +}; +#else +#warning LISP tunneling will not be available on kernels before 2.6.26 +#endif /* Linux kernel < 2.6.26 */ diff --git a/datapath/vport.c b/datapath/vport.c index a78ebfa..bf8c763 100644 --- a/datapath/vport.c +++ b/datapath/vport.c @@ -46,6 +46,7 @@ static const struct vport_ops *base_vport_ops_list[] = { #if LINUX_VERSION_CODE >= KERNEL_VERSION(2,6,26) &ovs_capwap_vport_ops, &ovs_vxlan_vport_ops, + &ovs_lisp_vport_ops, #endif }; diff --git a/datapath/vport.h b/datapath/vport.h index 91f8836..24f12ae 100644 --- a/datapath/vport.h +++ b/datapath/vport.h @@ -240,5 +240,6 @@ extern const struct vport_ops ovs_gre_ft_vport_ops; extern const struct vport_ops ovs_gre64_vport_ops; extern const struct vport_ops ovs_capwap_vport_ops; extern const struct vport_ops ovs_vxlan_vport_ops; +extern const struct vport_ops ovs_lisp_vport_ops; #endif /* vport.h */ diff --git a/include/linux/openvswitch.h b/include/linux/openvswitch.h index f471fbc..b007b35 100644 --- a/include/linux/openvswitch.h +++ b/include/linux/openvswitch.h @@ -184,6 +184,7 @@ enum ovs_vport_type { OVS_VPORT_TYPE_INTERNAL, /* network device implemented by datapath */ OVS_VPORT_TYPE_FT_GRE, /* Flow based GRE tunnel. */ OVS_VPORT_TYPE_VXLAN, /* VXLAN tunnel */ + OVS_VPORT_TYPE_LISP, /* LISP tunnel */ OVS_VPORT_TYPE_PATCH = 100, /* virtual tunnel connecting two vports */ OVS_VPORT_TYPE_GRE, /* GRE tunnel */ OVS_VPORT_TYPE_CAPWAP, /* CAPWAP tunnel */ diff --git a/include/openflow/nicira-ext.h b/include/openflow/nicira-ext.h index 91c96b3..f98ab89 100644 --- a/include/openflow/nicira-ext.h +++ b/include/openflow/nicira-ext.h @@ -1579,9 +1579,9 @@ OFP_ASSERT(sizeof(struct nx_action_output_reg) == 24); /* Tunnel ID. * - * For a packet received via a GRE or VXLAN tunnel including a (32-bit) key, the - * key is stored in the low 32-bits and the high bits are zeroed. For other - * packets, the value is 0. + * For a packet received via a GRE, VXLAN or LISP tunnel including a (32-bit) + * key, the key is stored in the low 32-bits and the high bits are zeroed. For + * other packets, the value is 0. * * All zero bits, for packets not received via a keyed tunnel. * diff --git a/lib/netdev-vport.c b/lib/netdev-vport.c index 60437b9..8020412 100644 --- a/lib/netdev-vport.c +++ b/lib/netdev-vport.c @@ -186,6 +186,9 @@ netdev_vport_get_netdev_type(const struct dpif_linux_vport *vport) case OVS_VPORT_TYPE_VXLAN: return "vxlan"; + case OVS_VPORT_TYPE_LISP: + return "lisp"; + case OVS_VPORT_TYPE_FT_GRE: case __OVS_VPORT_TYPE_MAX: break; @@ -915,6 +918,7 @@ netdev_vport_register(void) TUNNEL_CLASS("ipsec_gre64", OVS_VPORT_TYPE_GRE64), TUNNEL_CLASS("capwap", OVS_VPORT_TYPE_CAPWAP), TUNNEL_CLASS("vxlan", OVS_VPORT_TYPE_VXLAN), + TUNNEL_CLASS("lisp", OVS_VPORT_TYPE_LISP), { OVS_VPORT_TYPE_PATCH, { "patch", VPORT_FUNCTIONS(NULL, NULL) }, diff --git a/vswitchd/vswitch.xml b/vswitchd/vswitch.xml index 18643c2..295e41c 100644 --- a/vswitchd/vswitch.xml +++ b/vswitchd/vswitch.xml @@ -1278,6 +1278,15 @@ </p> </dd> + <dt><code>lisp</code></dt> + <dd> + A layer 3 tunnel over the experimental, UDP-based LISP + protocol described at + <code>http://tools.ietf.org/html/draft-ietf-lisp-24</code>. + LISP is currently supported only with the Linux kernel datapath + with kernel version 2.6.26 or later. + </dd> + <dt><code>patch</code></dt> <dd> A pair of virtual devices that act as a patch cable. -- 1.7.11.7 _______________________________________________ dev mailing list dev@openvswitch.org http://openvswitch.org/mailman/listinfo/dev