On Nov 28, 2012, at 8:43 PM, Jesse Gross <[email protected]> wrote: > On Wed, Nov 28, 2012 at 2:53 PM, Kyle Mestery <[email protected]> wrote: >> Note: v5 of this patch addresses comments from Jesse >> and Chris. >> >> Note: v4 of this patch removes VXLAN over IPSEC support, >> per an offline conversation with Jesse. >> >> Add support for VXLAN tunnels to Open vSwitch. Add support >> for setting the destination UDP port on a per-port basis. >> This is done by adding a "dst_port" parameter to the port >> configuration. This is only applicable currently to VXLAN >> tunnels. >> >> Please note this currently does not implement any sort of multicast >> learning. With this patch, VXLAN tunnels must be configured similar >> to GRE tunnels (e.g. point to point). A subsequent patch will implement >> a VXLAN control plane in userspace to handle multicast learning. >> >> This patch set is based on one posted by Ben Pfaff on Oct. 12, 2011 >> to the ovs-dev mailing list: >> >> http://openvswitch.org/pipermail/dev/2011-October/012051.html >> >> The patch has been maintained, updated, and freshened by me and a >> version of it is available at the following github repository: >> >> https://github.com/mestery/ovs-vxlan/tree/vxlan >> >> I've tested this patch with multiple VXLAN tunnels between hosts >> using different UDP port numbers. Performance is on par (though >> slightly faster) than comparable GRE tunnels. >> >> See the following IETF draft for additional information about VXLAN: >> http://tools.ietf.org/html/draft-mahalingam-dutt-dcops-vxlan-02 >> >> Signed-off-by: Kyle Mestery <[email protected]> > > Thanks Kyle. I noticed a couple of small things, otherwise looks good. > Thanks for the review Jesse. I'm addressing these now, will hopefully send out a new patch today or tomorrow morning.
Kyle > For the commit message, can you use --- at the bottom to separate out > the comments about the revision? Otherwise, if I applied this as-is > it would include the revision history, which doesn't really need to be > in the commit log since the earlier version won't be there either. > >> diff --git a/NEWS b/NEWS >> index bb80beb..f6b2f0d 100644 >> --- a/NEWS >> +++ b/NEWS >> @@ -1,6 +1,7 @@ >> post-v1.9.0 >> -------------------- >> - >> + - New support for the VXLAN tunnel protocol (see the IETF draft here: >> + http://tools.ietf.org/html/draft-mahalingam-dutt-dcops-vxlan-02). > > Can you keep the double blank lines that separate the versions? > >> diff --git a/datapath/vport-vxlan.c b/datapath/vport-vxlan.c >> new file mode 100644 >> index 0000000..8f79991 >> --- /dev/null >> +++ b/datapath/vport-vxlan.c >> +/* Protected by RTNL lock. */ >> +#define VXLAN_SOCK_HASH_BUCKETS 32 >> +static struct hlist_head vxlan_ports[VXLAN_SOCK_HASH_BUCKETS]; > > Given that in most cases we'll only have a single destination port and > this isn't part of any fast path, 32 buckets still seems overkill to > me. Is there a reason to keep it as a hash table at all? > >> +static struct vxlan_port *vxlan_port_exists(struct net *net, u16 port) >> +{ >> + struct hlist_head *bucket = vxlan_hash_bucket(net, port); >> + struct vxlan_port *vxlan_port; >> + struct hlist_node *node; >> + >> + hlist_for_each_entry(vxlan_port, node, bucket, hash_node) { >> + if (vxlan_port->port == port) > > We also should check the net. > >> +static void vxlan_tunnel_release(struct vxlan_port *vxlan_port) >> +{ >> + if (vxlan_port->count-- == 0) { > > Shouldn't this be predecrement? > >> + /* Release old socket */ >> + sock_release(vxlan_port->vxlan_rcv_socket); >> + hlist_del(&vxlan_port->hash_node); >> + kfree(vxlan_port); >> + } >> +} > > It would be nice to make the the functions that deal with creation and > deletion of sockets symmetric to each other in regards to allocation > of memory, refcounting, and socket creation. > >> +static int vxlan_tunnel_setup(struct net *net, const char *linkname, >> + struct nlattr *options, >> + struct vxlan_port **vxport) >> +{ >> + struct nlattr *a; >> + int err; >> + u16 dst_port; >> + struct vxlan_port *vxlan_port; >> + >> + if (!options) { >> + err = -EINVAL; >> + goto out; >> + } >> + >> + a = nla_find_nested(options, OVS_TUNNEL_ATTR_DST_PORT); >> + if (a) > > We also should check that the size is u16 here. > >> + /* Verify if we already have a socket created for this port */ >> + vxlan_port = vxlan_port_exists(net, dst_port); >> + if (vxlan_port) { >> + if (vxlan_port->port == dst_port) { >> + vxlan_port->count++; >> + err = 0; >> + goto out; >> + } >> + >> + /* Release old socket */ >> + vxlan_tunnel_release(vxlan_port); > > I don't understand the second check for dst_port and the possibility > of releasing the old port. If the result of vxlan_port_exists is > non-NULL shouldn't that be enough? > >> + } >> + >> + /* Add a new socket for this port */ >> + vxlan_port = kzalloc(sizeof(struct vxlan_port), GFP_KERNEL); >> + if (!vxlan_port) { >> + err = -ENOMEM; >> + goto out; >> + } >> + >> + vxlan_port->port = dst_port; >> + vxlan_port->count = 1; >> + hlist_add_head(&vxlan_port->hash_node, >> + vxlan_hash_bucket(net, dst_port)); >> + >> + err = vxlan_socket_init(vxlan_port, net); >> + if (err) >> + goto error; >> + >> + vxport = &vxlan_port; >> +out: >> + return err; >> +error: >> + vxport = NULL; > > These two references to vxport look wrong to me - I think they should > both be *vxport. > >> + hlist_del(&vxlan_port->hash_node); >> + kfree(vxlan_port); >> + goto out; > > The exit path is somewhat confusing in the way that it jumps around. > Usually we have the error handler(s) first and eventually get to the > success path, like we were unwinding the stack. You also could start > with vxlan_port being NULL and then assign it as part of the success > path to vxport. > >> +static void vxlan_tnl_destroy(struct vport *vport) >> +{ >> + struct vxlan_port *vxlan_port; >> + struct tnl_vport *tnl_vport = tnl_vport_priv(vport); >> + struct tnl_mutable_config *config; >> + >> + config = rtnl_dereference(tnl_vport->mutable); >> + >> + vxlan_port = vxlan_port_exists(ovs_dp_get_net(vport->dp), >> + config->dst_port); >> + if (!vxlan_port) >> + goto out; >> + >> + if (!--vxlan_port->count) { >> + vxlan_tunnel_release(vxlan_port); >> + } > > I think the refcount gets decremented both here and in vxlan_tunnel_release(). > >> +static struct vport *vxlan_tnl_create(const struct vport_parms *parms) >> +{ >> + int err; >> + struct vport *vport; >> + struct vxlan_port *vxlan_port = NULL; >> + >> + err = vxlan_tunnel_setup(ovs_dp_get_net(parms->dp), parms->name, >> + parms->options, &vxlan_port); >> + vport = ovs_tnl_create(parms, &ovs_vxlan_vport_ops, >> &ovs_vxlan_tnl_ops); >> + >> + if (!vport) >> + vxlan_tunnel_release(vxlan_port); > > Shouldn't we return the error and stop immediately if vxlan_tunnel_setup > fails? > >> diff --git a/include/openvswitch/tunnel.h b/include/openvswitch/tunnel.h >> index 42c3621..b4e57e2 100644 >> --- a/include/openvswitch/tunnel.h >> +++ b/include/openvswitch/tunnel.h > [...] >> +/* Default to the OTV port, per the VXLAN IETF draft. */ >> +#define VXLAN_DST_PORT 8472 > > I would put this constant at the top of lib/vport-netdev.c. This file > is part of the user/kernel interface and I'd like to keep the port > number entirely in userspace. > >> diff --git a/lib/netdev-vport.c b/lib/netdev-vport.c >> index 5171171..9452d60 100644 >> --- a/lib/netdev-vport.c >> +++ b/lib/netdev-vport.c >> @@ -835,6 +852,11 @@ unparse_tunnel_config(const char *name OVS_UNUSED, >> const char *type OVS_UNUSED, >> smap_add_format(args, "tos", "0x%x", tos); >> } >> >> + if (a[OVS_TUNNEL_ATTR_DST_PORT]) { >> + uint16_t dst_port = nl_attr_get_be16(a[OVS_TUNNEL_ATTR_DST_PORT]); > > On the right side where we get the attribute we'll want to use u16 as well. > >> diff --git a/vswitchd/vswitch.xml b/vswitchd/vswitch.xml >> index c2786a5..9278e42 100644 >> --- a/vswitchd/vswitch.xml >> +++ b/vswitchd/vswitch.xml >> @@ -1247,6 +1246,23 @@ >> February 2013. >> </dd> >> >> + <dt><code>vxlan</code></dt> >> + <dd> >> + <p> >> + An Ethernet tunnel over the experimental, UDP-based VXLAN >> + protocol described at >> + >> <code>http://tools.ietf.org/html/draft-mahalingam-dutt-dcops-vxlan-02</code>. >> + VXLAN is currently supported only with the Linux kernel >> datapath >> + with kernel version 2.6.26 or later. >> + </p> >> + <p> >> + As an experimental protocol, VXLAN has no officially assigned >> UDP >> + port. Open vSwitch currently uses UDP destination port 8472. >> + The source port used for VXLAN traffic varies on a per-flow >> basis >> + between 32768 and 65535 to allow load balancing. > > It's no longer quite true that it's the upper 15-bits since we're > using the ephemeral ports now. _______________________________________________ dev mailing list [email protected] http://openvswitch.org/mailman/listinfo/dev
