On Wed, Nov 28, 2012 at 2:53 PM, Kyle Mestery <[email protected]> wrote:
> Note: v5 of this patch addresses comments from Jesse
> and Chris.
>
> Note: v4 of this patch removes VXLAN over IPSEC support,
> per an offline conversation with Jesse.
>
> Add support for VXLAN tunnels to Open vSwitch. Add support
> for setting the destination UDP port on a per-port basis.
> This is done by adding a "dst_port" parameter to the port
> configuration. This is only applicable currently to VXLAN
> tunnels.
>
> Please note this currently does not implement any sort of multicast
> learning. With this patch, VXLAN tunnels must be configured similar
> to GRE tunnels (e.g. point to point). A subsequent patch will implement
> a VXLAN control plane in userspace to handle multicast learning.
>
> This patch set is based on one posted by Ben Pfaff on Oct. 12, 2011
> to the ovs-dev mailing list:
>
> http://openvswitch.org/pipermail/dev/2011-October/012051.html
>
> The patch has been maintained, updated, and freshened by me and a
> version of it is available at the following github repository:
>
> https://github.com/mestery/ovs-vxlan/tree/vxlan
>
> I've tested this patch with multiple VXLAN tunnels between hosts
> using different UDP port numbers. Performance is on par (though
> slightly faster) than comparable GRE tunnels.
>
> See the following IETF draft for additional information about VXLAN:
> http://tools.ietf.org/html/draft-mahalingam-dutt-dcops-vxlan-02
>
> Signed-off-by: Kyle Mestery <[email protected]>

Thanks Kyle.  I noticed a couple of small things, otherwise looks good.

For the commit message, can you use --- at the bottom to separate out
the comments about the revision?  Otherwise, if I applied this as-is
it would include the revision history, which doesn't really need to be
in the commit log since the earlier version won't be there either.

> diff --git a/NEWS b/NEWS
> index bb80beb..f6b2f0d 100644
> --- a/NEWS
> +++ b/NEWS
> @@ -1,6 +1,7 @@
>  post-v1.9.0
>  --------------------
> -
> +    - New support for the VXLAN tunnel protocol (see the IETF draft here:
> +      http://tools.ietf.org/html/draft-mahalingam-dutt-dcops-vxlan-02).

Can you keep the double blank lines that separate the versions?

> diff --git a/datapath/vport-vxlan.c b/datapath/vport-vxlan.c
> new file mode 100644
> index 0000000..8f79991
> --- /dev/null
> +++ b/datapath/vport-vxlan.c
> +/* Protected by RTNL lock. */
> +#define VXLAN_SOCK_HASH_BUCKETS 32
> +static struct hlist_head vxlan_ports[VXLAN_SOCK_HASH_BUCKETS];

Given that in most cases we'll only have a single destination port and
this isn't part of any fast path, 32 buckets still seems overkill to
me.  Is there a reason to keep it as a hash table at all?

> +static struct vxlan_port *vxlan_port_exists(struct net *net, u16 port)
> +{
> +       struct hlist_head *bucket = vxlan_hash_bucket(net, port);
> +       struct vxlan_port *vxlan_port;
> +       struct hlist_node *node;
> +
> +       hlist_for_each_entry(vxlan_port, node, bucket, hash_node) {
> +               if (vxlan_port->port == port)

We also should check the net.

> +static void vxlan_tunnel_release(struct vxlan_port *vxlan_port)
> +{
> +       if (vxlan_port->count-- == 0) {

Shouldn't this be predecrement?

> +               /* Release old socket */
> +               sock_release(vxlan_port->vxlan_rcv_socket);
> +               hlist_del(&vxlan_port->hash_node);
> +               kfree(vxlan_port);
> +       }
> +}

It would be nice to make the the functions that deal with creation and
deletion of sockets symmetric to each other in regards to allocation
of memory, refcounting, and socket creation.

> +static int vxlan_tunnel_setup(struct net *net, const char *linkname,
> +                             struct nlattr *options,
> +                             struct vxlan_port **vxport)
> +{
> +       struct nlattr *a;
> +       int err;
> +       u16 dst_port;
> +       struct vxlan_port *vxlan_port;
> +
> +       if (!options) {
> +               err = -EINVAL;
> +               goto out;
> +       }
> +
> +       a = nla_find_nested(options, OVS_TUNNEL_ATTR_DST_PORT);
> +       if (a)

We also should check that the size is u16 here.

> +       /* Verify if we already have a socket created for this port */
> +       vxlan_port = vxlan_port_exists(net, dst_port);
> +       if (vxlan_port) {
> +               if (vxlan_port->port == dst_port) {
> +                       vxlan_port->count++;
> +                       err = 0;
> +                       goto out;
> +               }
> +
> +               /* Release old socket */
> +               vxlan_tunnel_release(vxlan_port);

I don't understand the second check for dst_port and the possibility
of releasing the old port.  If the result of vxlan_port_exists is
non-NULL shouldn't that be enough?

> +       }
> +
> +       /* Add a new socket for this port */
> +       vxlan_port = kzalloc(sizeof(struct vxlan_port), GFP_KERNEL);
> +       if (!vxlan_port) {
> +               err = -ENOMEM;
> +               goto out;
> +       }
> +
> +       vxlan_port->port = dst_port;
> +       vxlan_port->count = 1;
> +       hlist_add_head(&vxlan_port->hash_node,
> +                      vxlan_hash_bucket(net, dst_port));
> +
> +       err = vxlan_socket_init(vxlan_port, net);
> +       if (err)
> +               goto error;
> +
> +       vxport = &vxlan_port;
> +out:
> +       return err;
> +error:
> +       vxport = NULL;

These two references to vxport look wrong to me - I think they should
both be *vxport.

> +       hlist_del(&vxlan_port->hash_node);
> +       kfree(vxlan_port);
> +       goto out;

The exit path is somewhat confusing in the way that it jumps around.
Usually we have the error handler(s) first and eventually get to the
success path, like we were unwinding the stack.  You also could start
with vxlan_port being NULL and then assign it as part of the success
path to vxport.

> +static void vxlan_tnl_destroy(struct vport *vport)
> +{
> +       struct vxlan_port *vxlan_port;
> +       struct tnl_vport *tnl_vport = tnl_vport_priv(vport);
> +       struct tnl_mutable_config *config;
> +
> +       config = rtnl_dereference(tnl_vport->mutable);
> +
> +       vxlan_port = vxlan_port_exists(ovs_dp_get_net(vport->dp),
> +                                        config->dst_port);
> +       if (!vxlan_port)
> +               goto out;
> +
> +       if (!--vxlan_port->count) {
> +               vxlan_tunnel_release(vxlan_port);
> +       }

I think the refcount gets decremented both here and in vxlan_tunnel_release().

> +static struct vport *vxlan_tnl_create(const struct vport_parms *parms)
> +{
> +       int err;
> +       struct vport *vport;
> +       struct vxlan_port *vxlan_port = NULL;
> +
> +       err = vxlan_tunnel_setup(ovs_dp_get_net(parms->dp), parms->name,
> +                                               parms->options, &vxlan_port);
> +       vport = ovs_tnl_create(parms, &ovs_vxlan_vport_ops, 
> &ovs_vxlan_tnl_ops);
> +
> +       if (!vport)
> +               vxlan_tunnel_release(vxlan_port);

Shouldn't we return the error and stop immediately if vxlan_tunnel_setup fails?

> diff --git a/include/openvswitch/tunnel.h b/include/openvswitch/tunnel.h
> index 42c3621..b4e57e2 100644
> --- a/include/openvswitch/tunnel.h
> +++ b/include/openvswitch/tunnel.h
[...]
> +/* Default to the OTV port, per the VXLAN IETF draft. */
> +#define VXLAN_DST_PORT 8472

I would put this constant at the top of lib/vport-netdev.c.  This file
is part of the user/kernel interface and I'd like to keep the port
number entirely in userspace.

> diff --git a/lib/netdev-vport.c b/lib/netdev-vport.c
> index 5171171..9452d60 100644
> --- a/lib/netdev-vport.c
> +++ b/lib/netdev-vport.c
> @@ -835,6 +852,11 @@ unparse_tunnel_config(const char *name OVS_UNUSED, const 
> char *type OVS_UNUSED,
>          smap_add_format(args, "tos", "0x%x", tos);
>      }
>
> +    if (a[OVS_TUNNEL_ATTR_DST_PORT]) {
> +        uint16_t dst_port = nl_attr_get_be16(a[OVS_TUNNEL_ATTR_DST_PORT]);

On the right side where we get the attribute we'll want to use u16 as well.

> diff --git a/vswitchd/vswitch.xml b/vswitchd/vswitch.xml
> index c2786a5..9278e42 100644
> --- a/vswitchd/vswitch.xml
> +++ b/vswitchd/vswitch.xml
> @@ -1247,6 +1246,23 @@
>              February 2013.
>            </dd>
>
> +          <dt><code>vxlan</code></dt>
> +          <dd>
> +           <p>
> +             An Ethernet tunnel over the experimental, UDP-based VXLAN
> +             protocol described at
> +             
> <code>http://tools.ietf.org/html/draft-mahalingam-dutt-dcops-vxlan-02</code>.
> +             VXLAN is currently supported only with the Linux kernel datapath
> +             with kernel version 2.6.26 or later.
> +           </p>
> +           <p>
> +             As an experimental protocol, VXLAN has no officially assigned 
> UDP
> +             port.  Open vSwitch currently uses UDP destination port 8472.
> +             The source port used for VXLAN traffic varies on a per-flow 
> basis
> +             between 32768 and 65535 to allow load balancing.

It's no longer quite true that it's the upper 15-bits since we're
using the ephemeral ports now.
_______________________________________________
dev mailing list
[email protected]
http://openvswitch.org/mailman/listinfo/dev

Reply via email to