On Wed, Sep 5, 2012 at 2:58 PM, Kyle Mestery <[email protected]> wrote:
> This is a first pass at providing a tun_key which can be
> used as the basis for flow-based tunnelling. The tun_key
> includes and replaces the tun_id in both struct ovs_skb_cb
> and struct sw_tun_key.
>
> This patch allows all existing tun_id behaviour to still work. Existing
> users of tun_id are redirected to tun_key->tun_id to retain compatibility.
> However, when the userspace code is updated to make use of the new tun_key,
> the old behaviour will be deprecated and removed.
>
> NOTE: With these changes, the tunneling code no longer assumes input and
> output keys are symmetric.  If they are not, PMTUD needs to be disabled
> for tunneling to work.
>
> Signed-off-by: Kyle Mestery <[email protected]>
> Cc: Simon Horman <[email protected]>
> Cc: Jesse Gross <[email protected]>

A couple high level Linux kernel style comments:
 * I noticed a couple places that use spaces instead of tabs.
 * The multiline comment style used in networking (although not
currently respected everywhere) is:
 /* Comment
  * comment
  */

> diff --git a/NEWS b/NEWS
> index cbc5c58..0abf2fa 100644
> --- a/NEWS
> +++ b/NEWS
> @@ -1,5 +1,7 @@
>  post-v1.8.0
>  ------------------------
> +    - The tunneling code no longer assumes input and output keys are 
> symmetric.
> +      If they are not, PMTUD needs to be disabled for tunneling to work.

I would probably note that this only applies in the case of flow-based keys.

> diff --git a/datapath/actions.c b/datapath/actions.c
> index ec9b595..31f6bd6 100644
> --- a/datapath/actions.c
> +++ b/datapath/actions.c
> @@ -377,6 +381,16 @@ static int do_execute_actions(struct datapath *dp, 
> struct sk_buff *skb,
>         int prev_port = -1;
>         const struct nlattr *a;
>         int rem;
> +       struct ovs_key_ipv4_tunnel tun_key;
> +
> +       /*
> +        * If tun_key is NULL for this skb, point it at one on the stack for
> +        * action processing and output. This can disappear once we drop 
> support
> +        * for setting tun_id outside of tun_key.
> +        */
> +        memset(&tun_key, 0, sizeof(struct ovs_key_ipv4_tunnel));
> +        if (OVS_CB(skb)->tun_key == NULL)
> +               OVS_CB(skb)->tun_key = &tun_key;

I think ideally we want to push this down as far as possible.  struct
ovs_key_ipv4_tunnel needs to be on the stack in do_execute_actions()
so that it is in scope when we output but it's best to do the
memsetting and assignment only when (and if) we actually set the
tun_id.  We can just pass a pointer into execute_set_action().

> diff --git a/datapath/flow.c b/datapath/flow.c
> index d07337c..3b85c62 100644
> --- a/datapath/flow.c
> +++ b/datapath/flow.c
> @@ -1023,10 +1026,28 @@ int ovs_flow_from_nlattrs(struct sw_flow_key *swkey, 
> int *key_lenp,
>         }
>
>         if (attrs & (1ULL << OVS_KEY_ATTR_TUN_ID)) {
> -               swkey->phy.tun_id = nla_get_be64(a[OVS_KEY_ATTR_TUN_ID]);
> +               tun_id = nla_get_be64(a[OVS_KEY_ATTR_TUN_ID]);
>                 attrs &= ~(1ULL << OVS_KEY_ATTR_TUN_ID);
>         }
>
> +       if (attrs & (1ULL << OVS_KEY_ATTR_IPV4_TUNNEL)) {
> +               struct ovs_key_ipv4_tunnel *tun_key;
> +               tun_key = nla_data(a[OVS_KEY_ATTR_IPV4_TUNNEL]);
> +
> +               /*
> +                * If both OVS_KEY_ATTR_TUN_ID and OVS_KEY_ATTR_IPV4_TUNNEL 
> are
> +                * present, ensure they are consistent or error out.
> +                */
> +               if (tun_id != 0) {
> +                       /* Verify they match */
> +                       if (tun_id != tun_key->tun_id)
> +                               return -EINVAL;
> +               }
> +
> +               swkey->tun.tun_key = *tun_key;
> +               attrs &= ~(1ULL << OVS_KEY_ATTR_IPV4_TUNNEL);
> +       }

I think if only OVS_KEY_ATTR_TUN_ID is sent then this will result in
tun_key never getting set at all.  This shouldn't happen in practice
because on flow setup we always will send both together and they
should be echoed back together.  It doesn't seem right to silently
ignore one though.

Here's a different option - we could enforce that both must come
together and where there is overlap the information is consistent.
This should be easier to check and we can avoid ambiguities around the
value zero.  It also treats them as one, which they effectively are.
On both the sent and receive sides I would combine the if blocks to
really make it explicit that they go together.

> +int ovs_flow_metadata_from_nlattrs(u32 *priority, u16 *in_port,
> +                                  struct ovs_key_ipv4_tunnel *tun_key,
>                                    const struct nlattr *attr)
>  {
>         const struct nlattr *nla;
>         int rem;
>
>         *in_port = DP_MAX_PORTS;
> -       *tun_id = 0;
> +       memset(tun_key, 0, sizeof(struct ovs_key_ipv4_tunnel));

Can you use sizeof(*tun_key) here?

> @@ -1185,7 +1207,12 @@ int ovs_flow_metadata_from_nlattrs(u32 *priority, u16 
> *in_port, __be64 *tun_id,
>                                 break;
>
>                         case OVS_KEY_ATTR_TUN_ID:
> -                               *tun_id = nla_get_be64(nla);
> +                               tun_key->tun_id = nla_get_be64(nla);
> +                               break;
> +
> +                       case OVS_KEY_ATTR_IPV4_TUNNEL:
> +                               memcpy(tun_key, nla_data(nla),
> +                                      sizeof(*tun_key));

We should enforce consistency here as well (this one is probably more
likely to be broken anyways because flow setups just echo back what
the kernel supplied but this is generated "by hand").  In this case we
can't enforce that both come together.

> diff --git a/datapath/flow.h b/datapath/flow.h
> index 02c563a..5d4fcde 100644
> --- a/datapath/flow.h
> +++ b/datapath/flow.h
> @@ -42,7 +42,6 @@ struct sw_flow_actions {
>
>  struct sw_flow_key {
>         struct {
> -               __be64  tun_id;         /* Encapsulating tunnel ID. */
>                 u32     priority;       /* Packet QoS priority. */
>                 u16     in_port;        /* Input switch port (or 
> DP_MAX_PORTS). */
>         } phy;
> @@ -92,6 +91,9 @@ struct sw_flow_key {
>                         } nd;
>                 } ipv6;
>         };
> +        struct {
> +               struct ovs_key_ipv4_tunnel tun_key;  /* Encapsulating tunnel 
> key. */
> +       } tun;
>  };

The solution for this is a little more complicated because for
efficiency we only match on as much of the flow struct as is
necessary.  Since we currently calculate how far we need to search
(i.e. TCP/IPv4) without taking into account a tunnel header at the end
this will never match on the tunnel.  The other less serious problem
is that this reduces the effectiveness of the partial length match
when tunneling because IPv4 packets will have to match over a series
of zeros where the IPv6 header exists.  Basically we want to stick
tunnel headers at the end of the current match, where ever that might
be.

One way of doing this is to add a copy of the tun_key to each union
that could possibly be the end of the struct.  We essentially do that
with the TCP/UDP ports which could follow either a IPv4 or IPv6
header.  However in the case of tunnels there are many more possible
end locations (I count 7) and whereas L4 ports logically follows the
L3 header that's not really the case with tunnels, we're just moving
them around because we can and the benefit is large.

I think what I would do is to make struct tun completely independent
of struct sw_flow_key and then have it "float" over where ever we
decide is the end of the flow struct.  It's a little messy if you want
random access to it because you have to compute the size of the
earlier members but in practice don't think we ever do so it shouldn't
be too bad.

> diff --git a/datapath/tunnel.c b/datapath/tunnel.c
> index d651c11..0687f80 100644
> --- a/datapath/tunnel.c
> +++ b/datapath/tunnel.c
> @@ -799,10 +786,10 @@ static void create_tunnel_header(const struct vport 
> *vport,
>         iph->ihl        = sizeof(struct iphdr) >> 2;
>         iph->frag_off   = htons(IP_DF);
>         iph->protocol   = tnl_vport->tnl_ops->ipproto;
> -       iph->tos        = mutable->tos;
> +       iph->tos        = mutable->tos;
>         iph->daddr      = rt->rt_dst;
>         iph->saddr      = rt->rt_src;
> -       iph->ttl        = mutable->ttl;
> +       iph->ttl        = mutable->ttl;

It looks like these might have been converted from tabs to spaces.

> @@ -1029,7 +1017,8 @@ static struct rtable *__find_route(const struct 
> tnl_mutable_config *mutable,
>
>  static struct rtable *find_route(struct vport *vport,
>                                  const struct tnl_mutable_config *mutable,
> -                                u8 tos, struct tnl_cache **cache)
> +                                u8 tos, __be32 daddr, __be32 saddr,
> +                                struct tnl_cache **cache)

Can we make the argument list here match __find_route (where
possible)?  In particular, it's confusing that daddr and saddr are in
opposite order and no static checker will be able to catch those
mistakes.

> @@ -1219,11 +1213,23 @@ int ovs_tnl_send(struct vport *vport, struct sk_buff 
> *skb)
>
>         if (mutable->flags & TNL_F_TOS_INHERIT)
>                 tos = inner_tos;
> +       else if (OVS_CB(skb)->tun_key && OVS_CB(skb)->tun_key->ipv4_tos != 0)
> +               tos = OVS_CB(skb)->tun_key->ipv4_tos;
>         else
>                 tos = mutable->tos;
>
> +       if (OVS_CB(skb)->tun_key && OVS_CB(skb)->tun_key->ipv4_dst != 0)
> +               daddr = OVS_CB(skb)->tun_key->ipv4_dst;
> +       else
> +               daddr = mutable->key.daddr;
> +
> +       if (OVS_CB(skb)->tun_key && OVS_CB(skb)->tun_key->ipv4_src != 0)
> +               saddr = OVS_CB(skb)->tun_key->ipv4_src;
> +       else
> +               saddr = mutable->key.saddr;

For these checks I think we actually want to always check for
tun_key->ipv4_dst != 0 as indicator for which type of tunnel key we
are using rather than to check for the actual value.  For example, in
the case of ipv4_tos it's quite possible that 0 is the actual value
that is intended to be set.  If it is set then I would use it,
overriding any other settings.

> diff --git a/datapath/tunnel.h b/datapath/tunnel.h
> index 1924017..78a4d14 100644
> --- a/datapath/tunnel.h
> +++ b/datapath/tunnel.h
> @@ -286,4 +286,14 @@ static inline struct tnl_vport *tnl_vport_priv(const 
> struct vport *vport)
>         return vport_priv(vport);
>  }
>
> +static inline void tun_key_init(struct ovs_key_ipv4_tunnel *tun_key,
> +                                const struct iphdr *iph, __be64 tun_id)

Can you use tnl_ as the prefix here like the other functions in this file?

> diff --git a/datapath/vport-capwap.c b/datapath/vport-capwap.c
> index 05a099d..1e08d5a 100644
> --- a/datapath/vport-capwap.c
> +++ b/datapath/vport-capwap.c
> @@ -220,7 +220,7 @@ static struct sk_buff *capwap_update_header(const struct 
> vport *vport,
>                 struct capwaphdr_wsi *wsi = (struct capwaphdr_wsi *)(cwh + 1);
>                 struct capwaphdr_wsi_key *opt = (struct capwaphdr_wsi_key 
> *)(wsi + 1);
>
> -               opt->key = OVS_CB(skb)->tun_id;
> +               opt->key = OVS_CB(skb)->tun_key->tun_id;

In theory it's possible that tun_key is NULL (at the moment we always
initialize to something but that's not the long term goal), which
means that this will crash.  While we could add a check here I think
it's probably better to do it in the generic tunneling code since once
we switch over to flow based tunneling it will be required that you
set the tun_key (for the IP addresses if for no other reason).  What I
would do is add a check for tun_key == NULL in tnl_send().  If it is
NULL we can add a zeroed out dummy since there isn't a requirement
that the user call set tun_id yet.

> diff --git a/include/linux/openvswitch.h b/include/linux/openvswitch.h
> index f5c9cca..f9f0c66 100644
> --- a/include/linux/openvswitch.h
> +++ b/include/linux/openvswitch.h
> @@ -278,7 +278,8 @@ enum ovs_key_attr {
>         OVS_KEY_ATTR_ICMPV6,    /* struct ovs_key_icmpv6 */
>         OVS_KEY_ATTR_ARP,       /* struct ovs_key_arp */
>         OVS_KEY_ATTR_ND,        /* struct ovs_key_nd */
> -       OVS_KEY_ATTR_TUN_ID = 63, /* be64 tunnel ID */
> +       OVS_KEY_ATTR_IPV4_TUNNEL,  /* struct ovs_key_ipv4_tunnel */
> +       OVS_KEY_ATTR_TUN_ID = 63,  /* be64 tunnel ID */

I would assign OVS_KEY_ATTR_IPV4_TUNNEL the value of 62 for the time
being until we are ready to lock things down.

> diff --git a/lib/odp-util.c b/lib/odp-util.c
> index 901dac3..5a17e11 100644
> --- a/lib/odp-util.c
> +++ b/lib/odp-util.c
> @@ -99,13 +99,14 @@ ovs_key_attr_to_string(enum ovs_key_attr attr)
>      case OVS_KEY_ATTR_ETHERTYPE: return "eth_type";
>      case OVS_KEY_ATTR_IPV4: return "ipv4";
>      case OVS_KEY_ATTR_IPV6: return "ipv6";
> +    case OVS_KEY_ATTR_TUN_ID: return "tun_id";
> +    case OVS_KEY_ATTR_IPV4_TUNNEL: return "ipv4_tunnel";
>      case OVS_KEY_ATTR_TCP: return "tcp";
>      case OVS_KEY_ATTR_UDP: return "udp";
>      case OVS_KEY_ATTR_ICMP: return "icmp";
>      case OVS_KEY_ATTR_ICMPV6: return "icmpv6";
>      case OVS_KEY_ATTR_ARP: return "arp";
>      case OVS_KEY_ATTR_ND: return "nd";
> -    case OVS_KEY_ATTR_TUN_ID: return "tun_id";
>
>      case __OVS_KEY_ATTR_MAX:
>      default:
> @@ -601,13 +602,14 @@ odp_flow_key_attr_len(uint16_t type)
>      switch ((enum ovs_key_attr) type) {
>      case OVS_KEY_ATTR_ENCAP: return -2;
>      case OVS_KEY_ATTR_PRIORITY: return 4;
> -    case OVS_KEY_ATTR_TUN_ID: return 8;
>      case OVS_KEY_ATTR_IN_PORT: return 4;
>      case OVS_KEY_ATTR_ETHERNET: return sizeof(struct ovs_key_ethernet);
>      case OVS_KEY_ATTR_VLAN: return sizeof(ovs_be16);
>      case OVS_KEY_ATTR_ETHERTYPE: return 2;
>      case OVS_KEY_ATTR_IPV4: return sizeof(struct ovs_key_ipv4);
>      case OVS_KEY_ATTR_IPV6: return sizeof(struct ovs_key_ipv6);
> +    case OVS_KEY_ATTR_TUN_ID: return 8;
> +    case OVS_KEY_ATTR_IPV4_TUNNEL: return sizeof(struct ovs_key_ipv4_tunnel);

Since the IP keys actually refer to the inner header I would logically
group the tunnel attributes together where they are in the second
block - after priority.

> diff --git a/lib/odp-util.h b/lib/odp-util.h
> index 16f2b15..250302e 100644
> --- a/lib/odp-util.h
> +++ b/lib/odp-util.h
> @@ -90,12 +91,12 @@ int odp_actions_from_string(const char *, const struct 
> simap *port_names,
>   *  OVS_KEY_ATTR_ICMPV6        2     2     4      8
>   *  OVS_KEY_ATTR_ND           28    --     4     32
>   *  -------------------------------------------------
> - *  total                                       156
> + *  total                                       184
>   *
>   * We include some slack space in case the calculation isn't quite right or 
> we
>   * add another field and forget to adjust this value.
>   */
> -#define ODPUTIL_FLOW_KEY_BYTES 200
> +#define ODPUTIL_FLOW_KEY_BYTES 204

Since this is just an arbitrary amount of padding we can probably keep
it at 200 for the time being as a nice round number.
_______________________________________________
dev mailing list
[email protected]
http://openvswitch.org/mailman/listinfo/dev

Reply via email to