On May 24, 2012, at 2:08 AM, Simon Horman wrote:

> this is a first pass at providing a tun_key which can be used
> as the basis for flow-based tunnelling. The tun_key includes and
> replaces the tun_id in both struct ovs_skb_cb and struct sw_tun_key.
> 
> In ovs_skb_cb tun_key is a pointer as it is envisaged that it will grow
> when support for IPv6 to an extent that inlining the structure will result
> in ovs_skb_cb being larger than the 48 bytes available in skb->cb.
> 
> As OVS does not support IPv6 as the outer transport protocol for tunnels
> the IPv6 portions of this change, which appeared in the previous revision,
> have been dropped in order to limit the scope and size of this patch.
> 
> This patch does not make any effort to retain the existing tun_id behaviour
> nor does it fully implement flow-based tunnels. As such it it is incomplete
> and can't be used in its current form (other than to break OVS tunnelling).
> 
> ** Please do not apply **
> 
> Cc: Kyle Mestery <kmest...@cisco.com>
> Signed-off-by: Simon Horman <ho...@verge.net.au>

Thanks and sorry again about being so slow to look at this.

Overall, this looks pretty good to me.  The main difficulty that I had was in 
figuring out what should go with the old behavior and what should go with the 
new since it's at an intermediate point between the two but I understand that 
it's difficult to break it up in a way that both encapsulates a particular set 
of functionality and isn't too large.  Otherwise, I noticed a few specific 
things that I noted below.

> diff --git a/datapath/flow.c b/datapath/flow.c
> index d07337c..49c0dd8 100644
> --- a/datapath/flow.c
> +++ b/datapath/flow.c
> @@ -1162,14 +1166,15 @@ int ovs_flow_from_nlattrs(struct sw_flow_key *swkey, 
> int *key_lenp,
>  * get the metadata, that is, the parts of the flow key that cannot be
>  * extracted from the packet itself.
>  */
> -int ovs_flow_metadata_from_nlattrs(u32 *priority, u16 *in_port, __be64 
> *tun_id,
> +int ovs_flow_metadata_from_nlattrs(u32 *priority, u16 *in_port,
> +                                struct ovs_key_ipv4_tunnel *tun_key,
>                                  const struct nlattr *attr)
> {
>       const struct nlattr *nla;
>       int rem;
> 
>       *in_port = DP_MAX_PORTS;
> -     *tun_id = 0;
> +     tun_key->tun_id = 0;

I think we probably want to memset the entire tun_key to zero to avoid having 
potentially uninitialized data in the flow.

> 
> @@ -1204,15 +1210,21 @@ int ovs_flow_metadata_from_nlattrs(u32 *priority, u16 
> *in_port, __be64 *tun_id,
> int ovs_flow_to_nlattrs(const struct sw_flow_key *swkey, struct sk_buff *skb)
> {
>       struct ovs_key_ethernet *eth_key;
> +     struct ovs_key_ipv4_tunnel *tun_key;
>       struct nlattr *nla, *encap;
> 
>       if (swkey->phy.priority &&
>           nla_put_u32(skb, OVS_KEY_ATTR_PRIORITY, swkey->phy.priority))
>               goto nla_put_failure;
> 
> -     if (swkey->phy.tun_id != cpu_to_be64(0) &&
> -         nla_put_be64(skb, OVS_KEY_ATTR_TUN_ID, swkey->phy.tun_id))
> -             goto nla_put_failure;
> +     if (swkey->phy.tun_key.ipv4_dst) {

It's probably OK to use DIP equal to zero as a not present marker but we need 
to enforce that it's always true - for example we shouldn't allow somebody to 
setup a flow that way or receive packets with a zero address.  Alternately, we 
may be able to find a spare bit to indicate this, like is done with vlans.

In any case, I think we need to do some additional validation when setting up 
flows to check reserved space, for example, as otherwise that will never match.

> diff --git a/datapath/flow.h b/datapath/flow.h
> index 5be481e..bab5363 100644
> --- a/datapath/flow.h
> +++ b/datapath/flow.h
> @@ -42,7 +42,7 @@ struct sw_flow_actions {
> 
> struct sw_flow_key {
>       struct {
> -             __be64  tun_id;         /* Encapsulating tunnel ID. */
> +             struct ovs_key_ipv4_tunnel tun_key;  /* Encapsulating tunnel 
> key. */

This is an optimization but as we get closer I'd like to put the tun_key at the 
end of struct sw_flow_key so that packets that didn't come from a tunnel don't 
have to pay the cost during the lookup (this is especially true as we add 
support for IPv6 tunnels).

In a similar vein, struct ovs_key_ipv4_tunnel contains some fields that I think 
can never apply for lookup such as the flags so it would be nice if we could 
remove that for lookup.

> 
> @@ -150,6 +150,7 @@ u64 ovs_flow_used_time(unsigned long flow_jiffies);
>  *                         ------  ---  ------  -----
>  *  OVS_KEY_ATTR_PRIORITY      4    --     4      8
>  *  OVS_KEY_ATTR_TUN_ID        8    --     4     12
> + *  OVS_KEY_ATTR_IPV4_TUNNEL  18     2     4     24

If my math is correct, I think the size of the base struct ova_key_ipv4_tunnel 
is 24 bytes.

> +static inline void tun_key_swap_addr(struct ovs_key_ipv4_tunnel *tun_key)
> +{
> +     __be32 ndst = tun_key->ipv4_src;
> +     tun_key->ipv4_src = tun_key->ipv4_dst;
> +     tun_key->ipv4_dst = ndst;
> +}

I'm not quite sure when we would need to swap the addresses in a tunnel and I 
didn't see any uses of this function.

> +static inline void tun_key_init(struct ovs_key_ipv4_tunnel *tun_key,
> +                             const struct iphdr *iph, __be64 tun_id)
> +{
> +     tun_key->tun_id = tun_id;
> +     tun_key->ipv4_src = iph->saddr;
> +     tun_key->ipv4_dst = iph->daddr;
> +     tun_key->ipv4_tos = iph->tos;
> +     tun_key->ipv4_ttl = iph->ttl;
> +}
> 

Aren't there some fields that we need to zero out to avoid problems in the 
lookup?

> diff --git a/datapath/tunnel.c b/datapath/tunnel.c
> index d651c11..010e513 100644
> --- a/datapath/tunnel.c
> +++ b/datapath/tunnel.c
> @@ -367,9 +367,9 @@ struct vport *ovs_tnl_find_port(struct net *net, __be32 
> saddr, __be32 daddr,
>       return NULL;
> }
> 
> -static void ecn_decapsulate(struct sk_buff *skb, u8 tos)
> +static void ecn_decapsulate(struct sk_buff *skb)
> {
> -     if (unlikely(INET_ECN_is_ce(tos))) {
> +     if (unlikely(INET_ECN_is_ce(OVS_CB(skb)->tun_key->ipv4_tos))) {
>               __be16 protocol = skb->protocol;

This might come in a later patch, although I didn't see it in a quick scan, but 
it should be possible to implement all the ECN encapsulation and decapsulation 
in userspace, just like we can do with the rest of the ToS and TTL.

> 
> bool ovs_tnl_frag_needed(struct vport *vport,
>                        const struct tnl_mutable_config *mutable,
> -                      struct sk_buff *skb, unsigned int mtu, __be64 flow_key)
> +                      struct sk_buff *skb, unsigned int mtu,
> +                      struct ovs_key_ipv4_tunnel *tun_key)
> {
>       unsigned int eth_hdr_len = ETH_HLEN;
>       unsigned int total_length = 0, header_length = 0, payload_length;
>       struct ethhdr *eh, *old_eh = eth_hdr(skb);
>       struct sk_buff *nskb;
> +     struct ovs_key_ipv4_tunnel ntun_key;
> 
>       /* Sanity check */
>       if (skb->protocol == htons(ETH_P_IP)) {
> @@ -705,8 +707,10 @@ bool ovs_tnl_frag_needed(struct vport *vport,
>        * any way of synthesizing packets.
>        */
>       if ((mutable->flags & (TNL_F_IN_KEY_MATCH | TNL_F_OUT_KEY_ACTION)) ==
> -         (TNL_F_IN_KEY_MATCH | TNL_F_OUT_KEY_ACTION))
> -             OVS_CB(nskb)->tun_id = flow_key;
> +         (TNL_F_IN_KEY_MATCH | TNL_F_OUT_KEY_ACTION)) {
> +             ntun_key = *tun_key;
> +             OVS_CB(nskb)->tun_key = &ntun_key;
> +     }

I guess this is probably where you were going to use the function to reverse IP 
addresses.  The logic doesn't really work but it's moot since this is going 
away anyways.
> 
> @@ -799,10 +803,8 @@ static void create_tunnel_header(const struct vport 
> *vport,
>       iph->ihl        = sizeof(struct iphdr) >> 2;
>       iph->frag_off   = htons(IP_DF);
>       iph->protocol   = tnl_vport->tnl_ops->ipproto;
> -     iph->tos        = mutable->tos;
>       iph->daddr      = rt->rt_dst;
>       iph->saddr      = rt->rt_src;
> -     iph->ttl        = mutable->ttl;
>       if (!iph->ttl)
>               iph->ttl = ip4_dst_hoplimit(&rt_dst(rt));
> 

I'm not sure that these changes quite belong in this patch (not that it 
shouldn't be done but it seems like the supporting code isn't there yet).
> 
> diff --git a/datapath/vport-gre.c b/datapath/vport-gre.c
> index ab89c5b..fd2b038 100644
> --- a/datapath/vport-gre.c
> +++ b/datapath/vport-gre.c
> @@ -101,10 +101,6 @@ static struct sk_buff *gre_update_header(const struct 
> vport *vport,
>       __be32 *options = (__be32 *)(skb_network_header(skb) + 
> mutable->tunnel_hlen
>                                              - GRE_HEADER_SECTION);
> 
> -     /* Work backwards over the options so the checksum is last. */
> -     if (mutable->flags & TNL_F_OUT_KEY_ACTION)
> -             *options = be64_get_low32(OVS_CB(skb)->tun_id);

Why does this go away?

> diff --git a/datapath/vport.c b/datapath/vport.c
> index 172261a..0c77a1b 100644
> --- a/datapath/vport.c
> +++ b/datapath/vport.c
> @@ -462,7 +462,7 @@ void ovs_vport_receive(struct vport *vport, struct 
> sk_buff *skb)
>               OVS_CB(skb)->flow = NULL;
> 
>       if (!(vport->ops->flags & VPORT_F_TUN_ID))
> -             OVS_CB(skb)->tun_id = 0;
> +             OVS_CB(skb)->tun_key = NULL;

We probably should rename this flag now.

> diff --git a/lib/odp-util.h b/lib/odp-util.h
> index d53f083..4e5a8a1 100644
> --- a/lib/odp-util.h
> +++ b/lib/odp-util.h
> @@ -72,6 +72,7 @@ int odp_actions_from_string(const char *, const struct 
> simap *port_names,
>  *                         ------  ---  ------  -----
>  *  OVS_KEY_ATTR_PRIORITY      4    --     4      8
>  *  OVS_KEY_ATTR_TUN_ID        8    --     4     12
> + *  OVS_KEY_ATTR_IPV4_TUNNEL  18     2     4     24

Same thing about the size here as well.
_______________________________________________
dev mailing list
dev@openvswitch.org
http://openvswitch.org/mailman/listinfo/dev

Reply via email to