On Tue, May 01, 2012 at 04:50:49PM -0700, Jesse Gross wrote:
> On Mon, Apr 30, 2012 at 5:13 PM, Simon Horman <[email protected]> wrote:
> > this is a first pass at providing a tun_key which can be used
> > as the basis for flow-based tunnelling. The tun_key includes and
> > replaces the tun_id in both struct ovs_skb_cb and struct sw_tun_key.
> 
> Thanks, this looks like a good start to me.

Thanks, its nice to get off to a good start :)

> > A significant problem with the current code, which causes it to fail to
> > compile, is that struct ovs_key_ipv6_tunnel is itself 48 bytes and that
> > as a result ovs_skb_cb is larger than 48 bytes. This means that it will
> > not fit inside skb->cb which is also 48 bytes.
> 
> I think it's possible to avoid this problem by just storing a pointer
> to the appropriate tunnel data since it exists in memory already.  On
> the output side, rather than copying the data from the flow you could
> just have a pointer into the flow data directly.  The flow is
> guaranteed to stick around for the lifetime of the skb due to RCU.  On
> the receive side, it could just be a pointer to some data on the stack
> that contains the appropriate struct filled with information.

I have attempted to implement that scheme in the revised patch below.
I have not actually testsed the patch (partly because it breaks tunnelling
in its current form) and I do suspect I may have missed a few things
leading to invalid accesses.

> > At this point it may be just as well just to drop the IPv6 portions of this
> > change, as far as I know OVS has not supported IPv6 as the outer transport
> > protocol for tunnelled frames. However it does seem to be a problem that
> > will arise at some point.
> 
> It's good to think about IPv6 but since OVS doesn't currently support
> tunnels over IPv6 I wouldn't worry about it much.  I think the overall
> change will be big enough that we don't want to add extra work if we
> can avoid it.

Agreed. I think it was a worthwhile exrecise to add the IPv6 code at
the time I did as it highlighed the skb-sb size problem. But I agree
it is adding extra work at this point. I have dropped all the IPv6 code
for now.

> > a pointer and set skb->destructor() to free it as needed. I am, however,
> > concerned about the complexity and performance penalty this may introduce.
> > Moreover, I'd like some review on the merit of stuffing all the tun_key
> > information into skb->cb.
> 
> I agree that allocating and freeing the metadata for each packet is
> something that we probably want to avoid.

I'm glad I didn't code-up my idea :)

> > The patch does make some effort to retain the existing tun_id behaviour.
> > I imagine this is required for compatibility reasons.
> >
> > The patch makes no attempt to use tun_key other than compatibility
> > with the tun_id behaviour.
> 
> I actually wouldn't worry about kernel side compatibility here too
> much.  My general thought is that people should either use the OVS
> kernel module with the userspace of the same version or the upstream
> module with userspace >= 1.4.  Since none of the tunneling code is
> upstream yet we don't have to worry about compatibility and we'll have
> to break it anyways if we want to eventually get everything upstream
> and have it be clean.
> 
> The one place where we will need to maintain compatibility is from OVS
> userspace to the outside world.  However, I think that everything we
> currently do today can be implemented in terms of flow based tunneling
> in userspace.  Therefore, I would go ahead and start making the full
> changes around how tunnel lookup, for example, is done.

Understood. I have got as far as breaking the existing tun_id behaviour,
but not actually providing a replacement in the kernel. I also haven't got
as far as touching usespace in any meaningful way. But I agree that
userspace is a fine palce for the compatibility code.

> > diff --git a/datapath/flow.h b/datapath/flow.h
> > index 5261fa8..c39023a 100644
> > --- a/datapath/flow.h
> > +++ b/datapath/flow.h
> > +struct sw_tun_key {
> > +       __kernel_sa_family_t    tun_af;
> > +       union {
> > +               __be64                          tun_id;
> > +               struct ovs_key_ipv4_tunnel      tun_ipv4;
> > +               struct ovs_key_ipv6_tunnel      tun_ipv6;
> > +       };
> > +};
> 
> I think we probably want to pack this struct a little better at some
> point since tun_af actually only needs to be a single bit and will
> also have a fair amount of padding after it.

I agree there is some room for improvement. For now I have dropped this
structure alltogether in favour of using ovs_key_ipv4_tunnel directly as
the code currently only supports IPv4.

This turned out to simplify things a bit. But I wonder if the
code is no longer sufficiently generic.

> >  struct sw_flow_key {
> >        struct {
> > -               __be64  tun_id;         /* Encapsulating tunnel ID. */
> > -               u32     priority;       /* Packet QoS priority. */
> > -               u16     in_port;        /* Input switch port (or 
> > DP_MAX_PORTS). */
> > +               struct sw_tun_key       tun_key;        /* Encapsulating 
> > tunnel key. */
> 
> Eventually, at least, I'd like to move this to the end of the struct
> so that it doesn't affect performance for non-tunneled traffic.

Sure. Do you mean moving phy (which is what contains tun_key) to the end
of sw_flow_key. Or moving tun_key out of phy?

> > diff --git a/datapath/tunnel.c b/datapath/tunnel.c
> > index d406dbc..0305fbd 100644
> > --- a/datapath/tunnel.c
> > +++ b/datapath/tunnel.c
> > @@ -613,7 +613,8 @@ static void ipv6_build_icmp(struct sk_buff *skb, struct 
> > sk_buff *nskb,
> >
> >  bool ovs_tnl_frag_needed(struct vport *vport,
> >                         const struct tnl_mutable_config *mutable,
> > -                        struct sk_buff *skb, unsigned int mtu, __be64 
> > flow_key)
> > +                        struct sk_buff *skb, unsigned int mtu,
> > +                        struct sw_tun_key *tun_key)
> >  {
> >        unsigned int eth_hdr_len = ETH_HLEN;
> >        unsigned int total_length = 0, header_length = 0, payload_length;
> > @@ -706,7 +707,7 @@ bool ovs_tnl_frag_needed(struct vport *vport,
> >         */
> >        if ((mutable->flags & (TNL_F_IN_KEY_MATCH | TNL_F_OUT_KEY_ACTION)) ==
> >            (TNL_F_IN_KEY_MATCH | TNL_F_OUT_KEY_ACTION))
> > -               OVS_CB(nskb)->tun_id = flow_key;
> > +               OVS_CB(nskb)->tun_key = *tun_key;
> 
> I think the paradigm of echoing back the tunnel information on the
> PMTUD packet breaks down here.  At least in simple cases (but even
> here not all) keys are often symmetric between receive and transmit so
> it's possible to use the same key.  However, once you start including
> addresses, it's obviously no longer symmetric.  In retrospect, I think
> this implementation of PMTUD was a mistake since it's a hack that
> works in many circumstances but not all.  Eventually, it would
> probably be better to replace it with an implementation of MSS
> clamping in userspace (which also has the advantage of being
> implementable completely in userspace).

Understood.

For now I have hacked up a solution that swaps the saddr and daddr in the
tun_key. However, I am unsure if that is ever valid. If not, perhaps
MSS clamping needs to be implemented before switching to flow-based
tunneling?

> > diff --git a/include/linux/openvswitch.h b/include/linux/openvswitch.h
> > index 0578b5f..da32f7f 100644
> > --- a/include/linux/openvswitch.h
> > +++ b/include/linux/openvswitch.h
> > @@ -278,7 +278,9 @@ enum ovs_key_attr {
> >        OVS_KEY_ATTR_ICMPV6,    /* struct ovs_key_icmpv6 */
> >        OVS_KEY_ATTR_ARP,       /* struct ovs_key_arp */
> >        OVS_KEY_ATTR_ND,        /* struct ovs_key_nd */
> > -       OVS_KEY_ATTR_TUN_ID = 63, /* be64 tunnel ID */
> > +       OVS_KEY_ATTR_TUN_ID,    /* be64 tunnel ID */
> > +       OVS_KEY_ATTR_IPV4_TUNNEL,      /* struct ovs_key_ipv4_tunnel */
> > +       OVS_KEY_ATTR_IPV6_TUNNEL = 63, /* struct ovs_key_ipv6_tunnel */
> 
> The high valued enum value is just to separate the upstream types from
> non-upstream.  Since we plan to push this upstream, you can drop it
> (also shrink some data types).

I have droped the = 63, though I'm a little unsure if that is what you meant.

> > +struct ovs_key_ipv4_tunnel {
> > +       __be64 tun_id;
> > +       __be32 ipv4_src;
> > +       __be32 ipv4_dst;
> > +       __u8   ipv4_tos;
> > +       __u8   ipv4_ttl;
> > +       __u8   tun_proto;       /* One of TNL_T_PROTO_* */
> > +       __u8   reserved;
> > +};
> > +
> > +struct ovs_key_ipv6_tunnel {
> > +       __be64 tun_id;
> > +       __be32 ipv6_src[4];
> > +       __be32 ipv6_dst[4];
> > +       __be32 ipv6_label;      /* 20-bits in least-significant bits. */
> > +       __u8   ipv6_tclass;
> > +       __u8   ipv6_hlimit;
> > +       __u8   tun_proto;       /* One of TNL_T_PROTO_* */
> > +       __u8   reserved;
> > +};
> 
> I'm not sure that tun_proto is necessary here - it's implied by the
> port that is output to/received from.

Thanks. I have dropped both tun_proto and reserved. Is it necessary
to pad the structure to a multiple of 64 bytes (or any other value)?


Today is the last day before a four-day weekend in Japan, so I wanted
to get what I have out for you to see as I find your review particularly
valuable. I appologise that the patch is somewhat rough around the edges.


[RFC v2] datapath: tunnelling: Replace tun_id with tun_key

this is a first pass at providing a tun_key which can be used
as the basis for flow-based tunnelling. The tun_key includes and
replaces the tun_id in both struct ovs_skb_cb and struct sw_tun_key.

In ovs_skb_cb tun_key is a pointer as it is envisaged that it will grow
when support for IPv6 to an extent that inlining the structure will result
in ovs_skb_cb being larger than the 48 bytes available in skb->cb.

As OVS does not support IPv6 as the outer transport protocol for tunnels
the IPv6 portions of this change, which appeared in the previous revision,
have been dropped in order to limit the scope and size of this patch.

This patch does not make any effort to retain the existing tun_id behaviour
nor does it fully implement flow-based tunnels. As such it it is incomplete
and can't be used in its current form (other than to break OVS tunnelling).

** Please do not apply **

Cc: Kyle Mestery <[email protected]>
Signed-off-by: Simon Horman <[email protected]>

diff --git a/datapath/actions.c b/datapath/actions.c
index 2903801..2a58a0a 100644
--- a/datapath/actions.c
+++ b/datapath/actions.c
@@ -342,8 +342,8 @@ static int execute_set_action(struct sk_buff *skb,
                skb->priority = nla_get_u32(nested_attr);
                break;
 
-       case OVS_KEY_ATTR_TUN_ID:
-               OVS_CB(skb)->tun_id = nla_get_be64(nested_attr);
+       case OVS_KEY_ATTR_IPV4_TUNNEL:
+               OVS_CB(skb)->tun_key = &OVS_CB(skb)->flow->key.phy.tun_key;
                break;
 
        case OVS_KEY_ATTR_ETHERNET:
@@ -469,7 +469,7 @@ int ovs_execute_actions(struct datapath *dp, struct sk_buff 
*skb)
                goto out_loop;
        }
 
-       OVS_CB(skb)->tun_id = 0;
+       OVS_CB(skb)->tun_key = NULL;
        error = do_execute_actions(dp, skb, acts->actions,
                                         acts->actions_len, false);
 
diff --git a/datapath/datapath.c b/datapath/datapath.c
index 826dc89..6c6cf09 100644
--- a/datapath/datapath.c
+++ b/datapath/datapath.c
@@ -776,7 +776,7 @@ static int ovs_packet_cmd_execute(struct sk_buff *skb, 
struct genl_info *info)
 
        err = ovs_flow_metadata_from_nlattrs(&flow->key.phy.priority,
                                             &flow->key.phy.in_port,
-                                            &flow->key.phy.tun_id,
+                                            &flow->key.phy.tun_key,
                                             a[OVS_PACKET_ATTR_KEY]);
        if (err)
                goto err_flow_put;
diff --git a/datapath/datapath.h b/datapath/datapath.h
index 18c8598..6644d88 100644
--- a/datapath/datapath.h
+++ b/datapath/datapath.h
@@ -96,7 +96,7 @@ struct datapath {
 /**
  * struct ovs_skb_cb - OVS data in skb CB
  * @flow: The flow associated with this packet.  May be %NULL if no flow.
- * @tun_id: ID of the tunnel that encapsulated this packet.  It is 0 if the
+ * @tun_key: Key for the tunnel that encapsulated this packet.
  * @ip_summed: Consistently stores L4 checksumming status across different
  * kernel versions.
  * @csum_start: Stores the offset from which to start checksumming independent
@@ -107,7 +107,7 @@ struct datapath {
  */
 struct ovs_skb_cb {
        struct sw_flow          *flow;
-       __be64                  tun_id;
+       struct ovs_key_ipv4_tunnel  *tun_key;
 #ifdef NEED_CSUM_NORMALIZE
        enum csum_type          ip_summed;
        u16                     csum_start;
diff --git a/datapath/flow.c b/datapath/flow.c
index 9f93550..ad59003 100644
--- a/datapath/flow.c
+++ b/datapath/flow.c
@@ -629,7 +629,8 @@ int ovs_flow_extract(struct sk_buff *skb, u16 in_port, 
struct sw_flow_key *key,
        memset(key, 0, sizeof(*key));
 
        key->phy.priority = skb->priority;
-       key->phy.tun_id = OVS_CB(skb)->tun_id;
+       if (OVS_CB(skb)->tun_key)
+               key->phy.tun_key = *OVS_CB(skb)->tun_key;
        key->phy.in_port = in_port;
 
        skb_reset_mac_header(skb);
@@ -1022,9 +1023,11 @@ int ovs_flow_from_nlattrs(struct sw_flow_key *swkey, int 
*key_lenp,
                swkey->phy.in_port = DP_MAX_PORTS;
        }
 
-       if (attrs & (1ULL << OVS_KEY_ATTR_TUN_ID)) {
-               swkey->phy.tun_id = nla_get_be64(a[OVS_KEY_ATTR_TUN_ID]);
-               attrs &= ~(1ULL << OVS_KEY_ATTR_TUN_ID);
+       if (attrs & (1ULL << OVS_KEY_ATTR_IPV4_TUNNEL)) {
+               struct ovs_key_ipv4_tunnel *tun_key;
+               tun_key = nla_data(a[OVS_KEY_ATTR_IPV4_TUNNEL]);
+               swkey->phy.tun_key = *tun_key;
+               attrs &= ~(1ULL << OVS_KEY_ATTR_IPV4_TUNNEL);
        }
 
        /* Data attributes. */
@@ -1162,14 +1165,15 @@ int ovs_flow_from_nlattrs(struct sw_flow_key *swkey, 
int *key_lenp,
  * get the metadata, that is, the parts of the flow key that cannot be
  * extracted from the packet itself.
  */
-int ovs_flow_metadata_from_nlattrs(u32 *priority, u16 *in_port, __be64 *tun_id,
+int ovs_flow_metadata_from_nlattrs(u32 *priority, u16 *in_port,
+                                  struct ovs_key_ipv4_tunnel *tun_key,
                                   const struct nlattr *attr)
 {
        const struct nlattr *nla;
        int rem;
 
        *in_port = DP_MAX_PORTS;
-       *tun_id = 0;
+       tun_key->tun_id = 0;
        *priority = 0;
 
        nla_for_each_nested(nla, attr, rem) {
@@ -1184,8 +1188,9 @@ int ovs_flow_metadata_from_nlattrs(u32 *priority, u16 
*in_port, __be64 *tun_id,
                                *priority = nla_get_u32(nla);
                                break;
 
-                       case OVS_KEY_ATTR_TUN_ID:
-                               *tun_id = nla_get_be64(nla);
+                       case OVS_KEY_ATTR_IPV4_TUNNEL:
+                               memcpy(tun_key, nla_data(nla),
+                                      sizeof(*tun_key));
                                break;
 
                        case OVS_KEY_ATTR_IN_PORT:
@@ -1204,15 +1209,18 @@ int ovs_flow_metadata_from_nlattrs(u32 *priority, u16 
*in_port, __be64 *tun_id,
 int ovs_flow_to_nlattrs(const struct sw_flow_key *swkey, struct sk_buff *skb)
 {
        struct ovs_key_ethernet *eth_key;
+       struct ovs_key_ipv4_tunnel *key;
        struct nlattr *nla, *encap;
 
        if (swkey->phy.priority &&
            nla_put_u32(skb, OVS_KEY_ATTR_PRIORITY, swkey->phy.priority))
                goto nla_put_failure;
 
-       if (swkey->phy.tun_id != cpu_to_be64(0) &&
-           nla_put_be64(skb, OVS_KEY_ATTR_TUN_ID, swkey->phy.tun_id))
+       nla = nla_reserve(skb, OVS_KEY_ATTR_IPV4_TUNNEL, sizeof(*key));
+       if (!nla)
                goto nla_put_failure;
+       key = nla_data(nla);
+       *key = swkey->phy.tun_key;
 
        if (swkey->phy.in_port != DP_MAX_PORTS &&
            nla_put_u32(skb, OVS_KEY_ATTR_IN_PORT, swkey->phy.in_port))
diff --git a/datapath/flow.h b/datapath/flow.h
index 5261fa8..a774ee8 100644
--- a/datapath/flow.h
+++ b/datapath/flow.h
@@ -42,7 +42,7 @@ struct sw_flow_actions {
 
 struct sw_flow_key {
        struct {
-               __be64  tun_id;         /* Encapsulating tunnel ID. */
+               struct ovs_key_ipv4_tunnel tun_key;  /* Encapsulating tunnel 
key. */
                u32     priority;       /* Packet QoS priority. */
                u16     in_port;        /* Input switch port (or DP_MAX_PORTS). 
*/
        } phy;
@@ -165,7 +165,8 @@ u64 ovs_flow_used_time(unsigned long flow_jiffies);
 int ovs_flow_to_nlattrs(const struct sw_flow_key *, struct sk_buff *);
 int ovs_flow_from_nlattrs(struct sw_flow_key *swkey, int *key_lenp,
                      const struct nlattr *);
-int ovs_flow_metadata_from_nlattrs(u32 *priority, u16 *in_port, __be64 *tun_id,
+int ovs_flow_metadata_from_nlattrs(u32 *priority, u16 *in_port,
+                                  struct ovs_key_ipv4_tunnel *tun_key,
                                   const struct nlattr *);
 
 #define MAX_ACTIONS_BUFSIZE    (16 * 1024)
diff --git a/datapath/tunnel.c b/datapath/tunnel.c
index d406dbc..d76db0a 100644
--- a/datapath/tunnel.c
+++ b/datapath/tunnel.c
@@ -613,12 +613,14 @@ static void ipv6_build_icmp(struct sk_buff *skb, struct 
sk_buff *nskb,
 
 bool ovs_tnl_frag_needed(struct vport *vport,
                         const struct tnl_mutable_config *mutable,
-                        struct sk_buff *skb, unsigned int mtu, __be64 flow_key)
+                        struct sk_buff *skb, unsigned int mtu,
+                        struct ovs_key_ipv4_tunnel *tun_key)
 {
        unsigned int eth_hdr_len = ETH_HLEN;
        unsigned int total_length = 0, header_length = 0, payload_length;
        struct ethhdr *eh, *old_eh = eth_hdr(skb);
        struct sk_buff *nskb;
+       struct ovs_key_ipv4_tunnel ntun_key;
 
        /* Sanity check */
        if (skb->protocol == htons(ETH_P_IP)) {
@@ -705,8 +707,11 @@ bool ovs_tnl_frag_needed(struct vport *vport,
         * any way of synthesizing packets.
         */
        if ((mutable->flags & (TNL_F_IN_KEY_MATCH | TNL_F_OUT_KEY_ACTION)) ==
-           (TNL_F_IN_KEY_MATCH | TNL_F_OUT_KEY_ACTION))
-               OVS_CB(nskb)->tun_id = flow_key;
+           (TNL_F_IN_KEY_MATCH | TNL_F_OUT_KEY_ACTION)) {
+               ntun_key = *tun_key;
+               ipv4_tunnel_key_swap_addr(&ntun_key);
+               OVS_CB(nskb)->tun_key = &ntun_key;
+       }
 
        if (unlikely(compute_ip_summed(nskb, false))) {
                kfree_skb(nskb);
@@ -761,7 +766,7 @@ static bool check_mtu(struct sk_buff *skb,
 
                        if (packet_length > mtu &&
                            ovs_tnl_frag_needed(vport, mutable, skb, mtu,
-                                               OVS_CB(skb)->tun_id))
+                                               OVS_CB(skb)->tun_key))
                                return false;
                }
        }
@@ -778,7 +783,7 @@ static bool check_mtu(struct sk_buff *skb,
 
                        if (packet_length > mtu &&
                            ovs_tnl_frag_needed(vport, mutable, skb, mtu,
-                                               OVS_CB(skb)->tun_id))
+                                               OVS_CB(skb)->tun_key))
                                return false;
                }
        }
diff --git a/datapath/tunnel.h b/datapath/tunnel.h
index 33eb63c..918e0ad 100644
--- a/datapath/tunnel.h
+++ b/datapath/tunnel.h
@@ -276,7 +276,8 @@ struct vport *ovs_tnl_find_port(struct net *net, __be32 
saddr, __be32 daddr,
                                const struct tnl_mutable_config **mutable);
 bool ovs_tnl_frag_needed(struct vport *vport,
                         const struct tnl_mutable_config *mutable,
-                        struct sk_buff *skb, unsigned int mtu, __be64 
flow_key);
+                        struct sk_buff *skb, unsigned int mtu,
+                        struct ovs_key_ipv4_tunnel *tun_key);
 void ovs_tnl_free_linked_skbs(struct sk_buff *skb);
 
 int ovs_tnl_init(void);
diff --git a/datapath/vport-capwap.c b/datapath/vport-capwap.c
index e5b7afb..d8be125 100644
--- a/datapath/vport-capwap.c
+++ b/datapath/vport-capwap.c
@@ -220,7 +220,7 @@ static struct sk_buff *capwap_update_header(const struct 
vport *vport,
                struct capwaphdr_wsi *wsi = (struct capwaphdr_wsi *)(cwh + 1);
                struct capwaphdr_wsi_key *opt = (struct capwaphdr_wsi_key 
*)(wsi + 1);
 
-               opt->key = OVS_CB(skb)->tun_id;
+               opt->key = OVS_CB(skb)->tun_key->tun_id;
        }
 
        udph->len = htons(skb->len - skb_transport_offset(skb));
@@ -316,6 +316,7 @@ static int capwap_rcv(struct sock *sk, struct sk_buff *skb)
        struct vport *vport;
        const struct tnl_mutable_config *mutable;
        struct iphdr *iph;
+       struct ovs_key_ipv4_tunnel tun_key;
        __be64 key = 0;
 
        if (unlikely(!pskb_may_pull(skb, CAPWAP_MIN_HLEN + ETH_HLEN)))
@@ -333,10 +334,9 @@ static int capwap_rcv(struct sock *sk, struct sk_buff *skb)
                goto error;
        }
 
-       if (mutable->flags & TNL_F_IN_KEY_MATCH)
-               OVS_CB(skb)->tun_id = key;
-       else
-               OVS_CB(skb)->tun_id = 0;
+       ovs_key_ipv4_tunnel(&tun_key, iph,
+                           mutable->flags & TNL_F_IN_KEY_MATCH ? key : 0);
+       OVS_CB(skb)->tun_key = &tun_key;
 
        ovs_tnl_rcv(vport, skb, iph->tos);
        goto out;
diff --git a/datapath/vport-gre.c b/datapath/vport-gre.c
index 3bb55f0..94f4e0d 100644
--- a/datapath/vport-gre.c
+++ b/datapath/vport-gre.c
@@ -101,10 +101,6 @@ static struct sk_buff *gre_update_header(const struct 
vport *vport,
        __be32 *options = (__be32 *)(skb_network_header(skb) + 
mutable->tunnel_hlen
                                               - GRE_HEADER_SECTION);
 
-       /* Work backwards over the options so the checksum is last. */
-       if (mutable->flags & TNL_F_OUT_KEY_ACTION)
-               *options = be64_get_low32(OVS_CB(skb)->tun_id);
-
        if (mutable->out_key || mutable->flags & TNL_F_OUT_KEY_ACTION)
                options--;
 
@@ -285,7 +281,16 @@ static void gre_err(struct sk_buff *skb, u32 info)
 #endif
 
        __skb_pull(skb, tunnel_hdr_len);
-       ovs_tnl_frag_needed(vport, mutable, skb, mtu, key);
+       {
+               struct ovs_key_ipv4_tunnel tun_key = {
+                       .tun_id = key,
+                       .ipv4_src = iph->saddr,
+                       .ipv4_dst = iph->daddr,
+                       .ipv4_tos = iph->tos,
+                       .ipv4_ttl = iph->ttl,
+               };
+               ovs_tnl_frag_needed(vport, mutable, skb, mtu, &tun_key);
+       }
        __skb_push(skb, tunnel_hdr_len);
 
 out:
@@ -327,6 +332,7 @@ static int gre_rcv(struct sk_buff *skb)
        const struct tnl_mutable_config *mutable;
        int hdr_len;
        struct iphdr *iph;
+       struct ovs_key_ipv4_tunnel tun_key;
        __be16 flags;
        __be64 key;
 
@@ -351,10 +357,10 @@ static int gre_rcv(struct sk_buff *skb)
                goto error;
        }
 
-       if (mutable->flags & TNL_F_IN_KEY_MATCH)
-               OVS_CB(skb)->tun_id = key;
-       else
-               OVS_CB(skb)->tun_id = 0;
+
+       ovs_key_ipv4_tunnel(&tun_key, iph,
+                           mutable->flags & TNL_F_IN_KEY_MATCH ? key : 0);
+       OVS_CB(skb)->tun_key = &tun_key;
 
        __skb_pull(skb, hdr_len);
        skb_postpull_rcsum(skb, skb_transport_header(skb), hdr_len + ETH_HLEN);
diff --git a/datapath/vport.c b/datapath/vport.c
index b75a866..20c68c5 100644
--- a/datapath/vport.c
+++ b/datapath/vport.c
@@ -462,7 +462,7 @@ void ovs_vport_receive(struct vport *vport, struct sk_buff 
*skb)
                OVS_CB(skb)->flow = NULL;
 
        if (!(vport->ops->flags & VPORT_F_TUN_ID))
-               OVS_CB(skb)->tun_id = 0;
+               OVS_CB(skb)->tun_key = NULL;
 
        ovs_dp_process_received_packet(vport, skb);
 }
diff --git a/include/linux/openvswitch.h b/include/linux/openvswitch.h
index 0578b5f..db6d4e4 100644
--- a/include/linux/openvswitch.h
+++ b/include/linux/openvswitch.h
@@ -278,7 +278,8 @@ enum ovs_key_attr {
        OVS_KEY_ATTR_ICMPV6,    /* struct ovs_key_icmpv6 */
        OVS_KEY_ATTR_ARP,       /* struct ovs_key_arp */
        OVS_KEY_ATTR_ND,        /* struct ovs_key_nd */
-       OVS_KEY_ATTR_TUN_ID = 63, /* be64 tunnel ID */
+       OVS_KEY_ATTR_TUN_ID,    /* be64 tunnel ID */
+       OVS_KEY_ATTR_IPV4_TUNNEL,  /* struct ovs_key_ipv4_tunnel */
        __OVS_KEY_ATTR_MAX
 };
 
@@ -360,6 +361,22 @@ struct ovs_key_nd {
        __u8  nd_tll[6];
 };
 
+struct ovs_key_ipv4_tunnel {
+       __be64 tun_id;
+       __be32 ipv4_src;
+       __be32 ipv4_dst;
+       __u8   ipv4_tos;
+       __u8   ipv4_ttl;
+};
+
+static inline void
+ipv4_tunnel_key_swap_addr(struct ovs_key_ipv4_tunnel *tun_key)
+{
+       __be32 ndst = tun_key->ipv4_src;
+       tun_key->ipv4_src = tun_key->ipv4_dst;
+       tun_key->ipv4_dst = ndst;
+}
+
 /**
  * enum ovs_flow_attr - attributes for %OVS_FLOW_* commands.
  * @OVS_FLOW_ATTR_KEY: Nested %OVS_KEY_ATTR_* attributes specifying the flow
diff --git a/lib/dpif-netdev.c b/lib/dpif-netdev.c
index a33fe23..ea5bf17 100644
--- a/lib/dpif-netdev.c
+++ b/lib/dpif-netdev.c
@@ -1154,6 +1154,7 @@ execute_set_action(struct ofpbuf *packet, const struct 
nlattr *a)
     case OVS_KEY_ATTR_TUN_ID:
     case OVS_KEY_ATTR_PRIORITY:
     case OVS_KEY_ATTR_IPV6:
+    case OVS_KEY_ATTR_IPV4_TUNNEL:
         /* not implemented */
         break;
 
diff --git a/lib/odp-util.c b/lib/odp-util.c
index 8fa3359..65aa938 100644
--- a/lib/odp-util.c
+++ b/lib/odp-util.c
@@ -107,6 +107,7 @@ ovs_key_attr_to_string(enum ovs_key_attr attr)
     case OVS_KEY_ATTR_ARP: return "arp";
     case OVS_KEY_ATTR_ND: return "nd";
     case OVS_KEY_ATTR_TUN_ID: return "tun_id";
+    case OVS_KEY_ATTR_IPV4_TUNNEL: return "ipv4_tunnel";
 
     case __OVS_KEY_ATTR_MAX:
     default:
@@ -523,6 +524,7 @@ odp_flow_key_attr_len(uint16_t type)
     case OVS_KEY_ATTR_ICMPV6: return sizeof(struct ovs_key_icmpv6);
     case OVS_KEY_ATTR_ARP: return sizeof(struct ovs_key_arp);
     case OVS_KEY_ATTR_ND: return sizeof(struct ovs_key_nd);
+    case OVS_KEY_ATTR_IPV4_TUNNEL: return sizeof(struct ovs_key_ipv4_tunnel);
 
     case OVS_KEY_ATTR_UNSPEC:
     case __OVS_KEY_ATTR_MAX:
@@ -577,6 +579,7 @@ format_odp_key_attr(const struct nlattr *a, struct ds *ds)
     const struct ovs_key_icmpv6 *icmpv6_key;
     const struct ovs_key_arp *arp_key;
     const struct ovs_key_nd *nd_key;
+    const struct ovs_key_ipv4_tunnel *ipv4_tun_key;
     enum ovs_key_attr attr = nl_attr_type(a);
     int expected_len;
 
@@ -607,6 +610,15 @@ format_odp_key_attr(const struct nlattr *a, struct ds *ds)
         ds_put_format(ds, "(%#"PRIx64")", ntohll(nl_attr_get_be64(a)));
         break;
 
+    case OVS_KEY_ATTR_IPV4_TUNNEL:
+        ipv4_tun_key = nl_attr_get(a);
+        ds_put_format(ds, "(src="IP_FMT",dst="IP_FMT
+                      ",tos=%#"PRIx8",ttl=%"PRIu8")",
+                      IP_ARGS(&ipv4_tun_key->ipv4_src),
+                      IP_ARGS(&ipv4_tun_key->ipv4_dst),
+                      ipv4_tun_key->ipv4_tos, ipv4_tun_key->ipv4_ttl);
+        break;
+
     case OVS_KEY_ATTR_IN_PORT:
         ds_put_format(ds, "(%"PRIu32")", nl_attr_get_u32(a));
         break;

_______________________________________________
dev mailing list
[email protected]
http://openvswitch.org/mailman/listinfo/dev

Reply via email to