On Sun, Jun 03, 2012 at 06:15:04PM +0900, Jesse Gross wrote: > On Thu, May 24, 2012 at 6:08 PM, Simon Horman <ho...@verge.net.au> wrote: > > this is a first pass at providing a tun_key which can be used > > as the basis for flow-based tunnelling. The tun_key includes and > > replaces the tun_id in both struct ovs_skb_cb and struct sw_tun_key. > > > > In ovs_skb_cb tun_key is a pointer as it is envisaged that it will grow > > when support for IPv6 to an extent that inlining the structure will result > > in ovs_skb_cb being larger than the 48 bytes available in skb->cb. > > > > As OVS does not support IPv6 as the outer transport protocol for tunnels > > the IPv6 portions of this change, which appeared in the previous revision, > > have been dropped in order to limit the scope and size of this patch. > > > > This patch does not make any effort to retain the existing tun_id behaviour > > nor does it fully implement flow-based tunnels. As such it it is incomplete > > and can't be used in its current form (other than to break OVS tunnelling). > > > > ** Please do not apply ** > > > > Cc: Kyle Mestery <kmest...@cisco.com> > > Signed-off-by: Simon Horman <ho...@verge.net.au> > > Thanks and sorry again about being so slow to look at this. > > Overall, this looks pretty good to me. The main difficulty that I had > was in figuring out what should go with the old behavior and what > should go with the new since it's at an intermediate point between the > two but I understand that it's difficult to break it up in a way that > both encapsulates a particular set of functionality and isn't too > large. Otherwise, I noticed a few specific things that I noted below. > > > diff --git a/datapath/flow.c b/datapath/flow.c > > index d07337c..49c0dd8 100644 > > --- a/datapath/flow.c > > +++ b/datapath/flow.c > > @@ -1162,14 +1166,15 @@ int ovs_flow_from_nlattrs(struct sw_flow_key > > *swkey, int *key_lenp, > > * get the metadata, that is, the parts of the flow key that cannot be > > * extracted from the packet itself. > > */ > > -int ovs_flow_metadata_from_nlattrs(u32 *priority, u16 *in_port, __be64 > > *tun_id, > > +int ovs_flow_metadata_from_nlattrs(u32 *priority, u16 *in_port, > > + struct ovs_key_ipv4_tunnel *tun_key, > > const struct nlattr *attr) > > { > > const struct nlattr *nla; > > int rem; > > > > *in_port = DP_MAX_PORTS; > > - *tun_id = 0; > > + tun_key->tun_id = 0; > > I think we probably want to memset the entire tun_key to zero to avoid > having potentially uninitialized data in the flow.
Sure, that is fine by me. > > @@ -1204,15 +1210,21 @@ int ovs_flow_metadata_from_nlattrs(u32 *priority, > > u16 *in_port, __be64 *tun_id, > > int ovs_flow_to_nlattrs(const struct sw_flow_key *swkey, struct sk_buff > > *skb) > > { > > struct ovs_key_ethernet *eth_key; > > + struct ovs_key_ipv4_tunnel *tun_key; > > struct nlattr *nla, *encap; > > > > if (swkey->phy.priority && > > nla_put_u32(skb, OVS_KEY_ATTR_PRIORITY, swkey->phy.priority)) > > goto nla_put_failure; > > > > - if (swkey->phy.tun_id != cpu_to_be64(0) && > > - nla_put_be64(skb, OVS_KEY_ATTR_TUN_ID, swkey->phy.tun_id)) > > - goto nla_put_failure; > > + if (swkey->phy.tun_key.ipv4_dst) { > > It's probably OK to use DIP equal to zero as a not present marker but > we need to enforce that it's always true - for example we shouldn't > allow somebody to setup a flow that way or receive packets with a zero > address. Alternately, we may be able to find a spare bit to indicate > this, like is done with vlans. When I originally wrote this there didn't seem to be any obvious place in ovs_key_ipv4_tunnel to have an active/inactive bit, which is in part why the code relies on checking DIP. However, more recent versions of ovs_key_ipv4_tunnel have a flags field of which only one bit is currently used. We could use one of the unused flag bits. > In any case, I think we need to do some additional validation when > setting up flows to check reserved space, for example, as otherwise > that will never match. Sure. My testing seems to indicate that matching does occur, though I am quite happy to tighten things up. > > > diff --git a/datapath/flow.h b/datapath/flow.h > > index 5be481e..bab5363 100644 > > --- a/datapath/flow.h > > +++ b/datapath/flow.h > > @@ -42,7 +42,7 @@ struct sw_flow_actions { > > > > struct sw_flow_key { > > struct { > > - __be64 tun_id; /* Encapsulating tunnel ID. */ > > + struct ovs_key_ipv4_tunnel tun_key; /* Encapsulating > > tunnel key. */ > > This is an optimization but as we get closer I'd like to put the > tun_key at the end of struct sw_flow_key so that packets that didn't > come from a tunnel don't have to pay the cost during the lookup (this > is especially true as we add support for IPv6 tunnels). Sure. > In a similar vein, struct ovs_key_ipv4_tunnel contains some fields > that I think can never apply for lookup such as the flags so it would > be nice if we could remove that for lookup. I think they need to be there to be passed around, so I'm not sure if they can easily be removed from ovs_key_ipv4_tunnel if that is what you are asking. > > @@ -150,6 +150,7 @@ u64 ovs_flow_used_time(unsigned long flow_jiffies); > > * ------ --- ------ ----- > > * OVS_KEY_ATTR_PRIORITY 4 -- 4 8 > > * OVS_KEY_ATTR_TUN_ID 8 -- 4 12 > > + * OVS_KEY_ATTR_IPV4_TUNNEL 18 2 4 24 > > If my math is correct, I think the size of the base struct > ova_key_ipv4_tunnel is 24 bytes. Sorry, the size changed a few times and I seem to have forgotten to update the above table accordingly. > > +static inline void tun_key_swap_addr(struct ovs_key_ipv4_tunnel *tun_key) > > +{ > > + __be32 ndst = tun_key->ipv4_src; > > + tun_key->ipv4_src = tun_key->ipv4_dst; > > + tun_key->ipv4_dst = ndst; > > +} > > I'm not quite sure when we would need to swap the addresses in a > tunnel and I didn't see any uses of this function. This should no longer be needed - and was always broken - I'll remove it. > > +static inline void tun_key_init(struct ovs_key_ipv4_tunnel *tun_key, > > + const struct iphdr *iph, __be64 tun_id) > > +{ > > + tun_key->tun_id = tun_id; > > + tun_key->ipv4_src = iph->saddr; > > + tun_key->ipv4_dst = iph->daddr; > > + tun_key->ipv4_tos = iph->tos; > > + tun_key->ipv4_ttl = iph->ttl; > > +} > > Aren't there some fields that we need to zero out to avoid problems in > the lookup? Thanks, I will check. > > diff --git a/datapath/tunnel.c b/datapath/tunnel.c > > index d651c11..010e513 100644 > > --- a/datapath/tunnel.c > > +++ b/datapath/tunnel.c > > @@ -367,9 +367,9 @@ struct vport *ovs_tnl_find_port(struct net *net, __be32 > > saddr, __be32 daddr, > > return NULL; > > } > > > > -static void ecn_decapsulate(struct sk_buff *skb, u8 tos) > > +static void ecn_decapsulate(struct sk_buff *skb) > > { > > - if (unlikely(INET_ECN_is_ce(tos))) { > > + if (unlikely(INET_ECN_is_ce(OVS_CB(skb)->tun_key->ipv4_tos))) { > > This might come in a later patch, although I didn't see it in a quick > scan, but it should be possible to implement all the ECN encapsulation > and decapsulation in userspace, just like we can do with the rest of > the ToS and TTL. I hadn't considered that, I will add it to my TODO list. > > bool ovs_tnl_frag_needed(struct vport *vport, > > const struct tnl_mutable_config *mutable, > > - struct sk_buff *skb, unsigned int mtu, __be64 > > flow_key) > > + struct sk_buff *skb, unsigned int mtu, > > + struct ovs_key_ipv4_tunnel *tun_key) > > { > > unsigned int eth_hdr_len = ETH_HLEN; > > unsigned int total_length = 0, header_length = 0, payload_length; > > struct ethhdr *eh, *old_eh = eth_hdr(skb); > > struct sk_buff *nskb; > > + struct ovs_key_ipv4_tunnel ntun_key; > > > > /* Sanity check */ > > if (skb->protocol == htons(ETH_P_IP)) { > > @@ -705,8 +707,10 @@ bool ovs_tnl_frag_needed(struct vport *vport, > > * any way of synthesizing packets. > > */ > > if ((mutable->flags & (TNL_F_IN_KEY_MATCH | TNL_F_OUT_KEY_ACTION)) == > > - (TNL_F_IN_KEY_MATCH | TNL_F_OUT_KEY_ACTION)) > > - OVS_CB(nskb)->tun_id = flow_key; > > + (TNL_F_IN_KEY_MATCH | TNL_F_OUT_KEY_ACTION)) { > > + ntun_key = *tun_key; > > + OVS_CB(nskb)->tun_key = &ntun_key; > > + } > > I guess this is probably where you were going to use the function to > reverse IP addresses. The logic doesn't really work but it's moot > since this is going away anyways. My latest series includes a clean up to ovs_tnl_frag_needed() to allow it to work in some circumstances - i.e. those found in my test environment. That series removes knowledge of tun_key from ovs_tnl_frag_needed(). I am however happy to remove ovs_tnl_frag_needed() completely if you think that is appropriate. > > @@ -799,10 +803,8 @@ static void create_tunnel_header(const struct vport > > *vport, > > iph->ihl = sizeof(struct iphdr) >> 2; > > iph->frag_off = htons(IP_DF); > > iph->protocol = tnl_vport->tnl_ops->ipproto; > > - iph->tos = mutable->tos; > > iph->daddr = rt->rt_dst; > > iph->saddr = rt->rt_src; > > - iph->ttl = mutable->ttl; > > if (!iph->ttl) > > iph->ttl = ip4_dst_hoplimit(&rt_dst(rt)); > > I'm not sure that these changes quite belong in this patch (not that > it shouldn't be done but it seems like the supporting code isn't there > yet). Sorry, I did some merging of my patches and this seems to have been a case where I merged incorrectly. I think that change belongs in: [PATCH 18/21] dataptah: remove ttl and tos from tnl_mutable_config (I need to fix the typo in the title of that patch!) > > diff --git a/datapath/vport-gre.c b/datapath/vport-gre.c > > index ab89c5b..fd2b038 100644 > > --- a/datapath/vport-gre.c > > +++ b/datapath/vport-gre.c > > @@ -101,10 +101,6 @@ static struct sk_buff *gre_update_header(const struct > > vport *vport, > > __be32 *options = (__be32 *)(skb_network_header(skb) + > > mutable->tunnel_hlen > > - GRE_HEADER_SECTION); > > > > - /* Work backwards over the options so the checksum is last. */ > > - if (mutable->flags & TNL_F_OUT_KEY_ACTION) > > - *options = be64_get_low32(OVS_CB(skb)->tun_id); > > Why does this go away? I agree that looks wrong and my only explanation is that it is remnants of some hack. I have done some, hopefully more sensible, re-working of gre_update_header in: [PATCH 20/21] datapath: Use tun_key flags for id and csum settings on transmit > > diff --git a/datapath/vport.c b/datapath/vport.c > > index 172261a..0c77a1b 100644 > > --- a/datapath/vport.c > > +++ b/datapath/vport.c > > @@ -462,7 +462,7 @@ void ovs_vport_receive(struct vport *vport, struct > > sk_buff *skb) > > OVS_CB(skb)->flow = NULL; > > > > if (!(vport->ops->flags & VPORT_F_TUN_ID)) > > - OVS_CB(skb)->tun_id = 0; > > + OVS_CB(skb)->tun_key = NULL; > > We probably should rename this flag now. Yes, I think there are several flags that may now be removed. I'll add that to my TODO list. > > diff --git a/lib/odp-util.h b/lib/odp-util.h > > index d53f083..4e5a8a1 100644 > > --- a/lib/odp-util.h > > +++ b/lib/odp-util.h > > @@ -72,6 +72,7 @@ int odp_actions_from_string(const char *, const struct > > simap *port_names, > > * ------ --- ------ ----- > > * OVS_KEY_ATTR_PRIORITY 4 -- 4 8 > > * OVS_KEY_ATTR_TUN_ID 8 -- 4 12 > > + * OVS_KEY_ATTR_IPV4_TUNNEL 18 2 4 24 > > Same thing about the size here as well. Thanks _______________________________________________ dev mailing list dev@openvswitch.org http://openvswitch.org/mailman/listinfo/dev