On Sep 3, 2012, at 3:43 PM, Jesse Gross wrote:
> On Wed, Aug 29, 2012 at 7:00 AM, Kyle Mestery <[email protected]> wrote:
>> This is a first pass at providing a tun_key which can be
>> used as the basis for flow-based tunnelling. The tun_key
>> includes and replaces the tun_id in both struct ovs_skb_cb
>> and struct sw_tun_key.
>>
>> In ovs_skb_cb tun_key is a pointer as it is envisaged that it will grow
>> when support for IPv6 to an extent that inlining the structure will result
>> in ovs_skb_cb being larger than the 48 bytes available in skb->cb.
>>
>> As OVS does not support IPv6 as the outer transport protocol for tunnels
>> the IPv6 portions of this change, which appeared in the previous revision,
>> have been dropped in order to limit the scope and size of this patch.
>>
>> This patch allows all existing tun_id behaviour to still work. However,
>> when the userspace code is updated to make use of the new tun_key, the
>> old behaviour will be deprecated and removed.
>>
>> Signed-off-by: Kyle Mestery <[email protected]>
>> Cc: Simon Horman <[email protected]>
>
> Thanks and sorry about the long delay.
>
No worries. Please see my comments below for each of yours. I'll send out a new
patch today, with the exception of the changes for moving the tun_key value
for matching optimizations, today.
> The commit message seems to be a little bit more of a historical
> narrative than a description of what's going on in this particular
> patch. Since I'd like to apply it on its own, can you make it
> describe the rationale a little more clearly?
>
> sparse flagged a couple of errors that look one of the calls to
> __find_route() in tunnel.c wasn't updated:
>
> /home/jesse/openvswitch/datapath/linux/tunnel.c:1449:69: warning:
> incorrect type in argument 3 (different base types)
> /home/jesse/openvswitch/datapath/linux/tunnel.c:1449:69: expected
> restricted __be32 [usertype] daddr
> /home/jesse/openvswitch/datapath/linux/tunnel.c:1449:69: got
> unsigned char [unsigned] [usertype] tos
> /home/jesse/openvswitch/datapath/linux/tunnel.c:1450:67: warning:
> incorrect type in argument 5 (different base types)
> /home/jesse/openvswitch/datapath/linux/tunnel.c:1450:67: expected
> unsigned char [unsigned] [usertype] tos
> /home/jesse/openvswitch/datapath/linux/tunnel.c:1450:67: got
> restricted __be32 [usertype] saddr
>
Fixed.
>> diff --git a/datapath/actions.c b/datapath/actions.c
>> index 208f260..83382b9 100644
>> --- a/datapath/actions.c
>> +++ b/datapath/actions.c
>> @@ -343,7 +343,11 @@ static int execute_set_action(struct sk_buff *skb,
>> break;
>>
>> case OVS_KEY_ATTR_TUN_ID:
>> - OVS_CB(skb)->tun_id = nla_get_be64(nested_attr);
>> + OVS_CB(skb)->tun_key->tun_id = nla_get_be64(nested_attr);
>
> I think this doesn't quite work out so simply as a compatibility layer
> because tun_key is either NULL or pointing directly into the actions
> of the flow, which are supposed to be immutable. Long term this won't
> be a problem but in the meantime we'll have a figure out a way to fake
> up an appropriate tun_key.
>
So, I think what we should do here is allocate OVS_CB(skb)->tun_key in
this case. We could modify struct ovs_key_ipv4_tunnel to have a flag value
indicating if it was allocated or not, to allow for correct handling at the time
of termination. Thoughts?
>> diff --git a/datapath/datapath.h b/datapath/datapath.h
>> index affbf0e..5e8bff1 100644
>> --- a/datapath/datapath.h
>> +++ b/datapath/datapath.h
>> @@ -96,7 +96,7 @@ struct datapath {
>> /**
>> * struct ovs_skb_cb - OVS data in skb CB
>> * @flow: The flow associated with this packet. May be %NULL if no flow.
>> - * @tun_id: ID of the tunnel that encapsulated this packet. It is 0 if the
>> + * @tun_key: Key for the tunnel that encapsulated this packet.
>
> Can you mention that this is NULL if the packet is not being tunneled?
>
Fixed.
>> diff --git a/datapath/flow.c b/datapath/flow.c
>> index d07337c..a68195d 100644
>> --- a/datapath/flow.c
>> +++ b/datapath/flow.c
>> @@ -1023,10 +1025,17 @@ int ovs_flow_from_nlattrs(struct sw_flow_key *swkey,
>> int *key_lenp,
>> }
>>
>> if (attrs & (1ULL << OVS_KEY_ATTR_TUN_ID)) {
>> - swkey->phy.tun_id = nla_get_be64(a[OVS_KEY_ATTR_TUN_ID]);
>> + swkey->phy.tun_key.tun_id =
>> nla_get_be64(a[OVS_KEY_ATTR_TUN_ID]);
>> attrs &= ~(1ULL << OVS_KEY_ATTR_TUN_ID);
>> }
>>
>> + if (attrs & (1ULL << OVS_KEY_ATTR_IPV4_TUNNEL)) {
>> + struct ovs_key_ipv4_tunnel *tun_key;
>> + tun_key = nla_data(a[OVS_KEY_ATTR_IPV4_TUNNEL]);
>> + swkey->phy.tun_key = *tun_key;
>> + attrs &= ~(1ULL << OVS_KEY_ATTR_IPV4_TUNNEL);
>> + }
>
> Although we should accept both forms, I think if both are present we
> should ensure that they are consistent by checking that tun_id
> matches.
>
Fixed.
>> @@ -1162,14 +1171,15 @@ int ovs_flow_from_nlattrs(struct sw_flow_key *swkey,
>> int *key_lenp,
>> * get the metadata, that is, the parts of the flow key that cannot be
>> * extracted from the packet itself.
>> */
>> -int ovs_flow_metadata_from_nlattrs(u32 *priority, u16 *in_port, __be64
>> *tun_id,
>> +int ovs_flow_metadata_from_nlattrs(u32 *priority, u16 *in_port,
>> + struct ovs_key_ipv4_tunnel *tun_key,
>> const struct nlattr *attr)
>> {
>> const struct nlattr *nla;
>> int rem;
>>
>> *in_port = DP_MAX_PORTS;
>> - *tun_id = 0;
>> + tun_key->tun_id = 0;
>
> I think we probably should memset the whole tun_key to ensure that
> there isn't any uninitialized data.
>
Fixed.
>> int ovs_flow_to_nlattrs(const struct sw_flow_key *swkey, struct sk_buff *skb)
>> {
>> struct ovs_key_ethernet *eth_key;
>> + struct ovs_key_ipv4_tunnel *tun_key;
>
> Can you declare this in block that checks for tun_key.ipv4_dst just to
> keep everything close together?
>
Fixed.
>> + if (swkey->phy.tun_key.tun_id != cpu_to_be64(0) &&
>> + nla_put_be64(skb, OVS_KEY_ATTR_TUN_ID,
>> swkey->phy.tun_key.tun_id))
>
> Can you line up the second line with spaces to swkey->... make it a
> little clearer which part it belongs to prevent line wrapping?
>
Fixed.
>> diff --git a/datapath/flow.h b/datapath/flow.h
>> index 5be481e..bab5363 100644
>> --- a/datapath/flow.h
>> +++ b/datapath/flow.h
>> struct sw_flow_key {
>> struct {
>> - __be64 tun_id; /* Encapsulating tunnel ID. */
>> + struct ovs_key_ipv4_tunnel tun_key; /* Encapsulating tunnel
>> key. */
>
> One thing that I think we should look into is how we can both shrink
> this fields down for the purposes of matching (for example, I believe
> the flags fields is always zero here) and move it to the end of the
> struct (so that non-tunneled packets don't have to process it at all).
> We've seen that hashing and comparison are the most expensive
> operations in the OVS datapath so we should try to avoid introducing
> extra bytes in places where they aren't needed. Obviously, this
> becomes even more important with IPv6 tunneling. Normally I wouldn't
> worry about optimization of a new feature until the end but since I'd
> like to start putting in patches as they are ready it's important that
> we don't introduce regressions for existing behavior.
>
So, as a start, move it out of the "struct phy" here into something at the end
of "struct sw_flow_key"?
>> @@ -150,6 +150,7 @@ u64 ovs_flow_used_time(unsigned long flow_jiffies);
>> * ------ --- ------ -----
>> * OVS_KEY_ATTR_PRIORITY 4 -- 4 8
>> * OVS_KEY_ATTR_TUN_ID 8 -- 4 12
>> + * OVS_KEY_ATTR_IPV4_TUNNEL 18 2 4 24
>
> I think this value might be out of date: I get 24 bytes as the base
> size of the struct. Also, since the two bytes of padding are
> explicitly part of the struct and not added by Netlink, I would
> include them in the struct column rather than the pad column.
>
Fixed.
>> +static inline void tun_key_swap_addr(struct ovs_key_ipv4_tunnel *tun_key)
>> +{
>> + __be32 ndst = tun_key->ipv4_src;
>> + tun_key->ipv4_src = tun_key->ipv4_dst;
>> + tun_key->ipv4_dst = ndst;
>> +}
>
> I don't think this function is actually used anywhere.
>
Removed. I checked, and later patches did not use it either.
>> +static inline void tun_key_init(struct ovs_key_ipv4_tunnel *tun_key,
>> + const struct iphdr *iph, __be64 tun_id)
>> +{
>> + tun_key->tun_id = tun_id;
>> + tun_key->ipv4_src = iph->saddr;
>> + tun_key->ipv4_dst = iph->daddr;
>> + tun_key->ipv4_tos = iph->tos;
>> + tun_key->ipv4_ttl = iph->ttl;
>> +}
>
> I'd be inclined to move this to tunnel.h since it's only used by
> tunnel implementations.
>
Moved.
>> diff --git a/datapath/tunnel.c b/datapath/tunnel.c
>> index d651c11..259668c 100644
>> --- a/datapath/tunnel.c
>> +++ b/datapath/tunnel.c
>> bool ovs_tnl_frag_needed(struct vport *vport,
> [...]
>> - /*
>> - * Assume that flow based keys are symmetric with respect to input
>> - * and output and use the key that we were going to put on the
>> - * outgoing packet for the fake received packet. If the keys are
>> - * not symmetric then PMTUD needs to be disabled since we won't have
>> - * any way of synthesizing packets.
>> - */
>> - if ((mutable->flags & (TNL_F_IN_KEY_MATCH | TNL_F_OUT_KEY_ACTION)) ==
>> - (TNL_F_IN_KEY_MATCH | TNL_F_OUT_KEY_ACTION))
>> - OVS_CB(nskb)->tun_id = flow_key;
>
> I think this is the one externally visibly feature that we're dropping
> at this point. Can you make a note of it in the commit message and
> NEWS?
>
Updated, please review.
>> @@ -799,10 +786,8 @@ static void create_tunnel_header(const struct vport
>> *vport,
>> iph->ihl = sizeof(struct iphdr) >> 2;
>> iph->frag_off = htons(IP_DF);
>> iph->protocol = tnl_vport->tnl_ops->ipproto;
>> - iph->tos = mutable->tos;
>> iph->daddr = rt->rt_dst;
>> iph->saddr = rt->rt_src;
>> - iph->ttl = mutable->ttl;
>> if (!iph->ttl)
>> iph->ttl = ip4_dst_hoplimit(&rt_dst(rt));
>
> I think these fields don't get used at this point but I'm not sure
> that the change is really related to the patch (and it also uses
> uninitialized memory).
>
Added back in.
>> static struct rtable *__find_route(const struct tnl_mutable_config *mutable,
>> - u8 ipproto, u8 tos)
>> + u8 ipproto, __be32 daddr, __be32 saddr,
>> + u8 tos)
>
> I find this order of arguments somewhat confusing (and judging by the
> fact that one place calls it wrong, I think you must too). How about
> saddr, daddr, tos, ipproto?
>
Fixed __find_route() and it's callers to include the order. I agree, it makes
more sense with the ordering you suggest here.
>> @@ -1037,14 +1024,16 @@ static struct rtable *find_route(struct vport *vport,
>> *cache = NULL;
>> tos = RT_TOS(tos);
>>
>> - if (likely(tos == RT_TOS(mutable->tos) &&
>> - check_cache_valid(cur_cache, mutable))) {
>> + if (daddr == mutable->key.daddr && saddr == mutable->key.saddr &&
>> + tos == RT_TOS(mutable->tos) &&
>
> I think we need to replicate this check down below where we build the
> cache - otherwise we could cache addresses from a specific flow that
> don't apply to this port in general.
>
Replicated.
>> @@ -1219,11 +1210,21 @@ int ovs_tnl_send(struct vport *vport, struct sk_buff
>> *skb)
>>
>> if (mutable->flags & TNL_F_TOS_INHERIT)
>> tos = inner_tos;
>> + else if (OVS_CB(skb)->tun_key)
>> + tos = OVS_CB(skb)->tun_key->ipv4_tos;
>> else
>> tos = mutable->tos;
>>
>> + if (OVS_CB(skb)->tun_key) {
>> + daddr = OVS_CB(skb)->tun_key->ipv4_dst;
>> + saddr = OVS_CB(skb)->tun_key->ipv4_src;
>> + } else {
>> + daddr = mutable->key.daddr;
>> + saddr = mutable->key.saddr;
>> + }
> [...]
>> /* TTL */
>> - ttl = mutable->ttl;
>> + if (OVS_CB(skb)->tun_key)
>> + ttl = OVS_CB(skb)->tun_key->ipv4_ttl;
>> + else
>> + ttl = mutable->ttl;
>> if (!ttl)
>> ttl = ip4_dst_hoplimit(&rt_dst(rt));
>
> I think in all of these cases we want to use a check for ipv4_dst != 0
> instead of tun_key being non-NULL since we're planning on using
> tun_key even if only tun_id was initialized.
>
Updated.
> That being said, if the full tun_key is set, that should probably
> override all of the other settings. When it is used userspace can
> directly initialize all of the values without any kernel involvement
> and that the direction that we want to go in. If userspace is using
> this action, we can assume that it accepts this new model. This also
> includes ECN encapsulation, since userspace can do that as well.
>
Agree, I think it's covered now.
>> diff --git a/datapath/vport-gre.c b/datapath/vport-gre.c
>> index ab89c5b..90041af 100644
>> --- a/datapath/vport-gre.c
>> +++ b/datapath/vport-gre.c
>> @@ -101,10 +101,6 @@ static struct sk_buff *gre_update_header(const struct
>> vport *vport,
>> __be32 *options = (__be32 *)(skb_network_header(skb) +
>> mutable->tunnel_hlen
>> - GRE_HEADER_SECTION);
>>
>> - /* Work backwards over the options so the checksum is last. */
>> - if (mutable->flags & TNL_F_OUT_KEY_ACTION)
>> - *options = be64_get_low32(OVS_CB(skb)->tun_id);
>
> Hmm, it doesn't seem like we should be removing this.
>
Added back in.
>> diff --git a/include/linux/openvswitch.h b/include/linux/openvswitch.h
>> index f5c9cca..c32bb58 100644
>> --- a/include/linux/openvswitch.h
>> +++ b/include/linux/openvswitch.h
>> @@ -278,7 +278,8 @@ enum ovs_key_attr {
>> OVS_KEY_ATTR_ICMPV6, /* struct ovs_key_icmpv6 */
>> OVS_KEY_ATTR_ARP, /* struct ovs_key_arp */
>> OVS_KEY_ATTR_ND, /* struct ovs_key_nd */
>> - OVS_KEY_ATTR_TUN_ID = 63, /* be64 tunnel ID */
>> + OVS_KEY_ATTR_TUN_ID, /* be64 tunnel ID */
>> + OVS_KEY_ATTR_IPV4_TUNNEL, /* struct ovs_key_ipv4_tunnel */
>
> Where possible we should try to avoid breaking userspace/kernel
> compatibility. We won't be able to completely prevent it since we
> want to completely switch over to the new mechanism but we should
> limit the number of times that we have to do it. In this case, let's
> keep OVS_KEY_ATTR_TUN_ID as 63. For OVS_KEY_ATTR_IPV4_TUNNEL, let's
> take the next high value (i.e. 62). Pretty soon we'll want to use a
> "real" value but until we have some userspace code that actually
> depends on it this will give us a chance to make any modifications we
> need.
>
Updated.
>> diff --git a/lib/odp-util.c b/lib/odp-util.c
>> index 901dac3..e2c1f2a 100644
>> --- a/lib/odp-util.c
>> +++ b/lib/odp-util.c
>> @@ -106,6 +106,7 @@ ovs_key_attr_to_string(enum ovs_key_attr attr)
>> case OVS_KEY_ATTR_ARP: return "arp";
>> case OVS_KEY_ATTR_ND: return "nd";
>> case OVS_KEY_ATTR_TUN_ID: return "tun_id";
>> + case OVS_KEY_ATTR_IPV4_TUNNEL: return "ipv4_tunnel";
>>
>> case __OVS_KEY_ATTR_MAX:
>> default:
>> @@ -614,6 +615,7 @@ odp_flow_key_attr_len(uint16_t type)
>> case OVS_KEY_ATTR_ICMPV6: return sizeof(struct ovs_key_icmpv6);
>> case OVS_KEY_ATTR_ARP: return sizeof(struct ovs_key_arp);
>> case OVS_KEY_ATTR_ND: return sizeof(struct ovs_key_nd);
>> + case OVS_KEY_ATTR_IPV4_TUNNEL: return sizeof(struct
>> ovs_key_ipv4_tunnel);
>
> Since these statements aren't sensitive to ordering, I'm inclined to
> keep them in a more natural order based on layering.
>
Fixed the ordering here.
>> diff --git a/lib/odp-util.h b/lib/odp-util.h
>> index 16f2b15..df89c72 100644
>> --- a/lib/odp-util.h
>> +++ b/lib/odp-util.h
>> @@ -80,6 +80,7 @@ int odp_actions_from_string(const char *, const struct
>> simap *port_names,
>> * ------ --- ------ -----
>> * OVS_KEY_ATTR_PRIORITY 4 -- 4 8
>> * OVS_KEY_ATTR_TUN_ID 8 -- 4 12
>> + * OVS_KEY_ATTR_IPV4_TUNNEL 18 2 4 24
>
> Same issue with the calculation here as well.
Fixed.
_______________________________________________
dev mailing list
[email protected]
http://openvswitch.org/mailman/listinfo/dev