On 7 March 2017 at 16:15, Andy Zhou <az...@ovn.org> wrote: > With the introduction of open flow 'clone' action, the OVS user space > can now translate the 'clone' action into kernel datapath 'sample' > action, with 100% probability, to ensure that the clone semantics, > which is that the packet seen by the clone action is the same as the > packet seen by the action after clone, is faithfully carried out > in the datapath. > > While the sample action in the datpath has the matching semantics, > its implementation is only optimized for its original use. > Specifically, there are two limitation: First, there is a 3 level of > nesting restriction, enforced at the flow downloading time. This > limit turns out to be too restrictive for the 'clone' use case. > Second, the implementation avoid recursive call only if the sample > action list has a single userspace action. > > The optimization implemented in this series removes the static > nesting limit check, instead, implement the run time recursion limit > check, and recursion avoidance similar to that of the 'recirc' action. > This optimization solve both #1 and #2 issues above. > > Another optimization implemented is to avoid coping flow key as
*copying > long as the actions enclosed does not change the flow key. The > detection is performed only once at the flow downloading time. > > The third optimization implemented is to rewrite the action list > at flow downloading time in order to save the fast path from parsing > the sample action list in its original form repeatedly. Whenever there is an enumeration (1, 2, 3; ..another..., third thing implemented) in a commit message, I have to ask whether each "another change..." should be a separate patch. It certainly makes it easier to review. I ran this through the OVS kernel tests and it's working correctly from that point of view, but I didn't get a chance to dig in and ensure for example, runtime behaviour of several nested sample(actions(sample(actions(sample(actions(output)))))) handles reasonably when it runs out of stack and deferred actions space. At a high level though, this seems pretty straightforward. Several comments below, thanks. > > Signed-off-by: Andy Zhou <az...@ovn.org> > --- > net/openvswitch/actions.c | 106 ++++++++++++++++++------------------ > net/openvswitch/datapath.h | 7 +++ > net/openvswitch/flow_netlink.c | 118 > ++++++++++++++++++++++++++++------------- > 3 files changed, 140 insertions(+), 91 deletions(-) > > diff --git a/net/openvswitch/actions.c b/net/openvswitch/actions.c > index 259aea9..2e8c372 100644 > --- a/net/openvswitch/actions.c > +++ b/net/openvswitch/actions.c > @@ -930,71 +930,52 @@ static int output_userspace(struct datapath *dp, struct > sk_buff *skb, > } > > static int sample(struct datapath *dp, struct sk_buff *skb, > - struct sw_flow_key *key, const struct nlattr *attr, > - const struct nlattr *actions, int actions_len) > + struct sw_flow_key *key, const struct nlattr *attr) > { > - const struct nlattr *acts_list = NULL; > - const struct nlattr *a; > - int rem; > - u32 cutlen = 0; > - > - for (a = nla_data(attr), rem = nla_len(attr); rem > 0; > - a = nla_next(a, &rem)) { > - u32 probability; > - > - switch (nla_type(a)) { > - case OVS_SAMPLE_ATTR_PROBABILITY: > - probability = nla_get_u32(a); > - if (!probability || prandom_u32() > probability) > - return 0; > - break; > - > - case OVS_SAMPLE_ATTR_ACTIONS: > - acts_list = a; > - break; > - } > - } > + struct nlattr *actions; > + struct nlattr *sample_arg; > + struct sw_flow_key *orig = key; > + int rem = nla_len(attr); > + int err = 0; > + const struct sample_arg *arg; > > - rem = nla_len(acts_list); > - a = nla_data(acts_list); > + /* The first action is always 'OVS_SAMPLE_ATTR_AUX'. */ Is it? This is the only reference to OVS_SAMPLE_ATTR_AUX I can see. > + sample_arg = nla_data(attr); We could do this in the parent call, like several other actions do. <snip> > @@ -1246,9 +1227,24 @@ static int do_execute_actions(struct datapath *dp, > struct sk_buff *skb, > err = execute_masked_set_action(skb, key, > nla_data(a)); > break; > > - case OVS_ACTION_ATTR_SAMPLE: > - err = sample(dp, skb, key, a, attr, len); > + case OVS_ACTION_ATTR_SAMPLE: { > + bool last = nla_is_last(a, rem); > + struct sk_buff *clone_skb; > + > + clone_skb = last ? skb : skb_clone(skb, GFP_ATOMIC); > + > + if (!clone_skb) > + /* Out of memory, skip this sample action. > + */ Should we ratelimited complain in this case? > + break; > + > + err = sample(dp, clone_skb, key, a); > + if (last) > + return err; Given that sample() doesn't guarantee it will free 'clone_skb', I don't think this is safe. > + > + pr_err("executing sample"); Useful for debugging, but don't forget to drop for next version. > break; > + } This bracket is typically aligned with the character of the beginning of the 'case' that it applies to. > > case OVS_ACTION_ATTR_CT: > if (!is_flow_key_valid(key)) { > diff --git a/net/openvswitch/datapath.h b/net/openvswitch/datapath.h > index 1c6e937..d7a6e803 100644 > --- a/net/openvswitch/datapath.h > +++ b/net/openvswitch/datapath.h > @@ -220,4 +220,11 @@ int ovs_execute_actions(struct datapath *dp, struct > sk_buff *skb, > if (logging_allowed && net_ratelimit()) \ > pr_info("netlink: " fmt "\n", ##__VA_ARGS__); \ > } while (0) > + > +#define OVS_SAMPLE_ATTR_ARG (OVS_ACTION_ATTR_MAX + 1) This should be defined in include/uapi/linux/openvswitch.h, protected by #ifdef __KERNEL__ like the other internal formats. While I don't think it actually conflicts with any other internal action format, these should be in one place so we don't end up accidentally using the same enum twice. > +struct sample_arg { > + bool exec; > + u32 probability; > +}; Perhaps we should move this to either net/openvswitch/flow_netlink.h or include/linux/openvswitch.h since it's only for consumption internally by the kernel module? > + > #endif /* datapath.h */ > diff --git a/net/openvswitch/flow_netlink.c b/net/openvswitch/flow_netlink.c > index 6f5fa50..0e7cf13 100644 > --- a/net/openvswitch/flow_netlink.c > +++ b/net/openvswitch/flow_netlink.c > @@ -1,5 +1,5 @@ > /* > - * Copyright (c) 2007-2014 Nicira, Inc. > + * Copyright (c) 2007-2017 Nicira, Inc. > * > * This program is free software; you can redistribute it and/or > * modify it under the terms of version 2 of the GNU General Public > @@ -59,6 +59,39 @@ struct ovs_len_tbl { > #define OVS_ATTR_NESTED -1 > #define OVS_ATTR_VARIABLE -2 > > +static bool actions_may_change_flow(const struct nlattr *actions) > +{ > + struct nlattr *nla; > + int rem; > + > + nla_for_each_nested(nla, actions, rem) { > + u16 action = nla_type(nla); > + > + switch (action) { > + case OVS_ACTION_ATTR_OUTPUT: > + case OVS_ACTION_ATTR_RECIRC: > + case OVS_ACTION_ATTR_USERSPACE: > + case OVS_ACTION_ATTR_SAMPLE: > + case OVS_ACTION_ATTR_TRUNC: Trunc doesn't change the flow, but if there's something like sample(output,trunc),output() then I wouldn't expect the final output to be truncated. <snip> > @@ -2621,39 +2670,36 @@ int ovs_nla_copy_actions(struct net *net, const > struct nlattr *attr, > return err; > } > > -static int sample_action_to_attr(const struct nlattr *attr, struct sk_buff > *skb) > +static int sample_action_to_attr(const struct nlattr *attr, > + struct sk_buff *skb) > { > - const struct nlattr *a; > - struct nlattr *start; > - int err = 0, rem; > + struct nlattr *start, *ac_start, *sample_arg; > + int err = 0, rem = nla_len(attr); > + const struct sample_arg *arg; > + struct nlattr *actions; > > start = nla_nest_start(skb, OVS_ACTION_ATTR_SAMPLE); > if (!start) > return -EMSGSIZE; > > - nla_for_each_nested(a, attr, rem) { > - int type = nla_type(a); > - struct nlattr *st_sample; > + sample_arg = nla_data(attr); > + arg = nla_data(sample_arg); > + actions = nla_next(sample_arg, &rem); > > - switch (type) { > - case OVS_SAMPLE_ATTR_PROBABILITY: > - if (nla_put(skb, OVS_SAMPLE_ATTR_PROBABILITY, > - sizeof(u32), nla_data(a))) > - return -EMSGSIZE; > - break; > - case OVS_SAMPLE_ATTR_ACTIONS: > - st_sample = nla_nest_start(skb, > OVS_SAMPLE_ATTR_ACTIONS); > - if (!st_sample) > - return -EMSGSIZE; > - err = ovs_nla_put_actions(nla_data(a), nla_len(a), > skb); > - if (err) > - return err; > - nla_nest_end(skb, st_sample); > - break; > - } > - } > + if (nla_put_u32(skb, OVS_SAMPLE_ATTR_PROBABILITY, arg->probability)) > + return -EMSGSIZE; > > + ac_start = nla_nest_start(skb, OVS_SAMPLE_ATTR_ACTIONS); > + if (!ac_start) > + return -EMSGSIZE; > + > + err = ovs_nla_put_actions(actions, rem, skb); > + if (err) > + return err; Shouldn't we nla_nest_cancel() when something fails in the middle of nesting nla attributes? For example I believe that upcall will attempt to serialize actions, but if actions don't fit in the buffer then it'll just skip the actions and forward everything else. > + > + nla_nest_end(skb, ac_start); > nla_nest_end(skb, start); > + > return err; > } > > -- > 1.8.3.1 >