With small comments below:

Acked-by: Jarno Rajahalme <ja...@ovn.org>

> On Feb 19, 2016, at 12:34 AM, Ben Pfaff <b...@ovn.org> wrote:
>
> One purpose of OpenFlow packet-in messages is to allow a controller to
> interpose on the path of a packet through the flow tables. If, for
> example, the controller needs to modify a packet in some way that the
> switch doesn't directly support, the controller should be able to
> program the switch to send it the packet, then modify the packet and
> send it back to the switch to continue through the flow table.
>
> That's the theory. In practice, this doesn't work with any but the
> simplest flow tables. Packet-in messages simply don't include enough
> context to allow the flow table traversal to continue. For example:
>
>    * Via "resubmit" actions, an Open vSwitch packet can have an
>      effective "call stack", but a packet-in can't describe it, and
>      so it would be lost.
>
>    * Via "patch ports", an Open vSwitch packet can traverse multiple
>      OpenFlow logical switches. A packet-in can't describe or resume
>      this context.

Is there any context regarding this that needs to be described?

>    * A packet-in can't preserve the stack used by NXAST_PUSH and
>      NXAST_POP actions.
>
>    * A packet-in can't preserve the OpenFlow 1.1+ action set.
>
>    * A packet-in can't preserve the state of Open vSwitch mirroring
>      or connection tracking.
>
> This commit introduces a solution called "continuations". A continuation
> is the state of a packet's traversal through OpenFlow flow tables. A
> "controller" action with the "pause" flag, which is newly implemented in
> this comit, generates a continuation and sends it to the OpenFlow

"commit"

> controller in a packet-in asynchronous message (only NXT_PACKET_IN2
> supports continuations, so the controller must configure them with
> NXT_SET_PACKET_IN_FORMAT). The controller processes the packet-in,
> possibly modifying some of its data, and sends it back to the switch with
> an NXT_RESUME request, which causes flow table traversal to continue.
> In principle, a single packet can be paused and resumed multiple times.
>
> Another way to look at it is:
>
>   - "pause" is an extension of the existing OFPAT_CONTROLLER
>     action. It sends the packet to the controller, with full
>     pipeline context (some of which is switch implementation
>     dependent, and may thus vary from switch to switch).
>
>   - A continuation is an extension of OFPT_PACKET_IN, allowing for
>     implementation dependent metadata.
>
>   - NXT_RESUME is an extension of OFPT_PACKET_OUT, with the
>     semantics that the pipeline processing is continued with the
>     original translation context from where it was left at the time
>     it was paused.
>
> Signed-off-by: Ben Pfaff <b...@ovn.org>
> Acked-by: Jarno Rajahalme <ja...@ovn.org>
> ---
>  NEWS                          |   5 +-
>  include/openflow/nicira-ext.h |  96 ++++++++-
>  lib/learning-switch.c         |   3 +-
>  lib/meta-flow.c               |   9 +-
>  lib/meta-flow.h               |   3 +-
>  lib/ofp-actions.c             |  28 ++-
>  lib/ofp-actions.h             |   5 +
>  lib/ofp-errors.h              |  16 +-
>  lib/ofp-msgs.h                |   4 +
>  lib/ofp-print.c               |  78 +++++--
>  lib/ofp-util.c                | 470 ++++++++++++++++++++++++++++++++++++------
>  lib/ofp-util.h                |  57 ++++-
>  lib/rconn.c                   |   3 +-
>  ofproto/connmgr.c             |  23 ++-
>  ofproto/connmgr.h             |   2 +-
>  ofproto/fail-open.c           |  16 +-
>  ofproto/ofproto-dpif-xlate.c  | 199 ++++++++++++++----
>  ofproto/ofproto-dpif-xlate.h  |   4 +
>  ofproto/ofproto-dpif.c        |  34 +++
>  ofproto/ofproto-provider.h    |   3 +
>  ofproto/ofproto.c             |  24 +++
>  ovn/TODO                      |  57 -----
>  ovn/controller/pinctrl.c      |   4 +-
>  tests/ofp-actions.at          |  13 +-
>  tests/ofp-print.at            |  12 ++
>  tests/ofproto-dpif.at         | 172 ++++++++++++++++
>  tests/ofproto-macros.at       |  35 +++-
>  utilities/ovs-ofctl.8.in      |  11 +-
>  utilities/ovs-ofctl.c         | 109 +++++++---
>  29 files changed, 1239 insertions(+), 256 deletions(-)
>
> diff --git a/NEWS b/NEWS
> index 9ab6cae..ba4b7f7 100644
> --- a/NEWS
> +++ b/NEWS
> @@ -6,7 +6,10 @@ Post-v2.5.0
>       * OpenFlow 1.1+ OFPT_QUEUE_GET_CONFIG_REQUEST now supports OFPP_ANY.
>       * OpenFlow 1.4+ OFPMP_QUEUE_DESC is now supported.
>       * New property-based packet-in message format NXT_PACKET_IN2 with support
> -       for arbitrary user-provided data.
> +       for arbitrary user-provided data and for serializing flow table
> +       traversal into a continuation for later resumption.
> +     * New extension message NXT_SET_ASYNC_CONFIG2 to allow OpenFlow 1.4-like
> +       control over asynchronous messages in earlier versions of OpenFlow.
>     - ovs-ofctl:
>       * queue-get-config command now allows a queue ID to be specified.
>     - DPDK:
> diff --git a/include/openflow/nicira-ext.h b/include/openflow/nicira-ext.h
> index 7e56066..77a735d 100644
> --- a/include/openflow/nicira-ext.h
> +++ b/include/openflow/nicira-ext.h
> @@ -260,12 +260,103 @@ struct nx_packet_in {
>  };
>  OFP_ASSERT(sizeof(struct nx_packet_in) == 24);
>
> -/* NXT_PACKET_IN2.
> +/* NXT_PACKET_IN2
> + * ==============
>   *
>   * NXT_PACKET_IN2 is conceptually similar to OFPT_PACKET_IN but it is expressed
>   * as an extensible set of properties instead of using a fixed structure.
>   *
> - * Added in Open vSwitch 2.6. */
> + * Added in Open vSwitch 2.6
> + *
> + *
> + * Continuations
> + * -------------
> + *
> + * When a "controller" action specifies the "pause" flag, the controller action
> + * freezes the packet's trip through Open vSwitch flow tables and serializes
> + * that state into the packet-in message as a "continuation". The controller
> + * can later send the continuation back to the switch, which will restart the
> + * packet's traversal from the point where it was interrupted. This permits an
> + * OpenFlow controller to interpose on a packet midway through processing in
> + * Open vSwitch.
> + *
> + * Continuations fit into packet processing this way:
> + *
> + *    1. A packet ingresses into Open vSwitch, which runs it through the OpenFlow
> + *       tables.
> + *
> + *    2. An OpenFlow flow executes a "controller" action that includes the "pause"
> + *       flag.
> + *       Open vSwitch serializes the packet processing state and sends it,
> + *       as an NXT_PACKET_IN2 that includes an additional NXPINT_CONTINUATION
> + *       property (the continuation), to the OpenFlow controller.
> + *
> + *       (The controller must use NXAST_CONTROLLER2 to generate the packet-in,
> + *       because only this form of the "controller" action has a "pause" flag.
> + *       Similarly, the controller must use NXT_SET_PACKET_IN_FORMAT to select
> + *       NXT_PACKET_IN2 as the packet-in format, because this is the only format
> + *       that supports continuation passing.)
> + *
> + *    3. The controller receives the NXT_PACKET_IN2 and processes it. The
> + *       controller can interpret and, if desired, modify some of the contents of
> + *       the packet-in, such as the packet and the metadata being processed.
> + *
> + *    4. The controller sends the continuation back to the switch, using an
> + *       NXT_RESUME message. Packet processing resumes where it left off.
> + *
> + * The controller might change the pipeline configuration concurrently with
> + * steps 2 through 4. For example, it might add or remove OpenFlow flows. If
> + * that happens, then the packet will experience a mix of processing from the
> + * two configurations, that is, the initial processing (before
> + * NXAST_CONTROLLER2) uses the initial flow table, and the later processing
> + * (after NXT_RESUME) uses the later flow table.

Maybe it should be noted here that if the layout of data that is
pushed/popped to/from the stack changes, then the continuation of the
packet processing might have unpredictable behavior. But maybe this is
true for pipeline “shape” changes in general.

> + *
> + * External side effects (e.g. "output") of OpenFlow actions processed before
> + * NXAST_CONTROLLER2 is encountered might be executed during step 2 or step 4,
> + * and the details may vary among Open vSwitch features and versions.
> + * Thus, a
> + * controller that wants to make sure that side effects are executed must pass
> + * the continuation back to the switch, that is, must not skip step 4.
> + *
> + * Architecturally, continuations may be "stateful" or "stateless", that is,
> + * they may or may not refer to buffered state maintained in Open vSwitch.
> + * This means that a controller should not attempt to resume a given
> + * continuations more than once (because the switch might have discarded the
> + * buffered state after the first use). For the same reason, continuations
> + * might become "stale" if the controller takes too long to resume them
> + * (because the switch might have discarded old buffered state). Taken
> + * together with the previous note, this means that a controller should resume
> + * each continuation exactly once (and promptly).
> + *
> + * Without the information in NXPINT_CONTINUATION, the controller can (with
> + * careful design, and help from the flow cookie) determine where the packet is
> + * in the pipeline, but in the general case it can't determine what nested
> + * "resubmit"s that may be in progress, or what data is on the stack maintained
> + * by NXAST_STACK_PUSH and NXAST_STACK_POP actions, what is in the OpenFlow
> + * action set, etc.
> + *
> + * Continuations are expensive because they require a round trip between the
> + * switch and the controller. Thus, they should not be used to implement
> + * processing that needs to happen at "line rate".
> + *
> + * The contents of NXPINT_CONTINUATION are private to the switch, may change
> + * unpredictably from one version of Open vSwitch to another, and are not
> + * documented here. The contents are also tied to a given Open vSwitch process
> + * and bridge, so that restarting Open vSwitch or deleting and recreating a
> + * bridge will cause the corresponding NXT_RESUME to be rejected.
> + *
> + * In the current implementation, Open vSwitch forks the packet processing
> + * pipeline across patch ports. Suppose, for example, that the pipeline for
> + * br0 outputs to a patch port whose peer belongs to br1, and that the pipeline
> + * for br1 executes a controller action with the "pause" flag. This only
> + * pauses processing within br1, and processing in br0 continues and possibly
> + * completes with visible side effects, such as outputting to ports, before
> + * br1's controller receives or processes the continuation. This
> + * implementation maintains the independence of separate bridges and, since
> + * processing in br1 cannot affect the behavior of br0 anyway, should not cause
> + * visible behavioral changes.
> + *
> + * A packet-in that includes a continuation always includes the entire packet
> + * and is never buffered.

Does this need to be the case? Does this not contradict the
stateful/stateless comment above?

> + */
>  enum nx_packet_in2_prop_type {
>      /* Packet. */
>      NXPINT_PACKET,              /* Raw packet data. */
> @@ -280,6 +371,7 @@ enum nx_packet_in2_prop_type {
>      NXPINT_REASON,              /* uint8_t, one of OFPR_*. */
>      NXPINT_METADATA,            /* NXM or OXM for metadata fields. */
>      NXPINT_USERDATA,            /* From NXAST_CONTROLLER2 userdata. */
> +    NXPINT_CONTINUATION,        /* Private data for continuing processing. */
>  };
>
>  /* Configures the "role" of the sending controller. The default role is:
>

(snip)
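
To make the "resume each continuation exactly once (and promptly)" rule
in the quoted comment concrete, here is a rough controller-side sketch.
It is an illustration only and not code from the patch: the tracker, its
function names, the table size, and the age limit are all hypothetical,
and it only decides whether a resume should be attempted. Actually
building and sending NXT_RESUME is left out.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical controller-side bookkeeping.  None of these names come from
 * Open vSwitch; they only illustrate the "exactly once, promptly" rule. */
enum resume_verdict {
    RESUME_OK,          /* First, timely attempt: go ahead and resume. */
    RESUME_DUPLICATE,   /* Already consumed (or never recorded): do not resume. */
    RESUME_STALE        /* Held too long: the switch may reject the resume. */
};

#define MAX_PENDING 64  /* Arbitrary table size chosen for the sketch. */

struct pending_cont {
    bool in_use;
    uint32_t id;          /* Controller-chosen handle for one continuation. */
    uint64_t received_ms; /* Arrival time of the packet-in that carried it. */
};

struct cont_tracker {
    struct pending_cont slots[MAX_PENDING];
    uint64_t max_age_ms;  /* Assumed controller policy, not a switch constant. */
};

static void
tracker_init(struct cont_tracker *t, uint64_t max_age_ms)
{
    assert(t != NULL);
    for (size_t i = 0; i < MAX_PENDING; i++) {
        t->slots[i].in_use = false;
    }
    t->max_age_ms = max_age_ms;
}

/* Records a continuation that arrived in a packet-in.  Returns false if the
 * table is full. */
static bool
tracker_add(struct cont_tracker *t, uint32_t id, uint64_t now_ms)
{
    for (size_t i = 0; i < MAX_PENDING; i++) {
        if (!t->slots[i].in_use) {
            t->slots[i] = (struct pending_cont) { true, id, now_ms };
            return true;
        }
    }
    return false;
}

/* Decides whether the controller should resume 'id' now.  The entry is
 * consumed on the first call either way, so any later call for the same id
 * reports RESUME_DUPLICATE. */
static enum resume_verdict
tracker_resume(struct cont_tracker *t, uint32_t id, uint64_t now_ms)
{
    for (size_t i = 0; i < MAX_PENDING; i++) {
        struct pending_cont *p = &t->slots[i];
        if (p->in_use && p->id == id) {
            p->in_use = false;
            return now_ms - p->received_ms > t->max_age_ms
                   ? RESUME_STALE : RESUME_OK;
        }
    }
    return RESUME_DUPLICATE;
}
```

A controller following this sketch would call tracker_add() when a
packet-in carrying a continuation arrives and tracker_resume() just
before sending the resume. Whether to still attempt a resume on
RESUME_STALE is a policy choice the sketch leaves open, since the quoted
text notes that skipping the resume can also skip side effects.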