For the most part it looks good. I do have a few comments inline, a couple of 
them towards the bottom being significant.

-----"dev" <dev-boun...@openvswitch.org> wrote: -----

>To: dev@openvswitch.org
>From: Gurucharan Shetty 
>Sent by: "dev" 
>Date: 05/19/2016 10:58PM
>Subject: [ovs-dev] [PATCH v3 5/5] ovn: DNAT and SNAT on a gateway router.
>
>For traffic from physical space to virtual space we need DNAT.
>The DNAT happens in the gateway router and reaches the logical
>port. The return traffic should be unDNATed.
>
>Traffic originating in virtual space heading to physical space
>should be SNATed. The return traffic is unSNATted.
>
>East-west traffic with the public destination IP address needs
>a DNAT. This traffic is punted to the l3 gateway where DNAT
>takes place. This traffic is also SNATed and eventually loops back to
>its destination. The SNAT is needed because we need the reverse traffic
>to go back to the l3 gateway and not short-circuit directly to the source.
>
>This commit introduces 4 new logical actions.
>1. ct_snat: To send the packet through SNAT zone to unSNAT packets.
>2. ct_snat(IP): To SNAT to the provided IP address.
>3. ct_dnat: To send the packet throgh DNAT zone to unDNAT packets.
>4. ct_dnat(IP): To DNAT to the provided IP.
>
>This commit only provides the ability to do IP based NAT. This will
>eventually be enhanced to do PORT based NAT too.
>
>Command hints:
>
>Consider a distributed router "R1" that has switch foo (192.168.1.0/24)
>with a lport foo1 (192.168.1.2) and bar (192.168.2.0/24) with lport bar1
>(192.168.2.2) connected to it. You connect "R1" to
>a gateway router "R2" via a switch "join" in (20.0.0.0/24) network.
>
>R2 has a switch "alice" (172.16.1.0/24) connected to it (to simulate
>external network).
>
>case: Add pure DNAT (north-south)
>
>Add a DNAT rule in R2:
>ovn-nbctl set logical_router R2 dnat:"30.0.0.2"=192.168.1.2
>
>Now alice1 should be able to ping 192.168.1.2 via 30.0.0.2.
>
>case2 : Add pure SNAT (south-north)
>
>Add a SNAT rule in R2:
>
>ovn-nbctl set logical_router R2 snat:"192.168.1.0/24"=30.0.0.1
>
>(You need a static route in R1 to send packets destined to outside
>world to go through R2)
>
>When foo1 pings alice1, alice1 receives traffic from 30.0.0.1
>
>case3 : SNAT and DNAT (east-west traffic)
>
>Add a SNAT rule for bar1
>ovn-nbctl set logical_router R2 snat:"192.168.2.2"=30.0.0.3
>
>When bar1 pings 30.0.0.2, the traffic jumps to the gateway router
>and loops back to foo1 with a source ip address of 30.0.0.3
>
>Signed-off-by: Gurucharan Shetty <g...@ovn.org>
>---
> ovn/lib/actions.c           |  83 +++++++++++++++++
> ovn/northd/ovn-northd.8.xml | 129 +++++++++++++++++++++++---
> ovn/northd/ovn-northd.c     | 214
>++++++++++++++++++++++++++++++++++++++++++--
> ovn/ovn-nb.ovsschema        |  12 ++-
> ovn/ovn-nb.xml              |  35 ++++++--
> ovn/ovn-sb.xml              |  38 ++++++++
> 6 files changed, 487 insertions(+), 24 deletions(-)
>
>diff --git a/ovn/lib/actions.c b/ovn/lib/actions.c
>index 5f0bf19..4a486a0 100644
>--- a/ovn/lib/actions.c
>+++ b/ovn/lib/actions.c
>@@ -442,6 +442,85 @@ emit_ct(struct action_context *ctx, bool recirc_next, 
>bool commit)
>     add_prerequisite(ctx, "ip");
> }
> 
>+static void
>+parse_ct_nat(struct action_context *ctx, bool snat)
>+{
>+    const size_t ct_offset = ctx->ofpacts->size;
>+    ofpbuf_pull(ctx->ofpacts, ct_offset);
>+
>+    struct ofpact_conntrack *ct = ofpact_put_CT(ctx->ofpacts);
>+
>+    if (ctx->ap->cur_ltable < ctx->ap->n_tables) {
>+        ct->recirc_table = ctx->ap->first_ptable + ctx->ap->cur_ltable + 1;
>+    } else {
>+        action_error(ctx,
>+                     "\"ct_[sd]nat\" action not allowed in last table.");
>+        return;
>+    }
>+
>+    if (snat) {
>+        ct->zone_src.field = mf_from_id(MFF_LOG_SNAT_ZONE);
>+    } else {
>+        ct->zone_src.field = mf_from_id(MFF_LOG_DNAT_ZONE);
>+    }
>+    ct->zone_src.ofs = 0;
>+    ct->zone_src.n_bits = 16;
>+    ct->flags = 0;
>+    ct->alg = 0;
>+
>+    add_prerequisite(ctx, "ip");
>+
>+    struct ofpact_nat *nat;
>+    size_t nat_offset;
>+    nat_offset = ctx->ofpacts->size;
>+    ofpbuf_pull(ctx->ofpacts, nat_offset);
>+
>+    nat = ofpact_put_NAT(ctx->ofpacts);
>+    nat->flags = 0;
>+    nat->range_af = AF_UNSPEC;
>+
>+    int commit = 0;
>+    if (lexer_match(ctx->lexer, LEX_T_LPAREN)) {
>+        ovs_be32 ip;
>+        if (ctx->lexer->token.type == LEX_T_INTEGER
>+            && ctx->lexer->token.format == LEX_F_IPV4) {
>+            ip = ctx->lexer->token.value.ipv4;
>+        } else {
>+            action_syntax_error(ctx, "invalid ip");
>+            return;
>+        }
>+
>+        nat->range_af = AF_INET;
>+        nat->range.addr.ipv4.min = ip;
>+        if (snat) {
>+            nat->flags |= NX_NAT_F_SRC;
>+        } else {
>+            nat->flags |= NX_NAT_F_DST;
>+        }
>+        commit = NX_CT_F_COMMIT;
>+        lexer_get(ctx->lexer);
>+        if (!lexer_match(ctx->lexer, LEX_T_RPAREN)) {
>+            action_syntax_error(ctx, "expecting `)'");
>+            return;
>+        }
>+    }
>+
>+    ctx->ofpacts->header = ofpbuf_push_uninit(ctx->ofpacts, nat_offset);
>+    ct = ctx->ofpacts->header;
>+    ct->flags |= commit;
>+
>+    /* XXX: For performance reasons, we try to prevent additional
>+     * recirculations.  So far, ct_snat which is used in a gateway router
>+     * does not need a recirculation. ct_snat(IP) does need a recirculation.
>+     * Should we consider a method to let the actions specify whether a action
>+     * needs recirculation if there more use cases?. */
>+    if (!commit && snat) {
>+        ct->recirc_table = NX_CT_RECIRC_NONE;
>+    }
>+    ofpact_finish(ctx->ofpacts, &ct->ofpact);
>+    ofpbuf_push_uninit(ctx->ofpacts, ct_offset);
>+}
>+
> static bool
> parse_action(struct action_context *ctx)
> {
>@@ -469,6 +548,10 @@ parse_action(struct action_context *ctx)
>         emit_ct(ctx, true, false);
>     } else if (lexer_match_id(ctx->lexer, "ct_commit")) {
>         emit_ct(ctx, false, true);
>+    } else if (lexer_match_id(ctx->lexer, "ct_dnat")) {
>+        parse_ct_nat(ctx, false);
>+    } else if (lexer_match_id(ctx->lexer, "ct_snat")) {
>+        parse_ct_nat(ctx, true);
>     } else if (lexer_match_id(ctx->lexer, "arp")) {
>         parse_arp_action(ctx);
>     } else if (lexer_match_id(ctx->lexer, "get_arp")) {
>diff --git a/ovn/northd/ovn-northd.8.xml b/ovn/northd/ovn-northd.8.xml
>index 970c352..838fbb7 100644
>--- a/ovn/northd/ovn-northd.8.xml
>+++ b/ovn/northd/ovn-northd.8.xml
>@@ -500,11 +500,40 @@ next;
> 
>       <li>
>         <p>
>-          Reply to ARP requests.  These flows reply to ARP requests for the
>-          router's own IP address.  For each router port <var>P</var> that 
>owns
>-          IP address <var>A</var> and Ethernet address <var>E</var>, a
>-          priority-90 flow matches <code>inport == <var>P</var> &amp;&amp;
>-          arp.op == 1 &amp;&amp; arp.tpa == <var>A</var></code> (ARP request)
>+          Reply to ARP requests.
>+        </p>
>+
>+        <p>
>+          These flows reply to ARP requests for the router's own IP address.
>+          For each router port <var>P</var> that owns IP address <var>A</var>
>+          and Ethernet address <var>E</var>, a priority-90 flow matches
>+          <code>inport == <var>P</var> &amp;&amp; arp.op == 1 &amp;&amp;
>+          arp.tpa == <var>A</var></code> (ARP request) with the following
>+          actions:
>+        </p>
>+
>+        <pre>
>+eth.dst = eth.src;
>+eth.src = <var>E</var>;
>+arp.op = 2; /* ARP reply. */
>+arp.tha = arp.sha;
>+arp.sha = <var>E</var>;
>+arp.tpa = arp.spa;
>+arp.spa = <var>A</var>;
>+outport = <var>P</var>;
>+inport = ""; /* Allow sending out inport. */
>+output;
>+        </pre>
>+      </li>
>+
>+      <li>
>+        <p>
>+          These flows reply to ARP requests for the virtual IP addresses
>+          configured in the router for DNAT. For a configured DNAT IP address
>+          <var>A</var>, for each router port <var>P</var> with Ethernet
>+          address <var>E</var>, a priority-90 flow matches
>+          <code>inport == <var>P</var> &amp;&amp; arp.op == 1 &amp;&amp;
>+          arp.tpa == <var>A</var></code> (ARP request)
>           with the following actions:
>         </p>
> 
>@@ -646,7 +675,63 @@ icmp4 {
>       </li>
>     </ul>
> 
>-    <h3>Ingress Table 2: IP Routing</h3>
>+    <h3>Ingress Table 2: SNAT</h3>
>+
>+    <p>
>+      This is for already established connections' reverse traffic.
>+      i.e., SNAT has already been done in egress pipeline and now the
>+      packet has entered the ingress pipeline as part of a reply. It is
>+      unSNATted here.
>+    </p>
>+
>+    <ul>
>+      <li>
>+        <p>
>+          For each configuration in the OVN Northdbound database, that asks
>+          to change the source IP address of a packet from <var>A</var> to
>+          <var>B</var>, a priority-100 flow matches <code>ip4.dst ==
>+          <var>B</var></code> with an action <code>ct_snat; next;</code>.
>+        </p>
>+
>+        <p>
>+          A priority-0 logical flow with match <code>1</code> has actions
>+          <code>next;</code>.
>+        </p>
>+      </li>
>+    </ul>
>+
>+    <h3>Ingress Table 3: DNAT</h3>
>+
>+    <p>
>+      Packets enter the pipeline with destination IP address that needs to
>+      be DNATted from a virtual IP address to a real IP address. Packets
>+      in the reverse direction needs to be unDNATed.
>+    </p>
>+    <ul>
>+      <li>
>+        <p>
>+          For each configuration in the OVN Northdbound database, that asks
>+          to change the destination IP address of a packet from <var>A</var> 
>to
>+          <var>B</var>, a priority-100 flow matches <code>ip4.dst ==
>+          <var>A</var></code> with an action <code>ct_dnat(<var>B</var>);
>+          </code>.
>+        </p>
>+
>+        <p>
>+          For each configuration in the OVN Northdbound database, that asks
>+          to change the destination IP address of a packet from <var>A</var> 
>to
>+          <var>B</var>, a priority-50 flow matches every <code>ip</code> 
>packet
>+          with an action <code>ct_dnat;</code>.

Looking at the code, it looks like any IP packet matches the priority-50 flow. 
Something like:
For all IP packets, a priority-50 flow with an action <code>ct_dnat;</code>.

>+        </p>
>+
>+        <p>
>+          A priority-0 logical flow with match <code>1</code> has actions
>+          <code>next;</code>.
>+        </p>
>+      </li>
>+    </ul>
>+
>+    <h3>Ingress Table 4: IP Routing</h3>
> 
>     <p>
>       A packet that arrives at this table is an IP packet that should be 
> routed
>@@ -655,7 +740,7 @@ icmp4 {
>       <code>ip4.dst</code>, the packet's final destination, unchanged) and
>       advances to the next table for ARP resolution.  It also sets
>       <code>reg1</code> to the IP address owned by the selected router port
>-      (which is used later in table 4 as the IP source address for an ARP
>+      (which is used later in table 6 as the IP source address for an ARP
>       request, if needed).
>     </p>
> 
>@@ -726,7 +811,7 @@ icmp4 {
>       </li>
>     </ul>
> 
>-    <h3>Ingress Table 3: ARP Resolution</h3>
>+    <h3>Ingress Table 5: ARP Resolution</h3>
> 
>     <p>
>       Any packet that reaches this table is an IP packet whose next-hop IP
>@@ -781,7 +866,7 @@ icmp4 {
>       </li>
>     </ul>
> 
>-    <h3>Ingress Table 4: ARP Request</h3>
>+    <h3>Ingress Table 6: ARP Request</h3>
> 
>     <p>
>       In the common case where the Ethernet destination has been resolved, 
> this
>@@ -806,7 +891,7 @@ arp {
>         </pre>
> 
>         <p>
>-          (Ingress table 2 initialized <code>reg1</code> with the IP address
>+          (Ingress table 4 initialized <code>reg1</code> with the IP address
>           owned by <code>outport</code>.)
>         </p>
> 
>@@ -821,7 +906,29 @@ arp {
>       </li>
>     </ul>
> 
>-    <h3>Egress Table 0: Delivery</h3>
>+    <h3>Egress Table 0: SNAT</h3>
>+
>+    <p>
>+      Packets that are configured to be SNATed get their source IP address
>+      changed based on the configuration in the OVN Northdbound database.
>+    </p>
>+    <ul>
>+      <li>
>+        <p>
>+          For each configuration in the OVN Northdbound database, that asks
>+          to change the source IP address of a packet from <var>A</var> to
>+          <var>B</var>, a priority-100 flow matches <code>ip4.src ==
>+          <var>A</var></code> with an action <code>ct_snat(<var>B</var>)
>+          </code>.
>+        </p>
>+        <p>
>+          A priority-0 logical flow with match <code>1</code> has actions
>+          <code>next;</code>.
>+        </p>
>+      </li>
>+    </ul>
>+
>+    <h3>Egress Table 1: Delivery</h3>
> 
>     <p>
>       Packets that reach this table are ready for delivery.  It contains
>diff --git a/ovn/northd/ovn-northd.c b/ovn/northd/ovn-northd.c
>index 7852d83..3ce88a7 100644
>--- a/ovn/northd/ovn-northd.c
>+++ b/ovn/northd/ovn-northd.c
>@@ -105,12 +105,15 @@ enum ovn_stage {
>     /* Logical router ingress stages. */ \
>     PIPELINE_STAGE(ROUTER, IN,  ADMISSION,   0, "lr_in_admission") \
>     PIPELINE_STAGE(ROUTER, IN,  IP_INPUT,    1, "lr_in_ip_input") \
>-    PIPELINE_STAGE(ROUTER, IN,  IP_ROUTING,  2, "lr_in_ip_routing") \
>-    PIPELINE_STAGE(ROUTER, IN,  ARP_RESOLVE, 3, "lr_in_arp_resolve") \
>-    PIPELINE_STAGE(ROUTER, IN,  ARP_REQUEST, 4, "lr_in_arp_request") \
>+    PIPELINE_STAGE(ROUTER, IN,  SNAT,        2, "lr_in_snat") \

I find calling this table SNAT misleading, since it is actually reverting the 
destination address.
How about UNDO_SNAT or UNSNAT?

>+    PIPELINE_STAGE(ROUTER, IN,  DNAT,        3, "lr_in_dnat") \
>+    PIPELINE_STAGE(ROUTER, IN,  IP_ROUTING,  4, "lr_in_ip_routing") \
>+    PIPELINE_STAGE(ROUTER, IN,  ARP_RESOLVE, 5, "lr_in_arp_resolve") \
>+    PIPELINE_STAGE(ROUTER, IN,  ARP_REQUEST, 6, "lr_in_arp_request") \
> \
>     /* Logical router egress stages. */ \
>-    PIPELINE_STAGE(ROUTER, OUT, DELIVERY,    0, "lr_out_delivery")
>+    PIPELINE_STAGE(ROUTER, OUT, SNAT,      0, "lr_out_snat") \
>+    PIPELINE_STAGE(ROUTER, OUT, DELIVERY,  1, "lr_out_delivery")
> 
> #define PIPELINE_STAGE(DP_TYPE, PIPELINE, STAGE, TABLE, NAME)   \
>     S_##DP_TYPE##_##PIPELINE##_##STAGE                          \
>@@ -1965,6 +1968,44 @@ build_lrouter_flows(struct hmap *datapaths, struct hmap 
>*ports,
>         free(match);
>         free(actions);
> 
>+        /* ARP handling for virtual IP addresses.
>+         *
>+         * DNAT IP addresses are virtual IP addresses that need ARP
>+         * handling. */
>+        struct smap_node *node;
>+        SMAP_FOR_EACH(node, &op->od->nbr->dnat) {
>+            ovs_be32 ip;
>+            if (!ip_parse(node->key, &ip) || !ip) {
>+                static struct vlog_rate_limit rl = VLOG_RATE_LIMIT_INIT(5, 1);
>+                VLOG_WARN_RL(&rl, "bad ip address %s in dnat configuration"
>+                             "for router %s", node->key, op->key);
>+                continue;
>+            }
>+
>+            match = xasprintf(
>+                "inport == %s && arp.tpa == "IP_FMT" && arp.op == 1",
>+                op->json_key, IP_ARGS(ip));
>+            actions = xasprintf(
>+                "eth.dst = eth.src; "
>+                "eth.src = "ETH_ADDR_FMT"; "
>+                "arp.op = 2; /* ARP reply */ "
>+                "arp.tha = arp.sha; "
>+                "arp.sha = "ETH_ADDR_FMT"; "
>+                "arp.tpa = arp.spa; "
>+                "arp.spa = "IP_FMT"; "
>+                "outport = %s; "
>+                "inport = \"\"; /* Allow sending out inport. */ "
>+                "output;",
>+                ETH_ADDR_ARGS(op->mac),
>+                ETH_ADDR_ARGS(op->mac),
>+                IP_ARGS(ip),
>+                op->json_key);
>+            ovn_lflow_add(lflows, op->od, S_ROUTER_IN_IP_INPUT, 90,
>+                          match, actions);
>+            free(match);
>+            free(actions);
>+        }
>+
>         /* Drop IP traffic to this router. */
>         match = xasprintf("ip4.dst == "IP_FMT, IP_ARGS(op->ip));
>         ovn_lflow_add(lflows, op->od, S_ROUTER_IN_IP_INPUT, 60,
>@@ -1972,6 +2013,123 @@ build_lrouter_flows(struct hmap *datapaths, struct 
>hmap *ports,
>         free(match);
>     }
> 
>+    /* Ingress SNAT. This is for already established connections' reverse

As mentioned earlier, I prefer calling this Ingress Undo SNAT or something 
along those lines.

>+     * traffic. i.e., SNAT has already been done in egress pipeline and now
>+     * the packet has entered the ingress pipeline as part of a reply.
>+     * We undo the SNAT here.
>+     *
>+     * Undoing SNAT has to happen before DNAT processing. This is because when
>+     * the packet was DNATed in ingress pipeline, it did not know about the
>+     * possibility of eventual additional SNAT in egress pipeline. */
>+    HMAP_FOR_EACH (od, key_node, datapaths) {
>+        if (!od->nbr) {
>+            continue;
>+        }
>+
>+        /* Packets are allowed by default. */
>+        ovn_lflow_add(lflows, od, S_ROUTER_IN_SNAT, 0, "1", "next;");
>+
>+        /* SNAT rules are only valid on non-distributed routers. */
>+        if (!smap_get(&od->nbr->options, "chassis")) {
>+            continue;
>+        }
>+
>+        struct smap_node *node;
>+        SMAP_FOR_EACH(node, &od->nbr->snat) {
>+            ovs_be32 ip, mask;
>+
>+            char *error = ip_parse_masked(node->key, &ip, &mask);
>+            if (error) {
>+                static struct vlog_rate_limit rl
>+                        = VLOG_RATE_LIMIT_INIT(5, 1);
>+                VLOG_WARN_RL(&rl, "bad ip network or ip %s for snat",
>+                             node->key);
>+                continue;
>+            }
>+
>+            error = ip_parse_masked(node->value, &ip, &mask);
>+            if (error || mask != OVS_BE32_MAX) {
>+                static struct vlog_rate_limit rl
>+                        = VLOG_RATE_LIMIT_INIT(5, 1);
>+                VLOG_WARN_RL(&rl, "bad ip address %s for snat", node->value);
>+                continue;
>+            }
>+
>+            char *match;
>+            match = xasprintf("ip && ip4.dst == %s", node->value);
>+            ovn_lflow_add(lflows, od, S_ROUTER_IN_SNAT, 100,
>+                          match, "ct_snat; next;");
>+            free(match);
>+        }
>+    }
>+
>+    /* Ingress DNAT. Packets enter the pipeline with destination IP address
>+     * that needs to be DNATted from a virtual IP address to a real
>+     * IP address.  Packets in the reverse direction get unDNATed. */
>+    HMAP_FOR_EACH (od, key_node, datapaths) {
>+        if (!od->nbr) {
>+            continue;
>+        }
>+
>+        /* Packets are allowed by default. */
>+        ovn_lflow_add(lflows, od, S_ROUTER_IN_DNAT, 0, "1", "next;");
>+
>+        /* DNAT rules are only valid on non-distributed routers. */
>+        if (!smap_get(&od->nbr->options, "chassis")) {
>+            continue;
>+        }
>+
>+        struct smap_node *node;
>+        SMAP_FOR_EACH(node, &od->nbr->dnat) {
>+            char *match, *actions;
>+            ovs_be32 ip, mask;
>+
>+            char *error = ip_parse_masked(node->key, &ip, &mask);
>+            if (error || mask != OVS_BE32_MAX) {
>+                static struct vlog_rate_limit rl
>+                        = VLOG_RATE_LIMIT_INIT(5, 1);
>+                VLOG_WARN_RL(&rl, "bad ip address key %s for dnat", 
>node->key);
>+                continue;
>+            }
>+
>+            error = ip_parse_masked(node->value, &ip, &mask);
>+            if (error || mask != OVS_BE32_MAX) {
>+                static struct vlog_rate_limit rl
>+                        = VLOG_RATE_LIMIT_INIT(5, 1);
>+                VLOG_WARN_RL(&rl, "bad ip address value %s for dnat",
>+                             node->value);
>+                continue;
>+            }
>+
>+            /* Packet when it goes from the initiator to destination.
>+             * We need to zero the inport because the router can
>+             * send the packet back through the same interface. */
>+            match = xasprintf("ip && ip4.dst == %s", node->key);
>+            actions = xasprintf("inport = \"\"; ct_dnat(%s);", node->value);
>+            ovn_lflow_add(lflows, od, S_ROUTER_IN_DNAT, 100,
>+                          match, actions);
>+            free(match);
>+            free(actions);
>+        }
>+
>+        /* Re-circulate every other packet through the DNAT zone.
>+         * This helps with 2 things.
>+         *
>+         * 1. Any packet that needs to be unDNATed in the reverse
>+         * direction gets unDNATed. Ideally this could be done in
>+         * the egress pipeline. But since the gateway router
>+         * does not have any feature that depends on the source
>+         * ip address being virtual IP address for IP routing,
>+         * we can do it here, saving a future re-circulation.
>+         *
>+         * 2. Any packet that was sent through SNAT zone in the
>+         * previous table automatically gets re-circulated to get
>+         * back the new destination IP address that is needed for
>+         * routing in the openflow pipeline. */
>+        ovn_lflow_add(lflows, od, S_ROUTER_IN_DNAT, 50,
>+                      "ip", "inport = \"\"; ct_dnat;");
>+    }
>+
>     /* Logical router ingress table 2: IP Routing.
>      *
>      * A packet that arrives at this table is an IP packet that should be
>@@ -2172,7 +2330,53 @@ build_lrouter_flows(struct hmap *datapaths, struct hmap 
>*ports,
>         ovn_lflow_add(lflows, od, S_ROUTER_IN_ARP_REQUEST, 0, "1", "output;");
>     }
> 
>-    /* Logical router egress table 0: Delivery (priority 100).
>+    /* Egress SNAT. Packets enter the pipeline with source ip address
>+     * that needs to be SNATted to a virtual ip address. */
>+    HMAP_FOR_EACH (od, key_node, datapaths) {
>+        if (!od->nbr) {
>+            continue;
>+        }
>+
>+        /* Packets are allowed by default. */
>+        ovn_lflow_add(lflows, od, S_ROUTER_OUT_SNAT, 0, "1", "next;");
>+
>+        /* SNAT rules are only valid on non-distributed routers. */
>+        if (!smap_get(&od->nbr->options, "chassis")) {
>+            continue;
>+        }
>+
>+        struct smap_node *node;
>+        SMAP_FOR_EACH(node, &od->nbr->snat) {
>+            char *match, *actions;
>+            ovs_be32 ip, mask;
>+
>+            char *error = ip_parse_masked(node->key, &ip, &mask);
>+            if (error) {
>+                static struct vlog_rate_limit rl
>+                        = VLOG_RATE_LIMIT_INIT(5, 1);
>+                VLOG_WARN_RL(&rl, "bad ip or ip network %s in snat key",
>+                             node->key);
>+                continue;
>+            }
>+
>+            error = ip_parse_masked(node->value, &ip, &mask);
>+            if (error || mask != OVS_BE32_MAX) {
>+                static struct vlog_rate_limit rl
>+                        = VLOG_RATE_LIMIT_INIT(5, 1);
>+                VLOG_WARN_RL(&rl, "bad ip address %s for snat", node->value);
>+                continue;
>+            }
>+
>+            match = xasprintf("ip && ip4.src == %s", node->key);
>+            actions = xasprintf("ct_snat(%s);", node->value);
>+            ovn_lflow_add(lflows, od, S_ROUTER_OUT_SNAT, 100,

You are allowing the key to be a prefix, with mask other than OVS_BE32_MAX.
This implies that a user can specify multiple overlapping prefixes.
Conflicts should be resolved by going with the longest matching prefix.

We actually need this behavior for OpenStack. If the source IP address is a
fixed IP that has a corresponding floating IP, then the source IP address
should be mapped to the floating IP address rather than the router gateway
address. We need longest match to win to get this behavior.

So, the priority should be (mask+1).

>+                          match, actions);
>+            free(match);
>+            free(actions);
>+        }
>+    }
>+
>+    /* Logical router egress table 1: Delivery (priority 100).
>      *
>      * Priority 100 rules deliver packets to enabled logical ports. */
>     HMAP_FOR_EACH (op, key_node, ports) {
>diff --git a/ovn/ovn-nb.ovsschema b/ovn/ovn-nb.ovsschema
>index fa21b30..4d9c613 100644
>--- a/ovn/ovn-nb.ovsschema
>+++ b/ovn/ovn-nb.ovsschema
>@@ -1,7 +1,7 @@
> {
>         "name": "OVN_Northbound",
>-       "version": "2.1.2",
>-       "cksum": "429668869 5325",
>+       "version": "2.1.3",
>+       "cksum": "400575131 5731",
>         "tables": {
>                 "Logical_Switch": {
>                         "columns": {
>@@ -78,6 +78,14 @@
>                                    "max": "unlimited"}},
>                                 "default_gw": {"type": {"key": "string", 
> "min": 0, "max": 1}},
>                                 "enabled": {"type": {"key": "boolean", "min": 
> 0, "max": 1}},
>+                               "dnat" : {
>+                                       "type": {"key": {"type": "string"},
>+                                                        "value": {"type" : 
>"string"},
>+                                                         "min": 0, "max": 
>"unlimited"}},
>+                               "snat" : {
>+                                       "type": {"key": {"type": "string"},
>+                                                         "value": {"type" : 
>"string"},
>+                                                        "min": 0, "max": 
>"unlimited"}},

As mentioned above, in OpenStack, if the source IP address is a fixed IP
that has a corresponding floating IP, then the source IP address should be
mapped to the floating IP address rather than the router gateway address.

Separate dnat and snat lists as defined here can be made to work to support
the OpenStack behavior using this patch, by copying floating IPs into both
the dnat and snat lists. However this is a little unwieldy.

If we ever move to distributed handling of floating IPs, which I am still
working on, then the duplication will be difficult to handle for east/west
flows. It would be easier if one definition could apply to both dnat and snat.
An additional value could indicate if the rule applies to only dnat or both.
Or, perhaps there could just be a single list of nat rules with an additional
value indicating if the rule applies to snat or dnat or both.

Note also that for distributed handling of floating IPs, we will need MAC
addresses to go along with the floating IPs. This makes me think it might
be worth having an indirection to a separate nat table.

Mickey


>                 "options": {
>                                          "type": {"key": "string",
>                                                           "value": "string",
>diff --git a/ovn/ovn-nb.xml b/ovn/ovn-nb.xml
>index d239499..508a865 100644
>--- a/ovn/ovn-nb.xml
>+++ b/ovn/ovn-nb.xml
>@@ -631,18 +631,41 @@
>       router has all ingress and egress traffic dropped.
>     </column>
> 
>+    <column name="dnat">
>+      A map of externally visible IP address to their corresponding IP
>+      addresses in the logical space.  The externally visible IP address
>+      is DNATed to the IP address in the logical space.  DNAT only works
>+      on routers that are non-distributed.
>+    </column>
>+
>+    <column name="snat">
>+      A map of IP network (e.g 192.168.1.0/24) or an IP address to another
>+      IP address.  All IP packets with their source IP address that either
>+      matches the key or is in the provided network is SNATed
>+      into the IP address in the value.  SNAT only works on routers that are
>+      non-distributed.
>+    </column>
>+
>     <group title="Options">
>       <p>
>         Additional options for the logical router.
>       </p>
> 
>       <column name="options" key="chassis">
>-        If set, indicates that the logical router in question is
>-        non-distributed and resides in the set chassis. The same
>-        value is also used by <code>ovn-controller</code> to
>-        uniquely identify the chassis in the OVN deployment and
>-        comes from <code>external_ids:system-id</code> in the
>-        <code>Open_vSwitch</code> table of Open_vSwitch database.
>+        <p>
>+          If set, indicates that the logical router in question is
>+          non-distributed and resides in the set chassis.  The same
>+          value is also used by <code>ovn-controller</code> to
>+          uniquely identify the chassis in the OVN deployment and
>+          comes from <code>external_ids:system-id</code> in the
>+          <code>Open_vSwitch</code> table of Open_vSwitch database.
>+        </p>
>+
>+        <p>
>+          The non-distributed router (also known as Gateway router)
>+          can only be connected to a distributed router via a switch
>+          if SNAT and DNAT is to be configured in the Gateway router.
>+        </p>
>       </column>
>     </group>
>     
>diff --git a/ovn/ovn-sb.xml b/ovn/ovn-sb.xml
>index 741228c..3a0268f 100644
>--- a/ovn/ovn-sb.xml
>+++ b/ovn/ovn-sb.xml
>@@ -951,6 +951,44 @@
>           </p>
>         </dd>
> 
>+        <dt><code>ct_dnat;</code></dt>
>+        <dt><code>ct_dnat(<var>IP</var>);</code></dt>
>+        <dd>
>+          <p>
>+            <code>ct_dnat</code> sends the packet through the DNAT zone to
>+            unDNAT any packet that was DNATed in a different direction.
>+          </p>
>+          <p>
>+            <code>ct_dnat(<var>IP</var>)</code> sends the packet through the
>+            DNAT zone to change the destination IP address of the packet to
>+            the one provided inside the parenthesis and commits the 
>connection.
>+          </p>
>+          <p>
>+            Both <code>ct_dnat</code> and <code>ct_dnat(<var>IP</var>)</code>
>+            recirculate the packet in the datapath.
>+          </p>
>+        </dd>
>+
>+        <dt><code>ct_snat;</code></dt>
>+        <dt><code>ct_snat(<var>IP</var>);</code></dt>
>+        <dd>
>+          <p>
>+            <code>ct_snat</code> sends the packet through the SNAT zone to
>+            unSNAT any packet that was SNATed in a different direction.
>+          </p>
>+          <p>
>+            <code>ct_snat(<var>IP</var>)</code> sends the packet through the
>+            SNAT zone to change the source IP address of the packet to
>+            the one provided inside the parenthesis and commits the 
>connection.
>+          </p>
>+          <p>
>+            <code>ct_snat</code> does not recirculate the packet in the
>+            datapath and hence should be followed by a <code>next;</code>
>+            action. <code>ct_snat(<var>IP</var>)</code> does
>+            recirculate the packet in the datapath.
>+          </p>
>+        </dd>
>+
>         <dt><code>arp { <var>action</var>; </code>...<code> };</code></dt>
>         <dd>
>           <p>
>-- 
>1.9.1
>
>_______________________________________________
>dev mailing list
>dev@openvswitch.org
>http://openvswitch.org/mailman/listinfo/dev
>

_______________________________________________
dev mailing list
dev@openvswitch.org
http://openvswitch.org/mailman/listinfo/dev

Reply via email to