On Tue, Oct 11, 2011 at 4:33 PM, Ben Pfaff <[email protected]> wrote:
> diff --git a/datapath/flow.c b/datapath/flow.c
> index 7322295..9a9b0aa 100644
> --- a/datapath/flow.c
> +++ b/datapath/flow.c
> + * Correct behavior when there's more than one fragment header is anybody's
> + * guess. This version reports whether the final fragment header is a first
> + * fragment.
I thought about this a bit more and I think what you have here is correct.
According to RFC 2460, this type of parsing is illegal because you're
supposed to fully process each extension header as a protocol layer
before moving onto the next instead of skipping through them. This
means that end hosts (which are all that should matter here since
routers do not do fragmentation and reassembly) will actually do
iterative reassembly in the case of multiple fragment headers.
There are three interesting cases here (beyond the basic ones):
* A series of first fragments followed by a terminal protocol: This
is a fragmented packet but we have the later protocol information.
It's possible to drop bad packets here and prevent the end host from
successfully reassembling them. Same meaning as OVS_FRAG_TYPE_FIRST.
* A series of first fragments followed by a later fragment: It is a
fragment but we don't have the data to actually parse it. Same as
OVS_FRAG_TYPE_LAST.
* A series of first fragments until the end of the packet: In theory
this actually could be a valid packet and it's possible to achieve a
similar effect with a huge number of extension headers. In either
case we mark it by setting ip_proto to NEXTHDR_NONE. We also mark it
with OVS_FRAG_TYPE_FIRST since it is a first fragment, though in
practice it is roughly equivalent to OVS_FRAG_TYPE_LATER for security
processing.
Basically, packets with multiple fragment extension headers are the
same as an IPv4 packet that has been refragmented and so should be
treated the same. The primary difference is that the IPv6 case should
not occur. We could stop processing on that principle however we tend
to be pretty liberal in our handling of errors and only stop on things
that actually impede our ability to continue parsing. There are
plenty of other things that would be considered errors that we don't
even look at.
> + * Returns the final payload offset, or -1 on error.
> + */
> +static int skip_exthdr(const struct sk_buff *skb, int start, u8 *nexthdrp,
> + u8 *tos_frag)
[...]
> + if (ntohs(*fp) & ~0x7) {
> + *tos_frag |= OVS_FRAG_TYPE_FIRST;
> + break;
> + }
> + *tos_frag |= OVS_FRAG_TYPE_LATER;
Aren't these two cases (FIRST and LATER) reversed?
> @@ -140,11 +202,11 @@ static int parse_ipv6hdr(struct sk_buff *skb, struct
> sw_flow_key *key,
> payload_ofs = (u8 *)(nh + 1) - skb->data;
>
> key->ip.proto = NEXTHDR_NONE;
> - key->ip.tos = ipv6_get_dsfield(nh) & ~INET_ECN_MASK;
> + key->ip.tos_frag = ipv6_get_dsfield(nh) & ~INET_ECN_MASK;
> ipv6_addr_copy(&key->ipv6.addr.src, &nh->saddr);
> ipv6_addr_copy(&key->ipv6.addr.dst, &nh->daddr);
>
> - payload_ofs = ipv6_skip_exthdr(skb, payload_ofs, &nexthdr);
> + payload_ofs = skip_exthdr(skb, payload_ofs, &nexthdr,
> &key->ip.tos_frag);
I think we're missing the UDP GSO case here to mark it as OVS_FRAG_TYPE_FIRST.
> diff --git a/datapath/flow.h b/datapath/flow.h
> index ae12fe4..31a02fa 100644
> --- a/datapath/flow.h
> +++ b/datapath/flow.h
> @@ -29,6 +29,10 @@ struct sw_flow_actions {
> struct nlattr actions[];
> };
>
> +/* Mask for the OVS_FRAG_TYPE_* value in the low 2 bits of ip.tos_frag in
> + * struct sw_flow_key. */
> +#define OVS_FRAG_TYPE_MASK 3
What if we defined this in terms of INET_ECN_MASK? That seems a little clearer.
> diff --git a/lib/nx-match.c b/lib/nx-match.c
> index beaed3d..49aaef9 100644
> --- a/lib/nx-match.c
> +++ b/lib/nx-match.c
> +static void
> +nxm_put_tos_frag(struct ofpbuf *b, const struct cls_rule *cr)
> +{
> + uint8_t tos_frag = cr->flow.tos_frag;
> + uint8_t tos_frag_mask = cr->wc.tos_frag_mask;
> +
> + if (tos_frag_mask & IP_DSCP_MASK) {
> + nxm_put_8(b, NXM_OF_IP_TOS, tos_frag & IP_DSCP_MASK);
> + }
> + if (tos_frag_mask & FLOW_FRAG_MASK) {
> + uint8_t value, mask;
> +
> + value = mask = 0;
> + if (tos_frag_mask & FLOW_FRAG_ANY) {
> + mask |= 1;
> + if (tos_frag & FLOW_FRAG_ANY) {
> + value |= 1;
> + }
> + }
> + if (tos_frag_mask & FLOW_FRAG_FIRST) {
> + mask |= 2;
> + if (tos_frag & FLOW_FRAG_FIRST) {
> + value |= 2;
> + }
> + }
> +
> + if (mask == 3) {
> + mask = UINT8_MAX;
> + }
I'm not sure why we use explicit values here instead of symbolic
constants. I believe that they have same value and we use the
constants on the parse side.
One general thing that occurs to me is that the not_first option is
not particularly useful and that a not fragmented/first fragment
combination would be vastly more interesting (by flipping the first
bit to be later). I'm not sure how much it matters though because you
can get pretty much the same information by looking at the L4 fields
and exactly the same thing by using multiple flows.
Otherwise, I don't really have any other comments on the userspace
portions. I did look through all of it, although I'm not sure how
effectively especially in the meta flow related portions. Hopefully
most of it would be caught by the compiler though the build assertions
and name changes though.
_______________________________________________
dev mailing list
[email protected]
http://openvswitch.org/mailman/listinfo/dev