On Tue, May 10, 2016 at 10:30 AM, Pravin B Shelar <pshe...@ovn.org> wrote: > diff --git a/lib/dp-packet-lso.c b/lib/dp-packet-lso.c > new file mode 100644 > index 0000000..14a5ed8 > --- /dev/null > +++ b/lib/dp-packet-lso.c > +void > +fixup_packet_cheksum(struct dp_packet *pkt, int l4_offset, int csum_offset, > + int new_ip_tot_len, int old_ip_tot_len) > +{ > + ovs_be16 *data_ptr, *csum; > + uint32_t l4_csum; > + > + data_ptr = (ovs_be16 *) ((uint16_t *) dp_packet_data(pkt) + (l4_offset > >> 1)); > + csum = data_ptr + (csum_offset >> 1);
All of the shifts and multiples of two arithmatic make the code hard to read. It would be nice if it just did what it wanted to do directly rather than trying to be too clever. > + l4_csum = csum_continue(0, data_ptr, dp_packet_size(pkt) - l4_offset); > + *csum = csum_finish(l4_csum); > + if (new_ip_tot_len != old_ip_tot_len) { > + *csum = recalc_csum16(*csum, htons(old_ip_tot_len), > htons(new_ip_tot_len)); > + } We definitely need to document what the expectations are as far as seeding checksums with pseudoheaders, etc. I know that it is the same as Linux but that's not going to be obvious to most people. > +static struct dp_packet * > +segment_tcp_packet(struct dp_packet *orig) > +{ [...] > + if (mss) { > + struct tcp_header *tcph = dp_packet_l4(seg); > + > + put_16aligned_be32(&tcph->tcp_seq, htonl(tcp_seq)); > + tcp_seq += mss; > + tcph->tcp_ctl = htons(ntohs(tcph->tcp_ctl) & > + ~(TCP_FIN | TCP_PSH)); These flags should be left alone on the last segment and cleared from the earlier ones. I also don't think this handles ECN. > +static struct dp_packet * > +segment_l4_packet(struct dp_packet *orig) > +{ > + if (orig->lso.type & (DPBUF_LSO_TCPv4 | DPBUF_LSO_TCPv6)) { > + return segment_tcp_packet(orig); > + } else if (orig->lso.type & (DPBUF_LSO_UDPv4 | DPBUF_LSO_UDPv6)) { > + return segment_udp_packet(orig); > + } > + OVS_NOT_REACHED(); I'm really nervous about asserting that we won't have any other protocol type here. I think the best thing to do would be log an error and drop the packet. > +static struct dp_packet * > +segment_ipv4_packet(struct dp_packet *orig) > +{ [...] > + orig->l4_ofs = orig->l3_ofs + IP_HEADER_LEN; What about IP options? > +static void > +update_ipv6_frag_hdr(struct dp_packet *pkt, int *ipv6_frag_offset) > +{ [...] I don't really understand the idea behind this function. The implication seems to be that there should always be a preexisting fragment header but I'm not sure why that would be the case in most situations. In any case, it would also be nice to somehow consolidate the code with the function below it. > +static int > +ipv6_set_l4_offset(struct dp_packet *pkt) > +{ [...] > + } else if (nw_proto == IPPROTO_AH) { > + const struct ip6_ext *ext_hdr = data; > + nw_proto = ext_hdr->ip6e_nxt; > + > + offset += (ext_hdr->ip6e_len + 2) * 4; > + if (offset > size) { > + goto out; > + } I guess, realistically speaking, nothing good is going to come of trying to checksum/segment a packet that already has an AH header on it. > +static struct dp_packet * > +segment_eth_packet(struct dp_packet *orig, int offset) > +{ This patch series has a bunch of things which are added ahead of when they are actually used. 'offset' is one example of this where it is not necessary in the current usage at this point but there were others in previous patches as well. This makes it difficult to review since you have to flip between different patches to understand why something is being done. > + dp_packet_reset_packet(orig, offset); What are the semantics around modifying the original packet? > + eth = dp_packet_data(orig); > + eth_type = eth->eth_type; > + if (eth_type_vlan(eth->eth_type)) { > + struct vlan_eth_header *vethh = (struct vlan_eth_header *) > dp_packet_data(orig); > + > + eth_type = vethh->veth_next_type; > + header_len += VLAN_HEADER_LEN; > + } Is a single VLAN tag reasonable limit? I guess in this series the only possible source of offloading is STT and that has a maximum of one but what about other sources? > + if (eth_type == htons(ETH_TYPE_IP)) { > + seg_list = segment_ipv4_packet(orig); > + } else if (eth_type == htons(ETH_TYPE_IPV6)) { > + seg_list = segment_ipv6_packet(orig); > + } else { > + return NULL; > + } If we return NULL when a protocol is not supported, it causes FOR_EACH_LSO_SEG to just be a no-op rather than signaling any kind of error. I think in at least some places we should check for this as it indicates a bug, either here or in the caller. > diff --git a/lib/dp-packet-lso.h b/lib/dp-packet-lso.h > new file mode 100644 > index 0000000..09815e8 > --- /dev/null > +++ b/lib/dp-packet-lso.h > +#define DPBUF_LSO_TCPv4 (1 << 0) > +#define DPBUF_LSO_TCPv6 (1 << 1) > +#define DPBUF_LSO_UDPv4 (1 << 2) > +#define DPBUF_LSO_UDPv6 (1 << 3) I find myself wondering if this (meaning the overall approach to offloading) is going to scale well in the future. It seems like it would be better off it was somehow more modular. An obvious example is protocols - it would be nice if they were pluggable rather than needing to be added to a big chain of if statements. (Also, it would seem to be good to break apart L3 vs. L4 types in the above #defines.) Another part is the combining of checksum offload with LSO - it works for now but will it make sense in the future? Can it integrate with hardware checksum offloading state later on or will that be split between OVS and DPDK data structures? > +struct dp_packet_lso_ctx { > + struct dp_packet *next; /* Used to list lso segments. */ > +}; > + > +BUILD_ASSERT_DECL(DP_PACKET_CONTEXT_SIZE >= sizeof(struct > dp_packet_lso_ctx)); Are there any places where dp_packet_lso_ctx is used at the same time as pkt_metadata? For example, when a large packet is decapsulated by STT and has some metadata? I guess that in the existing cases where we have upcalls, the flow has already been extracted and we can get the metadata from there. However, it seems somewhat risky and the semantics of what is safe are not clearly defined. > +#define PACKET_LSO_CTX(packet) ((struct dp_packet_lso_ctx *)(packet)->data) I know that 'data' is defined in an earlier patch but looking at it here, it seems like it would be good to make the name more descriptive - it looks like it is referring to packet payload data. > +void > +fixup_packet_cheksum(struct dp_packet *pkt, int l4_offset, int csum_offset, > + int new_ip_tot_len, int old_ip_tot_len); > + "checksum" - I think there are a few instances of this typo in other places as well. In addition, I don't think that it is used outside of dp-packet-lso.c, so it probably could be made static. > @@ -567,12 +571,14 @@ enum { NETDEV_MAX_BURST = 32 }; /* Maximum number > packets in a batch. */ > > struct dp_packet_batch { > int count; > + uint8_t lso_type; > struct dp_packet *packets[NETDEV_MAX_BURST]; > }; I think this could probably at least use a comment - it's not immediately clear that this is the combination of all LSO types in the batch. > diff --git a/lib/netdev.c b/lib/netdev.c > index a83e53e..c3fdb44 100644 > --- a/lib/netdev.c > +++ b/lib/netdev.c > +static int > +send_packet__(struct netdev *netdev, int qid, struct dp_packet_batch *b, > + struct dp_packet *pkt, bool may_steal) > +{ It looks like the only two callers of this function both set 'may_steal' to true unconditionally, so we might just be able to assume that and remove the parameter. > +static int > +netdev_send_lso(struct netdev *netdev, int qid, struct dp_packet_batch *s, > + bool may_steal) [...] > + error = 0; > + FOR_EACH_LSO_SEG_SAFE(seg_list, seg, next) { > + if (OVS_UNLIKELY(error)) { > + dp_packet_delete(seg); > + continue; > + } > + error = send_packet__(netdev, qid, &b, seg, true); > + if (OVS_UNLIKELY(error)) { > + dp_packet_delete(seg); > + } > + } Is the goal of the error checks in this loop to free all remaining segments once there is an error in one? I'm not sure that's really necessary - if we sent the remaining non-error segments then potentially something like SACK in the receiver would be able to pick up just the missing one later on. > + if (b.count) { > + error = netdev->netdev_class->send(netdev, qid, > + b.packets, b.count, true); > + if (!error) { > + dp_packet_batch_init(&b); I don't know if we really need to clear the batch here after we've successfully sent it since we return from the function and it goes out of scope just a few lines later. > +err: > + if (may_steal) { > + for (i = i + 1; i < s->count; i++) { > + dp_packet_delete(s->packets[i]); > + } > + } > + > + dp_packet_delete_batch(&b, true); According to the documentation for netdev send functions, ownership of the batch is taken regardless of whether there is an error as long as 'may_steal' is true, so I think we'll have a double free as a result of deleting this batch. > netdev_send(struct netdev *netdev, int qid, struct dp_packet_batch *batch, > bool may_steal) > { > - if (!netdev->netdev_class->send) { > - dp_packet_delete_batch(batch, may_steal); > - return EOPNOTSUPP; > - } > + int error; > > - int error = netdev->netdev_class->send(netdev, qid, > + if (!netdev->netdev_class->send) { > + error = EOPNOTSUPP; > + } else if (batch->lso_type & ~netdev->supported_lso_types) { > + return netdev_send_lso(netdev, qid, batch, may_steal); > + } else { > + error = netdev->netdev_class->send(netdev, qid, > batch->packets, batch->count, > may_steal); > + } > + > if (!error) { > COVERAGE_INC(netdev_sent); > + } else { > + goto err; > } > + return 0; > + > +err: > + dp_packet_delete_batch(batch, may_steal); > return error; > } I think we have a similar issue with double free here (and the original code didn't free the packets on error). > @@ -773,6 +867,10 @@ netdev_push_header(const struct netdev *netdev, > return -EINVAL; > } > > + if (batch->lso_type & ~netdev->supported_lso_types) { > + return -EINVAL; > + } Shouldn't this be treated basically the same as sending and we should do segmentation if the type isn't supported? Otherwise, we basically require tunnels to support all types of LSO even just for correct functionality. _______________________________________________ dev mailing list dev@openvswitch.org http://openvswitch.org/mailman/listinfo/dev