On Mon, Mar 23, 2015 at 12:23 PM, Jesse Gross <je...@nicira.com> wrote: > On Mon, Mar 9, 2015 at 3:12 PM, Pravin B Shelar <pshe...@nicira.com> wrote: >> openvswitch_headers = \ >> diff --git a/datapath/linux/.gitignore b/datapath/linux/.gitignore >> index 69d6658..1d1aae7 100644 >> --- a/datapath/linux/.gitignore >> +++ b/datapath/linux/.gitignore >> @@ -50,6 +50,7 @@ >> /vport-lisp.c >> /vport-netdev.c >> /vport-patch.c >> +/vport-stt.c >> /vport-vxlan.c >> /vport.c >> /vxlan.c > > Should we add stt.c as well? > ok.
>> diff --git a/datapath/linux/compat/stt.c b/datapath/linux/compat/stt.c >> new file mode 100644 >> index 0000000..4cb4ea6 >> --- /dev/null >> +++ b/datapath/linux/compat/stt.c >> +static int skb_list_linearize(struct sk_buff *head, gfp_t gfp_mask) >> +{ >> + struct sk_buff *skb; >> + int tlen = 0; >> + int err; >> + >> + err = skb_linearize(head); >> + if (err) >> + return err; >> + >> + skb = head->next; >> + while (skb) { >> + tlen += skb->len; >> + skb = skb->next; >> + } >> + err = pskb_expand_head(head, 0, tlen, gfp_mask); >> + if (err) >> + return err; > > Were you going to try to avoid linearizing and then expanding as Alex > had mentioned in his comments? > ok. >> +static int __build_segments(struct sk_buff **headp, bool ipv4, int >> l4_offset) >> +{ >> + struct sk_buff *head = *headp; >> + struct sk_buff *nskb = NULL; >> + struct sk_buff *rskb, *skb; >> + struct tcphdr *tcph; >> + int seg_len = 0; >> + int hdr_len; >> + int tcp_len; >> + u32 seq; >> + >> + /* GRO can produce skbs with only the headers, which we've >> + * already pulled off. We can just dump them. >> + */ >> + while (head->len == 0) { >> + nskb = head->next; >> + copy_skb_metadata(nskb, head); >> + consume_skb(head); >> + head = nskb; >> + } >> + *headp = head; > >> + tcph = (struct tcphdr *)(head->data + l4_offset); >> + tcp_len = tcph->doff * 4; >> + hdr_len = l4_offset + tcp_len; > > I think there is something wrong with the ordering of the calls in > stt_rcv(). reassemble() will just produce a linked list of skbs that > are expected to be joined here but we've already called functions that > do pskb_may_pull(), which won't traverse that list. In particular, it > doesn't make sense to both remove zero length skbs above and expect > that there is a TCP header available below. > I can remove skb zero len check that is not required. skb should always have TCP header since reassemble() used it to build the list. >> + >> + if (unlikely((tcp_len < sizeof(struct tcphdr)) || >> + (head->len < hdr_len))) >> + return -EINVAL; >> + >> + if (unlikely(!pskb_may_pull(head, hdr_len))) >> + return -ENOMEM; > > It seems risky to me to combine skb coalescing with TCP header > updating as we would need to very carefully check boundary conditions. > I feel like there are two halves of the loop anyways, so it seems > better to just break them apart. > Do you mean convert it into two separate loops? >> +static int __segment_skb(struct sk_buff **headp, bool ipv4, bool tcp, bool >> csum_partial, int l4_offset) >> +{ >> + int err; >> + >> + err = straighten_frag_list(headp); >> + if (unlikely(err)) >> + return err; >> + >> + if (__linearize(*headp, ipv4, tcp, csum_partial)) { >> + return skb_list_linearize(*headp, GFP_ATOMIC); >> + } >> + >> + if ((*headp)->next) { > > I think we can push this check up above linearization. If there is no > list of skbs, we don't need to linearize either. > ok. >> +static int segment_skb(struct sk_buff **headp) >> +{ > > Isn't there a check for frag_list for the receive path missing > somewhere? It seems like linearization happens somewhat > unconditionally. > I added check for frag_list. >> +static int skb_list_xmit(struct rtable *rt, struct sk_buff *skb, __be32 src, >> + __be32 dst, __u8 tos, __u8 ttl, __be16 df) >> +{ >> + int len = 0; >> + >> + while (skb) { >> + struct sk_buff *next = skb->next; >> + >> + if (next) >> + dst_clone(&rt->dst); >> + >> + skb->next = NULL; >> + len += iptunnel_xmit(NULL, rt, skb, src, dst, IPPROTO_TCP, >> + tos, ttl, df, false); > > I think there may be a problem in our version of ip_local_out() if it > thinks that fix_segment is set in the CB, so we need to find a way to > make sure that it is cleared. > right, I will fix STT and I will send another patch for other vport types. >> +int stt_xmit_skb(struct sk_buff *skb, struct rtable *rt, >> + __be32 src, __be32 dst, __u8 tos, >> + __u8 ttl, __be16 df, __be16 src_port, __be16 dst_port, >> + __be64 tun_id) > [...] >> + while (skb) { >> + struct sk_buff *next_skb = skb->next; >> + >> + skb->next = NULL; >> + >> + if (next_skb) >> + dst_clone(&rt->dst); >> + >> + /* Push STT and TCP header. */ >> + skb = push_stt_header(skb, tun_id, src_port, dst_port, src, >> + dst, inner_h_proto, inner_nw_proto, >> + dst_mtu(&rt->dst)); >> + if (unlikely(!skb)) >> + goto next; > > I think it's somewhat surprising for a function called > push_stt_header() to free the skb in the event of an error. I actually > think that there is a double free already between __push_stt_header() > and push_stt_header() so it seems like it is better to just propagate > the error back here and then free. > I have to free skb in push_stt header since it can change skb, so I will fix the double free bug. >> +static bool set_offloads(struct sk_buff *skb) >> +{ > [...] >> + if (proto_type == (STT_PROTO_IPV4 | STT_PROTO_TCP)) { > > Should we convert this to a switch statement? It seems mildly cleaner. > ok. >> +void stt_sock_release(struct net *net, __be16 port) > > Is there a reason to not just pass in stt_sock here? It seems cleaner > and more consistent. > I need to take the mutex to do lookup so I can not do it from vport-stt.c. >> + struct stt_sock *stt_sock; >> + >> + mutex_lock(&stt_mutex); >> + rcu_read_lock(); >> + stt_sock = stt_find_sock(net, port); >> + rcu_read_unlock(); > > If we can't pass in stt_sock then I don't think we need rcu_read_lock > since we're holding the mutex. > It is just to keep RCU checker happy. >> diff --git a/datapath/vport-stt.c b/datapath/vport-stt.c >> new file mode 100644 >> index 0000000..ccd4c71 >> --- /dev/null >> +++ b/datapath/vport-stt.c >> +#if LINUX_VERSION_CODE >= KERNEL_VERSION(3,5,0) > > I think we need a check for CONFIG_NETFILTER someplace, otherwise the > registration mechanism won't compile. I wonder if both of these checks > should be pushed down to stt.c though since that's where they actually > matter. > ok, I will add stt stub function to handle case without netfilter. >> diff --git a/datapath/vport.h b/datapath/vport.h >> index e256fc0..50e6289 100644 >> --- a/datapath/vport.h >> +++ b/datapath/vport.h >> @@ -35,6 +35,7 @@ struct vport_parms; >> struct vport_net { >> struct vport __rcu *gre_vport; >> struct vport __rcu *gre64_vport; >> + struct vport __rcu *stt_vport; > > I think this is no longer used now that the port number is configurable. > ok. >> diff --git a/lib/netdev-vport.c b/lib/netdev-vport.c >> index 8e1b542..bd7f146 100644 >> --- a/lib/netdev-vport.c >> +++ b/lib/netdev-vport.c >> @@ -153,7 +154,7 @@ netdev_vport_needs_dst_port(const struct netdev *dev) >> >> return (class->get_config == get_tunnel_config && >> (!strcmp("geneve", type) || !strcmp("vxlan", type) || >> - !strcmp("lisp", type))); >> + !strcmp("lisp", type) || !strcmp("stt", type)) ); >> } > > I think we need some additional support for configurable STT ports in > netdev_vport_construct() and get_tunnel_config(). > ok. >> diff --git a/vswitchd/vswitch.xml b/vswitchd/vswitch.xml >> index dac3756..1b9da1d 100644 >> --- a/vswitchd/vswitch.xml >> +++ b/vswitchd/vswitch.xml >> + <dt><code>stt</code></dt> >> + <dd> >> + The Stateless TCP Tunnel (STT) protocol encapsulates traffic in >> + IPv4/TCP packets. All traffic uses a destination port of 7471. >> + The STT protocol does not engage in the usual TCP 3-way >> handshake, >> + so it will have difficulty traversing stateful firewalls. >> + STT is only available in kernel datapath on kernel 3.5 or >> newer. >> + </dd> > > Can you make this a little more descriptive? I think right now it is > hard to understand when this should or should not be used. ok. _______________________________________________ dev mailing list dev@openvswitch.org http://openvswitch.org/mailman/listinfo/dev