On Fri, Oct 28, 2011 at 11:29 AM, Ben Pfaff <[email protected]> wrote:
> On Thu, Oct 27, 2011 at 04:38:39PM -0700, Jesse Gross wrote:
>> On Wed, Oct 26, 2011 at 3:27 PM, Ben Pfaff <[email protected]> wrote:
>> > diff --git a/datapath/tunnel.c b/datapath/tunnel.c
>> > index 372d90e..047961f 100644
>> > --- a/datapath/tunnel.c
>> > +++ b/datapath/tunnel.c
>> > +static int tnl_sock_bind(const struct tnl_ops *tnl_ops, struct
>> > tnl_mutable_config *mutable)
>> > +{
>> > +#if LINUX_VERSION_CODE >= KERNEL_VERSION(2,6,26)
>> > + ?? ?? ?? struct sockaddr_in sin;
>> > + ?? ?? ?? struct tnl_sock *ts;
>> > + ?? ?? ?? int err;
>> > +
>> > + ?? ?? ?? if (tnl_ops->ipproto != IPPROTO_UDP)
>> > + ?? ?? ?? ?? ?? ?? ?? return 0;
>> > +
>> > + ?? ?? ?? list_for_each_entry (ts, &tnl_socks, list) {
>> > + ?? ?? ?? ?? ?? ?? ?? if (ts->port == mutable->dst_port) {
>> > + ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? if (udp_sk(ts->socket->sk)->encap_rcv
>> > != tnl_ops->udp_rcv)
>> > + ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? return -EADDRINUSE;
>>
>> When I originally proposed this, I thought that we could directly use
>> the socket lookup to find the tunnel. I now see that it isn't
>> possible because of the key (and there is no way to know whether the
>> key is important until after you do the tunnel lookup). Is there any
>> benefit to binding the socket more specifically to the tunnel using
>> the remote and local (if there is one) IPs? The main benefit of this
>> would be to enable finer grained binding of sockets for different
>> tunnel types. I don't know whether that is more or less confusing
>> though. It also makes the UDP socket lookup marginally faster.
>
> If I do that I think I'll need a hash table for the tnl_socks since
> there's likely to be many of them (one per local/remote IP pair) if
> someone configures a bunch of tunnels instead of just one (for a given
> port).
>
> I'm willing to do it, your paragraph doesn't make me sure whether you
> really want it though.
My gut feeling is that it's probably the right thing to do (and
actually follow through to use the socket for tunnel lookup, modulo
the key; in the keyless case we don't even have to anything else). It
doesn't excite me that much because it results in parallel code paths
for UDP based protocols vs. everything else. However, I think for
upstreaming where we only have VXLAN it will look a lot more correct
if we do it this way and it should perform slightly better as well.
>> > @@ -1347,29 +1438,57 @@ static int tnl_set_config(struct nlattr *options,
>> > const struct tnl_ops *tnl_ops,
>> [...]
>> > + ?? ?? ?? if (tnl_ops->ipproto == IPPROTO_UDP) {
>> > + ?? ?? ?? ?? ?? ?? ?? struct nlattr *src_base =
>> > a[OVS_TUNNEL_ATTR_SRC_PORT_BASE];
>> > + ?? ?? ?? ?? ?? ?? ?? struct nlattr *src_hash =
>> > a[OVS_TUNNEL_ATTR_SRC_PORT_HASH];
>> > + ?? ?? ?? ?? ?? ?? ?? struct nlattr *dst = a[OVS_TUNNEL_ATTR_DST_PORT];
>> > +
>> > + ?? ?? ?? ?? ?? ?? ?? if (!src_base || !dst) {
>> > + ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? err = -EINVAL;
>> > + ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? goto error;
>> > + ?? ?? ?? ?? ?? ?? ?? }
>>
>> I think we should add some validation to check that if ports are
>> specified then they go with a protocol that has L4 ports. Right now,
>> I don't think either userspace or kernel will care.
>
> OK, fixed in kernel:
Looks good.
> Userspace already should disallow specifying src_port and dst_port on
> non-UDP tunnels because the initial values for those will be 0 for
> such tunnels and there's a check for zero in the parser:
Ah, right.
>> > + ?? ?? ?? ?? ?? ?? ?? mutable->src_port_base =
>> > ntohs(nla_get_be16(src_base));
>> > + ?? ?? ?? ?? ?? ?? ?? mutable->dst_port = nla_get_be16(dst);
>>
>> Can you add a comment to explain that it is intention that one is byte
>> swapped and the other is not? When I first saw it, I was sure it was
>> a bug (although sparse should be good about catching actual bugs).
>
> OK, like this?
>
> Maybe I should make both __be16 anyway, it's less surprising and
> ntohs() isn't very expensive. Opinion?
Another option is to make them both u16. Dest port is set once at
header cache time, so the byte swap has basically no cost at all. I
think just the comment is fine too though.
_______________________________________________
dev mailing list
[email protected]
http://openvswitch.org/mailman/listinfo/dev