[clearview-discuss] Clearview IP Tunneling code-review [due August 12]

Sebastien Roy Mon, 10 Aug 2009 10:27:12 -0400

Sowmini,

Thanks very much for reviewing this.  Responses inline:

On Sun, 2009-08-09 at 18:21 -0400, Sowmini.Varadhan at Sun.COM wrote:
> I did not review the libdlpi files- hopefully someone else is looking
> at these- please let me know if I should look at these as well.

Girish also signed up to review libraries, if he doesn't review libdlpi,
then I'll go looking. :-)

> usr/src/cmd/cmd-inet/usr.sbin/Makefile
> - ndd can also be added to DLADM_PROG. And then we can remove line 154 from
>   the file. 

ACCEPT: I hadn't noticed that, thanks.

> usr/src/cmd/cmd-inet/usr.sbin/ifconfig/ifconfig.c
> - "dladm_link_open()" is a confusing name for the function (sounds like
>   something from dladm_open()". Suggest ifconfig_dl_open() or something
>   similar to show that it is local to ifconfig.

ACCEPT: ifconfig_dl_open() sounds fine.

> - Instead of doing pairs of dladm_open/dladm_close, suggest having
>   a global dladm_handle that is opened once in main and closed on exit.
>   dladm_link_open() can use that handle, and it can be reused for 
>   "ifconfig -a <..>" commands like "ifconfig -a ether" and "ifconfig -a 
> plumb". 

REJECT: I'd rather not; ifconfig currently doesn't depend on having the
ability to open a dladm handle except for some commands, and moving this
logic to a common location in the code introduces the risk that we'd be
causing ifconfig to fail unnecessarily (like from within shared stack
zones for example).

> 
> - string passed to str_exit() needs to be gettext'ed?

ACCEPT

>   
> - missing \n after line 4185

ACCEPT

> - Question at line 3915: would you be able to open a datalink if it has
>   been assigned to a NGZ (in the excl zones case)?

Yes.  dladm_link_open() just gets a libdladm handle and then obtains the
linkid for the given link name using dladm_name2info().  Links assigned
to non-global zones are still globally visible, they simply have a
different value for their "zone" link property.

> usr/src/cmd/dladm/dladm.c
> - line 360: turns out the corrrect spelling is "parsable", but we've
>   documented ourselves into having to support both.

ACCEPT: Doh!  I'll add an alternate long option then.  How silly.

> 
> - missing \n after line 986, 3666, 3685

ACCEPT

> 
> - line 991, 992: shouldn't the field width really be INET6_ADDRSTRLEN?

That's the maximum width of an IPv6 literal IP address.  These can be
hostnames as well (which is why the iptun_src and iptun_dst at 983,984
are NI_MAXHOST ane not INET6_ADDRSTRLEN).  In addition to that, if we
made the width of the field in the output be the maximum possible width
of these values, the output would be illegible.

> 
> usr/src/cmd/dladm/dladm.xcl
> - missing up-iptun, down-iptun, tunnel-type, tunnel-src, tunnel-dst, ipv4,
>   ipv6, DESTINATION and several other strings. 

ACCEPT

> 
> usr/src/lib/libdladm/common/linkprop.c
> - line 2063: good catch :-)

;-)

> 
> usr/src/lib/libinetutil/common/ifspec.c
> - line 116 not your change, but shouldn't this be  more efficently done as
>    iflen = strnlen(ifnam, LIFNAMSIZ);

ACCEPT: Sure, I'll fix it while I'm in there.

> 
> usr/src/lib/libldadm/common/libdliptun.c
> - lines 564-566: does it really matter? For all we know, the "old_params"
>   may not be valid any more either, if, for example, the
>   iptun_param_src has been marked down. So how terrible is it to just leave
>   the db with the new params?

This is a really good question, and one that needs to be applied to the
whole of libdladm.  What are the error semantics when a modification to
both the active and persistent configuration fails?  IMO, if the
modification was a single operation, and that operation fails, we
shouldn't leave the modification half-applied, as there is no way to
notify the caller that half of the operation failed, and the other half
succeeded (and which half?).  I believe there is similar logic
throughout the library to make an attempt to rollback changes if an
operation has failed.  If we feel that this is wrong, then a fix should
be uniformly applied.  IMO, it's not wrong, but the method of rollback
is a hack and not guaranteed to work.  That part likely needs to be
fixed, but that's not this project.

> 
> - Question: line 605: why is the explicit dladm_init_linkprop needed
>   (as opposed to just having the dls layer suck down the properties)?

ACCEPT: This same code exists in i_dladm_aggr_up().  I assumed that it
was necessary in order for the properties to get applied.  What should I
do instead?  Do the properties automatically get applied if I don't call
this function?

> 
> usr/src//uts/common/inet/iptun/iptun.c
> - line 1535: suggest bzero(itk, sizeof (*itk)) to future-proof against
>   changest to the type of itk. 

ACCEPT

> 
> usr/src/uts/common/inet/ip/ip_ndp.c
> - do you need symmetric changes to ndp_lookup_then_add_v4 that also add
>   the new ipv4 nce with hwaddr == ill_dest_addr for tunnel interfaces?

No.  The IPv4 code is totally different.  There is no hardware address
passed in to ndp_lookup_then_add_v4(), but rather a "src_nce" which
comes by way of a long trip through a giant call-chain that originates
in a call to ire_create() in ip_newroute() (I pray that
datapath-refactoring simplifies this.)  For IRE_IF_NORESOLVER, a NULL
src_nce is passed in, and the NULL src_nce gets passed in to
ndp_add_v4() which does the right thing given the ill_resolver_mp.  I'd
have to consult with a Surya project team member to get the details on
why they didn't make the IPv4 and IPv6 code more uniform. ;-)

>  
> usr/src/uts/common/inet/ip/ipclassifier.c
> - line 311: 389 is an odd number (no pun intended!) - how did you
>   come up with that default? Adding some comments about how this
>   default was chose would be useful. (Esp since this choice also impacts
>   whether or not % needs to be used in IPCL_IPTUN_HASH)

ACCEPT: It's just a sensibly sized prime number for hashing distribution
voodoo.  I can add a comment.

> 
> usr/src/uts/common/inet/iptun/iptun.c
> - why not just do a msgpullup at line 2043.

IPsec re-inserts the outer IP header as a single MBLK on input, and
since that IP header will simply be removed by GLDv3 when the packet
gets passed up, doing a msgpullup() is a needless copy of data that will
just get immediately discarded anyway.

> 
> - General question about how all this works.. it's possible that I'm 
>   missing something obvious here, so I'm looking for clarification-
> 
>   if I set up an interface to be an ipv4 tunnel interface using
>   dladm,

Let's get the terminology correct so that we agree on what we're talking
about.  You setup an IPv4 tunnel "link" using dladm.

>  then I would have guessed that, when I send an ip packet "mp" down to
>   that tunnel driver, then iptun_ouput() would slap on the appropriate
>   IP "link" headers for the tunnel and send the packet down, using 
>   some possibly cached route (i.e., ire) info to find the
>   nexthop/interface to use.

That's not how GLDv3 output works.  The packet comes down with the
link-layer header already on it (where link-layer header is the outer IP
header in this case).  For example, you don't see Ethernet GLDv3 drivers
slapping on Ethernet headers on output.  That's done by GLDv3.

On the last part of your statement, yes, iptun_output() sends the IP
packet down to the ip module, and the ip module sends it out like any
other IP packet to send out, by doing a forwarding table lookup etc...

> 
>   this should be done regardless of whether mp is IPv4 or IPv6.

Yes.

> 
>   So I was thrown off by the way in which (e.g.,) iptun_out_process_ipv4
>   grovels  in the packet to determine stuff. e.g., recursive-ness in tunnels
>   would be easier to detect and forbid when the tunnel is constructed
>   and marked RUNNING, right?

We could detect a subset of misconfigurations when an IP interface is
created over a tunnel, yes.  We wouldn't be able to prevent problematic
cases that result from bogus routing through a tunnel interface, though.
Having said that, the rest of your comments make sense, more below.

>  And if we consider that a tunnel tun1
>   with outer header (A -> B)  and inner header (C -> B) is recursive [#] 
>   then what about a tunnel which has outer/inner header 
>   (A -> B) / (C -> D) where the route for B points at a tunnel tun2
>   that sets up (C1 -> D) as the outer header?

You mean a packet that has those headers, and not a tunnel, right?  A
tunnel doesn't have a fixed inner header.  Packets get forwarded through
tunnel interfaces that have no relationship to the IP addresses assigned
to the interfaces plumbed over the tunnel link.

Such a packet would indeed be a problem, but the packet itself isn't the
problem.  The configuration itself is bogus.  If we have two tunnels, it
is totally senseless to have each tunnel destinations be only reachable
through the other tunnel.  That's what you have setup here, and I don't
see how the iptun module could possibly detect this on its own.  (I'm
getting to the point where I agree with you, keep reading) ;-)

> 
>   Evenutually the recursion will be caught by one of the tunnels
>   spilling over the encaps limit, so the test in iptun_out_process*
>   is actually needless, right?

Eventually, we'll hit either the encapsulation limit or the MTU limit
and the packet will get dropped, so the check isn't strictly necessary.
The check was there in tun, and I carried it forward under the
assumption that the original author knew what he was doing.  It's not
hard to imagine that letting such packets bounce around between ip and
tun (or iptun in this case) while they get bigger and bigger until they
eventually get dropped would impact the system negatively.  Detecting
these problematic cases immediately would perhaps prevent a potential
denial of service, but given that all of these problems are due to
blatant misconfiguration of the system, and that we can't possibly
detect other simple cases of such misconfigurations, I agree that the
check is kind of strange.  I will remove it.

>   [#] I'm actually not convinced this is a good test for recursion- why
>   can't I set up 2 paths for B, one of which goes through tun1, and the
>   other through my default gw on ethernet? The constraint seems to be
>   that I *must* have outer_dst != inner_dst, and I'm not seeing why
>   this is needed- all that the tunnel needs to do is to make sure that,
>   after it encapsulates the packet, the route selected for the encapsulated
>   packet does not point through itself?

Ensuring that the route used for output of a tunnel destination does not
go through an IP interface above the tunnel is something that the
administrator does (or the routing protocol does) when creating a
configuration that works.  If the output interface for the tunnel
destination is an IP interface above the tunnel, then the tunnel doesn't
work at all and I don't think we care if packets get dropped or grovel
around in the system forever.  Giving the administrator ample statistics
to work with and diagnose that packets are getting dropped is probably
all we need to do, and we do that when packets get too large or hit
their encapsulation limit.

>  (and even more confusing:
>   we drop the packet on the input path when inner_dst == outer_dst even
>   if both of these are locally hosted addresses? I would expect a host
>   that receives such a packet to accept it, assuming everything else
>   checks out).

I'm afraid I'm not following you on this one.  Where is this check?

Thanks,
-Seb

[clearview-discuss] Clearview IP Tunneling code-review [due August 12]

Reply via email to