[clearview-discuss] Clearview IP Tunneling code-review [due August 12]

Sebastien Roy Mon, 31 Aug 2009 10:18:14 -0400

On Wed, 2009-08-26 at 15:18 -0700, Peter Memishian wrote:
>  > >  * 129: Why is this IPTUN_MIN_IPV6_MTU and not IPTUN_MIN_IPV4_MTU?
>  > 
>  > Because only IPv6 interfaces are plumbed above 6to4 links.
> 
> My understanding was that these values corresponded to the MTU of the
> underlying tunnel, hence the IPv4 values for an IPv4 tunnel and the IPv6
> values for an IPV6 tunnel.


They are the minimum possible values for the tunnel link MTU, yes.  For
a link that only supports IPv6 at the network layer (such as 6to4), it
doesn't make much sense to support a link MTU that is less than the
network layer minimum.  It's simply an optimization to eliminate a range
of values that can't possibly work.

> Further, the part I really cannot reconcile is
> why the minimum is pased on IPv6 and the maximum is based on IPv4.

Because the maximum value includes the overhead of the tunneling header,
which is an IPv4 header for 6to4.

> 
>  > >  * 251: Will this force users of iptun_m_unicst() to have special
>  > >    code to deal with iptun?  Is it inappropriate to map this to
>  > >    iptun_setsrc()?
>  > 
>  > The existing interface is there mainly for the benefit of callers of
>  > mac_unicast_primary_set(), which can handle failure (they won't need
>  > special code for IP tunnels).  The only such caller that really applies
>  > to IP tunnels is the handling of DL_SET_PHYS_ADDR_REQ in
>  > proto_setphysaddr_req().  Using this to set the tunnel source (without
>  > setting the tunnel destination) is meaningless anyway, which is why I
>  > didn't add support for it.
> 
> I find it architecturally wobbly for the caller to not implement certain
> entrypoints based on knowledge of how the framework currently makes use of
> those entrypoints.  If the entrypoint has semantic meaning and could be
> reasonably supported, why not implement it?

Okay, ACCEPT, it's easy enough to implement.  It will set the tunnel
source.

>  > >  * 344: Given that `maxmtu' is dynamic, this seems a problematic
>  > >    property to set programmatically since a call may or may not
>  > >    fail depending on dynamic conditions.  Further, if the call is
>  > >    made in while maxmtu is a large value, that value remains set
>  > >    even when maxmtu shrinks, which seems asymmetric.  (IP has many
>  > >    of the same problems with its current MTU handling.)
>  > 
>  > I agree that this has problems, and I struggled with coming up with a
>  > good solution.  I don't know that it's possible, but suggestions are
>  > welcome.  We could set the maximum to the maximum IP datagram size minus
>  > tunneling overhead (which is in the tens of thousands of bytes), and
>  > assume that administrator will have enough sense to set a value that
>  > works (given that most values above the current dynamically computed MTU
>  > most likely won't work).  This would allow the administrator to override
>  > the system default in a symmetric way (they can use any value at any
>  > time, regardless of the currently dynamically computed maximum).  Do you
>  > think that makes more sense?
> 
> Possibly.  A "force" behavior might also make sense, that allows anything
> up to the "hard" maximum to be specified if a force flag is provided.  Of
> course, that introduces some other complications, such as how to indicate
> the range of values that could be "forced".  I'd be fine with covering
> this issue in a P4 CR and leaving it for later.

I've filed:
6877483 allow forcibly setting a link property value beyond reported range

> 
>  > >  * 349: No way to cause IPTUN_FIXED_MTU to be cleared -- though the
>  > >    same limitation also exists today with IP interfaces.
>  > 
>  > This is a limitation of the property kernel API.  There is no way to
>  > know that the user has done a "reset".  It just comes down as a set with
>  > a default value.
> 
> Seems like a RFE should be filed against the property mechanism.

I've filed:
6877486 mc_setprop should denote reset operations

> 
>  > >  * 374-381: Would've expected this to be done by GLDv3, not by each
>  > >    individual driver (sigh).
>  > 
>  > There are indeed warts in this driver API.
> 
> Again, perhaps a CR?

I've filed:
6877488 mac_get_prop() doesn't filter out bogus pr_flags

> 
>  > >  * 430: Would've expected the default to be the value that's
>  > >    initialized in iptun_create() -- i.e. iti_maxmtu.
>  > 
>  > This is related to your earlier comment.  The current maximum is equal
>  > to the current dynamically computed maximum.  If we decide that such a
>  > maximum is artificial and asymmetric, an absolute maximum could be
>  > provided (like iti_maxmtu).
> 
> My comment was about the default value, not the maximum value.

Ah yes, I had my wires crossed.  The default value is the current value
in use by the link if the property isn't otherwise set.  It is therefore
the currently computed MTU as returned by iptun_get_maxmtu().

>  > >  * 558: Is there any other reason (besides a bug) that the lookup
>  > >    could fail?
>  > 
>  > Yes, the tunnel could have been deleted while the task was sitting in
>  > the taskq.
> 
> Right, the comment covers that one.  My comment was asking about the
> phrase "it's most likely because" -- this implies there are other reasons
> besides the tunnel being deleted.

ACCEPT: The comment is misleading.  The only reason that iptun_lookup()
fails is if the tunnel is gone or on its way out.  I'll fix the comment.

>  > >  * 925: Is this call to dls_mgmt_get_linkinfo() safe given that
>  > >    iptun_lock is held?
>  > 
>  > Yes.  This isn't an "upcall" since it's just a utility function in the
>  > dls module.  It doesn't enter this link's perimeter.
> 
> But it will call up to userland with the lock held, which seems bad since
> this imposes subtle and fragile restrictions on what userland can do
> (i.e., it will deadlock if it tries to do anything that needs to acquire
> iptun_lock).

In this case, the function is "give me this link's name".  It will not
reenter into the iptun module (this door upcall simply fills in the data
structure and returns).  I don't really see what userspace has to do
with this, though.  If mapping a linkid to a link name required calling
back into the driver, then why would it not need to call back into the
driver if the function were implemented entirely in the kernel?
(anyway, I digress, I'm confident that this particular function is safe)

>  > >  * 949-991: The locking here might be fine, but I can't make heads
>  > >    or tails of it.  Seems like a lot of IPsec machinery is directly
>  > >    exposed to consumers.
>  > 
>  > DEFER: There is quite a bit of direct manipulation of internal data
>  > structures from the IPsec implementation here, but I don't think there's
>  > much that I can do about it short of implementing a higher level kernel
>  > API for this stuff (there doesn't appear to be one).
> 
> OK.  In general, it seems like an IPsec API layer is missing.

Is seems so, yes.

>  > > 
>  > >  * 1032: From the perspective of the caller, it's unclear what
>  > >    iptun_setparams() may have done if the call fails and multiple
>  > >    flags were set in itk_flags, which seems to limit the usefulness
>  > >    being able to set multiple params at once.
>  > 
>  > ACCEPT: Will have iptun_setparams() restore the iptun_t to its original
>  > state on failure.
> 
> Is that feasible?  For instance, what if you're not able to restore all of
> the state?

Yes, it is feasible.  This only applies to parameters that can be
modified, and those are the tunnel source, tunnel destination, and
simple IPsec policy.  I've moved the setting of the implicit flag and
tunnel type to iptun_create(), since those are immutable after creation.
Since we set the security policy last, the only things that need to be
reset are the source and destination.  By moving the "unbind from the
old addresses" to be _after_ the call to iptun_setparams() in
iptun_modify(), I can reset the addresses without affecting the tunnel
state.

> 
>  > >  * 1079-1092: Punting this to the hapless caller via EAGAIN
>  > >    complicates the interface with an implementation artifact.
>  > >    Seems like another mechanism is needed here.  (IIRC, GLDv3
>  > >    ioctls are not allowed to block, which means the mechanism
>  > >    might need to be quite a bit different.)
>  > 
>  > DEFER: I've brought this up with the IPsec team and Erik Nordmark, and
>  > the consensus was that since we'll never run into this problem (you need
>  > IPsec keys and policy to use IPsec tunnels, and the creation of those
>  > objects will have loaded IPsec before tunnels are created), it was not
>  > worth my re-architecting how IPsec is loaded at this point.
> 
> If we can never get here, then I'd think this would be a panic().  If we
> can get here, then is EAGAIN really good enough?

We can get here, but only under contrived scenarios.  You'd have to set
IPsec policy on the tunnel without having created any IPsec keys nor
started IKE nor loaded any other global IPsec policy.  When we create
tunnels during boot, those other things are done first.  When IPsec
tunnels are created by something like a VPN, those things are also done
first.  It would only be in a case where someone is playing around and
doing things out of order that this would come up.  They'd get EAGAIN,
and indeed, trying again would work.

>  > >  * 1467, 1481: Still need to hold iptun_lock when clearing
>  > >    IPTUN_CONDEMNED to guarantee the flag change will be visible to
>  > >    other threads (usually done as part of the memory barrier as
>  > >    part of mutex_exit()).
>  > 
>  > I'm not sure I fully understand the problem you're pointing out nor the
>  > solution you're proposing.  I can't hold the lock while calling
>  > dls_devnet_destroy() nor mac_disable(), so I have to drop the lock at
>  > 1464.  Are you suggesting that I need to hold the lock through the calls
>  > to dls_devnet_destroy() and mac_disable() (and incur deadlock), or that
>  > I wrap the flag change itself around iptun_mutex?  If it's the latter,
>  > then I'm not sure how this would matter.  Other threads that see the
>  > flag set will return with an error just as they would have if they had
>  > attempted to access the iptun_t a few moments earlier, say at 1465.  Can
>  > you clarify what you mean?
> 
> I mean the latter -- but you point out a broader problem, which is that
> IPTUN_CONDEMNED may be set temporarily and then cleared, which doesn't
> seem semantically right.  That is, iptun operations fail with ENOENT
> apparently at random just because an iptun_delete() that ultimately failed
> was in progress at the time.

ACCEPT: You're right.  A separate mechanism is needed to have callers of
iptun_lookup() wait until we're sure if the tunnel will in fact be
deleted or not.  I will fix this.

> 
>  > >  * 1482: What happens if dls_devnet_create() fails?  Is it
>  > >    appropriate to panic here?
>  > 
>  > This worked differently and quite well until 6791335 was introduced.
>  > The aggr, vnic, and simnet drivers (and now forcibly iptun) all have
>  > this feeble attempt at re-creating their universe if mac_disable()
>  > fails.  The link will be unusable as a result, but the system will
>  > otherwise be usable.  It's probably not worth panicing the system over
>  > (this can actually legitimately happen).
> 
> A comment explaining the impact of the dls_devnet_create() failing would
> be useful.

ACCEPT

> 
>  > >  * 1620: I'm confused by the name/semantics of this flag -- the
>  > >    code seems to use it to indicate that there is IPsec policy,
>  > >    regardless of whether ipsecconf needs to be used -- e.g.,
>  > >    ifconfig only advises using "ipsecconf -ln" when ITK_IPSECCONF
>  > >    is set and ITK_SECINFO is clear.
>  > 
>  > That's right, ITK_IPSECCONF means that there is policy covering this
>  > tunnel.  ITK_SECINFO indicates that the simple policy set through
>  > ifconfig is stored in itk_secinfo.
> 
> So could ITK_IPSECCONF be renamed ITK_IPSECPOL or the like?  Having it
> have the same name as a related command seems confusing.

ACCEPT

>  > > 
>  > >  * 1902, 1942: Should a counter be bumped?
>  > 
>  > ACCEPT: Will bump oerrors
> 
> Seems more like noxmtbufs.

Yes indeed, I've fixed that.

Thanks,
-Seb

[clearview-discuss] Clearview IP Tunneling code-review [due August 12]

Reply via email to