[ Clear-cut items removed. ] > > http://cr.opensolaris.org/~seb/clearview-iptun-cr3/ > > > > usr/src/uts/common/inet/iptun/iptun.c: > > > > * 62: What in <sys/strsubr.h> does this file rely on? That header > > contains symbols private to the STREAMS framework. > > ACCEPT: On a related note, there are over 300 files outside of the > framework including this file in the kernel. Something seems generally > amiss if it's not meant to be included by anything outside of the > framework.
It's mostly a lot of cut-and-paste sloppiness which keeps spreading. There are also a few symbols in <sys/strsubr.h> that need to be moved to another header like <sys/stream.h>. > > * 129: Why is this IPTUN_MIN_IPV6_MTU and not IPTUN_MIN_IPV4_MTU? > > Because only IPv6 interfaces are plumbed above 6to4 links. My understanding was that these values corresponded to the MTU of the underlying tunnel, hence the IPv4 values for an IPv4 tunnel and the IPv6 values for an IPV6 tunnel. Further, the part I really cannot reconcile is why the minimum is pased on IPv6 and the maximum is based on IPv4. > > * 251: Will this force users of iptun_m_unicst() to have special > > code to deal with iptun? Is it inappropriate to map this to > > iptun_setsrc()? > > The existing interface is there mainly for the benefit of callers of > mac_unicast_primary_set(), which can handle failure (they won't need > special code for IP tunnels). The only such caller that really applies > to IP tunnels is the handling of DL_SET_PHYS_ADDR_REQ in > proto_setphysaddr_req(). Using this to set the tunnel source (without > setting the tunnel destination) is meaningless anyway, which is why I > didn't add support for it. I find it architecturally wobbly for the caller to not implement certain entrypoints based on knowledge of how the framework currently makes use of those entrypoints. If the entrypoint has semantic meaning and could be reasonably supported, why not implement it? > > * 344: Given that `maxmtu' is dynamic, this seems a problematic > > property to set programmatically since a call may or may not > > fail depending on dynamic conditions. Further, if the call is > > made in while maxmtu is a large value, that value remains set > > even when maxmtu shrinks, which seems asymmetric. (IP has many > > of the same problems with its current MTU handling.) > > I agree that this has problems, and I struggled with coming up with a > good solution. I don't know that it's possible, but suggestions are > welcome. We could set the maximum to the maximum IP datagram size minus > tunneling overhead (which is in the tens of thousands of bytes), and > assume that administrator will have enough sense to set a value that > works (given that most values above the current dynamically computed MTU > most likely won't work). This would allow the administrator to override > the system default in a symmetric way (they can use any value at any > time, regardless of the currently dynamically computed maximum). Do you > think that makes more sense? Possibly. A "force" behavior might also make sense, that allows anything up to the "hard" maximum to be specified if a force flag is provided. Of course, that introduces some other complications, such as how to indicate the range of values that could be "forced". I'd be fine with covering this issue in a P4 CR and leaving it for later. > > * 349: No way to cause IPTUN_FIXED_MTU to be cleared -- though the > > same limitation also exists today with IP interfaces. > > This is a limitation of the property kernel API. There is no way to > know that the user has done a "reset". It just comes down as a set with > a default value. Seems like a RFE should be filed against the property mechanism. > > * 374-381: Would've expected this to be done by GLDv3, not by each > > individual driver (sigh). > > There are indeed warts in this driver API. Again, perhaps a CR? > > * 430: Would've expected the default to be the value that's > > initialized in iptun_create() -- i.e. iti_maxmtu. > > This is related to your earlier comment. The current maximum is equal > to the current dynamically computed maximum. If we decide that such a > maximum is artificial and asymmetric, an absolute maximum could be > provided (like iti_maxmtu). My comment was about the default value, not the maximum value. > > * 558: Is there any other reason (besides a bug) that the lookup > > could fail? > > Yes, the tunnel could have been deleted while the task was sitting in > the taskq. Right, the comment covers that one. My comment was asking about the phrase "it's most likely because" -- this implies there are other reasons besides the tunnel being deleted. > > * 909: Is this really worth three exclamations? > > ACCEPT: No > > > > > * 910: Supposed to be "&" rather than "&&"? Still not sure I > > understand this check though -- e.g., why is it OK for > > ipsr_self_encap_req to be IPSEC_PREF_UNIQUE? > > ACCEPT: I think you're right, there are two bugs on that line. It > should be (ipsr->ipsr_self_encap_req != 0). A new postulate: the more exclamation points in a comment, the more likely the code that it references is buggy ;-) > > * 925: Is this call to dls_mgmt_get_linkinfo() safe given that > > iptun_lock is held? > > Yes. This isn't an "upcall" since it's just a utility function in the > dls module. It doesn't enter this link's perimeter. But it will call up to userland with the lock held, which seems bad since this imposes subtle and fragile restrictions on what userland can do (i.e., it will deadlock if it tries to do anything that needs to acquire iptun_lock). > > * 949-991: The locking here might be fine, but I can't make heads > > or tails of it. Seems like a lot of IPsec machinery is directly > > exposed to consumers. > > DEFER: There is quite a bit of direct manipulation of internal data > structures from the IPsec implementation here, but I don't think there's > much that I can do about it short of implementing a higher level kernel > API for this stuff (there doesn't appear to be one). OK. In general, it seems like an IPsec API layer is missing. > > > > * 1032: From the perspective of the caller, it's unclear what > > iptun_setparams() may have done if the call fails and multiple > > flags were set in itk_flags, which seems to limit the usefulness > > being able to set multiple params at once. > > ACCEPT: Will have iptun_setparams() restore the iptun_t to its original > state on failure. Is that feasible? For instance, what if you're not able to restore all of the state? > > * 1079-1092: Punting this to the hapless caller via EAGAIN > > complicates the interface with an implementation artifact. > > Seems like another mechanism is needed here. (IIRC, GLDv3 > > ioctls are not allowed to block, which means the mechanism > > might need to be quite a bit different.) > > DEFER: I've brought this up with the IPsec team and Erik Nordmark, and > the consensus was that since we'll never run into this problem (you need > IPsec keys and policy to use IPsec tunnels, and the creation of those > objects will have loaded IPsec before tunnels are created), it was not > worth my re-architecting how IPsec is loaded at this point. If we can never get here, then I'd think this would be a panic(). If we can get here, then is EAGAIN really good enough? > > * 1098: As an aside, I'm unclear what constitutes a "simple" > > policy. Is this defined somewhere? > > ACCEPT: Simple policy is policy described in the ipsec_req_t embedded in > an iptun_kparams_t, which originates from the ifconfig command line. > It's "simple" because it can only state "do ESP on everything" or "do AH > on everything" (as opposed to the rich policy language offered by > ipsecconf). I've added a comment. I see. > > * 1448: EACCES usually pertains to file permissions; would've > > expected EPERM (and that's done at e.g. 1503). > > My understanding (from existing usage and various discussions where this > debate was had in the past) is that EPERM indicates that the caller > doesn't have the permissions to perform an operation, and EACCES > indicates that the caller is denied access to the object being operated > on. My understanding would thus lead me to change 1503 to EACCES... OK. > > * 1467, 1481: Still need to hold iptun_lock when clearing > > IPTUN_CONDEMNED to guarantee the flag change will be visible to > > other threads (usually done as part of the memory barrier as > > part of mutex_exit()). > > I'm not sure I fully understand the problem you're pointing out nor the > solution you're proposing. I can't hold the lock while calling > dls_devnet_destroy() nor mac_disable(), so I have to drop the lock at > 1464. Are you suggesting that I need to hold the lock through the calls > to dls_devnet_destroy() and mac_disable() (and incur deadlock), or that > I wrap the flag change itself around iptun_mutex? If it's the latter, > then I'm not sure how this would matter. Other threads that see the > flag set will return with an error just as they would have if they had > attempted to access the iptun_t a few moments earlier, say at 1465. Can > you clarify what you mean? I mean the latter -- but you point out a broader problem, which is that IPTUN_CONDEMNED may be set temporarily and then cleared, which doesn't seem semantically right. That is, iptun operations fail with ENOENT apparently at random just because an iptun_delete() that ultimately failed was in progress at the time. > > * 1482: What happens if dls_devnet_create() fails? Is it > > appropriate to panic here? > > This worked differently and quite well until 6791335 was introduced. > The aggr, vnic, and simnet drivers (and now forcibly iptun) all have > this feeble attempt at re-creating their universe if mac_disable() > fails. The link will be unusable as a result, but the system will > otherwise be usable. It's probably not worth panicing the system over > (this can actually legitimately happen). A comment explaining the impact of the dls_devnet_create() failing would be useful. > > * 1620: I'm confused by the name/semantics of this flag -- the > > code seems to use it to indicate that there is IPsec policy, > > regardless of whether ipsecconf needs to be used -- e.g., > > ifconfig only advises using "ipsecconf -ln" when ITK_IPSECCONF > > is set and ITK_SECINFO is clear. > > That's right, ITK_IPSECCONF means that there is policy covering this > tunnel. ITK_SECINFO indicates that the simple policy set through > ifconfig is stored in itk_secinfo. So could ITK_IPSECCONF be renamed ITK_IPSECPOL or the like? Having it have the same name as a related command seems confusing. > > * 1736-1750: Seems like a utility function (in IPsec-land) that > > initializes an ipsec_selector_t from a provided IPv4/IPv6 > > address and protocol would be useful. > > DEFER: The savings are minimal, I'll defer this to another time when > refactoring selector initialization in spd.c could be more safely done. OK. > > * 1875: The comment is obvious from the code -- would be more > > useful if the comment said why. > > ACCEPT > /* > * We only dynamically adjust the tunnel MTU for tunnels with > * destinations because dynamic MTU calculations are based on the > * destination path-MTU. > */ Looks good. > > > > * 1902, 1942: Should a counter be bumped? > > ACCEPT: Will bump oerrors Seems more like noxmtbufs. > > * 2069: Would be worth explaining why we don't need to worry about > > packets that may not have an IP header's worth of data in their > > first mp. > > ACCEPT > /* > * Don't bother handling packets that don't have a full IP header in > * the fist mblk. For the input path, the ip module ensures that this > * won't happen, and on the output path, the IP tunneling MAC-type > * plugins ensure that this also won't happen. > */ Looks good. > > * 2151: ASSERT(MBLKL(data_mp >= 0)) would be useful. Also, an > > explanation of why this is safe. > > ACCEPT: > /* > * The ip module ensures that ICMP errors contain at least the > * original IP header (otherwise, the error would never have made it > * here). > */ Looks good. > > * 2273: ASSERT(endptr < mp->b_wptr) would be useful, as would a > > comment on why this is safe. > > I actually can't ASSERT() that. It doesn't look like ip_find_hdr_v6() > guarantees that. It only guarantees that ipp_dstopts (or ipp_rtdstopts) > points at the start of a destination options header. I need to return > B_FALSE if I encounter this condition, and the code currently behaves > that way due to the condition in the while loop. OK. > > * 2507: Block comment explaining the various types of M_CTL/M_DATA > > conventions expected here would be useful -- e.g., M_CTL with an > > M_CTL b_cont is a protected ICMP packet, but M_CTL with an > > M_DATA b_cont is just a protected IP packet? Or something else? > > The reader shouldn't need to have a deep knowledge of these > > obscure conventions to understand this code. > > ACCEPT: I've added the following to the intro. block comment for > iptun_input(): That's useful -- thanks. > > * 2546: Why are these casts needed? > > I knew you'd ask that when I added these! ;-) Both sides of the ':' > need to have the same type, otherwise, the compiler complains: I see. > > * 2870: OK to access itp_flags without itp_lock held? > > Yes, I verified this with danmcd: If the ITPF_P_ACTIVE flag changes, > this results in a harmless race whereby a packet may be transmitted > using different policy than what was ultimately intended (which would > have happened anyway if the packet had made it here a few ticks > earlier). OK. > > * 2881: What cases cause us to get here with b_next != NULL, and > > why do those conditions only apply when IPsec policy is active? > > Would be nice to add a comment explaining this. > > ACCEPT: Modified the comment at 2878 to: > /* > * ipsec_tun_outbound() returns a chain of tunneled IP > * fragments linked with b_next (or a single message if the > * tunneled packet wasn't a fragment). Each message in the > * chain is prepended by an IPSEC_OUT M_CTL block with > * instructions for outbound IPsec processing. > */ Looks good. > > * 2891: Where is this done? > > It's done by ip's processing of the packet in ip_wput_ire(). I've > clarified the comment slightly: > /* > * The ip module will potentially apply global policy to the > * packet in its output path if there's no active tunnel > * policy. > */ Looks good. -- meem