On Sun, 2009-08-16 at 01:40 -0700, Peter Memishian wrote: > My comments on IP (and a few other files I was interested in reviewing) > are below. I'm still working through iptun and will send my comments on > it separately. In general, I found very few issues (mostly just > additional opportunities for cleanup) and found your changes to be clean > -- great work!
Great thanks for sending these comments separately instead of waiting to send them all at once. It allows me to fix these while you're working on the rest. Responses in-line: > > Comments relative to http://cr.opensolaris.org/~seb/clearview-iptun-cr2/ > > uts/common/net/if.h: > > * 684-731: Seems like all this old iftun_req-related stuff can go. ACCEPT: It can, and it was removed at one point in the past. An hg merge brought it back to life. :-( > uts/common/inet/ip.h: > > * 1965, 1972, 2047: Synchronization table (starting around > line 2146) needs to be updated with the rules for these > new fields. ACCEPT > > * 1986: New ip_promisc_on_phys field doesn't seem to be used. ACCEPT: This is a mismerge, nice catch. That field was previously removed from onnv by another project, and it was not removed from the Clearview gate during the merge with that project. I'll remove it. > * 2039: While you're here, s/New fields/Fields/ :-) ACCEPT > > uts/common/inet/ip/ip.c: > > * General: the code seems to go out of its way to call BUMP_MIB() > on recv_ill (2375, 12892), but RFC4293 says that we should > bump the MIB on the ill that owns the address. From the RFC: > > Local packets on the input side are counted on the interface > associated with their destination address, which may not be > the interface on which they were received. This requirement > is caused by the possibility of losing the original interface > during processing, especially re-assembly. ACCEPT: That's a mistake; will fix. > > * 2666, 2723, 2724: Why are these cases in the default part of > the switch statement? ACCEPT: This can be factored more cleanly (I'm not sure why the original fanout looked like this). We can set ripha above the switch for all "transports", and add a case for IP-in-IP instead of having it be part of the default case. > > * 6365-6367: Remove stale comment about matches on tunnels. ACCEPT > > * 6523: What does `nailed' mean here? ACCEPT: This is a Kebe-ism meaning "processed". :-) I've clarified the comment as: /* * Enforce policy like any other conn_t. Note that * IP-in-IP packets don't come through here, but * through ip_iptun_input() or * icmp_inbound_iptun_fanout(). IPsec policy for such * packets is enforced in the iptun module. */ > > * 16396-16399, 16448: Comment seems needless. ACCEPT > > * 16447: The old code only called ipif_setlinklocal() if the > IPv6 address was zero, but the new code always does it. Is > this an intentional behavioral change? It is intentional, but not a crucial change or one that has far-reaching impact. I was re-working this section of the code, and only setting the link-local address if it was previously unspecified seemed needless. > * 16534: Rather than "Normally called as writer", it would be more > precise to say "If `ipsq' is non-NULL, caller is writer on it." > (Existing comment also seems a bit strange, as it's been a while > since "global" state was modified here.) ACCEPT: Will fix the writer reference. > > * 16567-16570: This is strange vestigial logic; seems like this > should be removed and 16597 should ASSERT(0). ACCEPT: Random thought: I wonder why we're okay with using "0" as the parameter given that it takes a boolean expression... Anyway, will use ASSERT(0) since that's what we've always used throughout. > > * 16574-16592: Nested switch statement seems like overkill now; > we could just ASSERT(iocp->ioc_cmd == DL_IOC_HDR_INFO) on 16574 > and remove the inner switch. ACCEPT > > * 27283-27289: I don't understand this code. First, why does > iptun require special handling that none of the other modern > ULPs require? Second, this case is for an mp of M_{PC,}PROTO > but iptun_input() ASSERT()s that mp is either M_DATA or M_CTL. > Is this a vestige from an older iteration of the code? If this > code does need to stay, then the comments (e.g., 27250-27251) > need to be updated. ACCEPT: This is indeed a vestige from when iptun's bind mechanism handled asynchronous T_BIND_ACK M_PROTO messages. The code was simplified and this nugget was left behind accidentally. Removed. I'll also add ASSERT(!IPCL_IS_IPTUN(connp)) to the list of existing ASSERT()s of the same nature. > > uts/common/inet/ip/ip6.c: > > * Same general comment as ip.c regarding BUMP_MIB() calls. ACCEPT > > * 1063-1064: I'm confused why this needs to be in the `default' > case, rather than just in the case that handles IPPROTO_ENCAP > and IPPROTO_IPV6. ACCEPT: This is a bug, since iptun will get ICMPv6 errors for random protocols it doesn't expect. Will fix. > > * 1064: Seems like `first_mp' is leaked on failure. REJECT: That's not failure. icmp_inbound_iptun_fanout_v6() returns B_TRUE if the mblk was consumed (i.e., there was an iptun interested in the error), or B_FALSE otherwise. In the latter case, we fall through and let ip_fanout_proto_v6() take a stab at finding a raw socket that might want the error. I'll add a comment above icmp_inbound_iptun_fanout_v6() to make sure the return semantics are clear. > > * 3085-3086: Remove stale comment about matching on tunnels. ACCEPT > > uts/common/inet/ip/ip6_if.c: > > * 1154, 1162: Given that these are the only calls to > MEDIA_V6INTFID() and MEDIA_V6DESTINTFID() and they ignore the > return values from these functions, it would seem preferable > to change ip_v6intfid_func_t to not return a value and update > the functions to ASSERT() their conditions. ACCEPT > > * 1185-1198: This function would be easier to read with a local > `ill' variable. ACCEPT > > * 1193: What guarantees ill_phys_addr is suitably aligned? ACCEPT: Nothing guarantees that, but it happens to be that way. I'll copy to a local variable. > > * 1297-1298: Since success is returned, seems like this should > be moved down to 1303 so that ret_nce will be set to NULL. ACCEPT > > * 3054: Stale reference to atun. ACCEPT > > uts/common/inet/ip/ip_if.c: > > * 1074-1075: This comment could be simplified to "This is only > needed for SIOCG*ARP." ACCEPT > > * 1084-1087: Dead code/comments concerning tunnel ioctls. ACCEPT > > * 1194, 1997: Likewise. ACCEPT: 1194 and 1197, yes. There also appears to be a few more instances of processing M_IOCTL. Were those all guaranteed to be the tunnel ioctls that I removed, or can there be others? E.g, other all other *_pending_mp_*() functions, and ipsq_xopq_mp_cleanup(). I'm guessing that those can go as well... > > * 5376: Comment could be revised. ACCEPT: Revised to: /* * The underying link is point-to-point, so mark the * interface as such. We can do IP multicast over * such a link since it transmits all network-layer * packets to the remote side the same way. */ > > * 18811: Not your doing, but how can we get here with a non-zero > ill_sap, and why would we want to preserve the old ill_sap > value? You're referring to the old 18811, right? I don't know, there seems to be quite of bit of voodoo involved in setting ill_sap (it looks like it might get set three times to the same value during the plumbing of an IP interface!). Where do we preserve the old ill_sap value? I'd rather not mess with this logic at this point. I think we need a "clean up IP plumbing logic" RFE where we handle all such irrational mechanisms (this isn't the only one). > > * 19676: I presume this is just our own custom, and not an > algorithm suggested by an RFC? Incidentally, if the assumption > is that the uniqueness of the IP address will ensure the > uniqueness of the interface ID, it seems that failure to account > for endianness breaks that assumption. It's a suggested convention described in RFC 4213: http://tools.ietf.org/html/rfc4213#section-3.7 Note that the interface-ID only needs to be unique on the link, not globally. The only thing that's important is that both sides pick different interface-IDs. It's also important that both sides agree on what the other's address is, but that is left to protocols such as IPV6CP or a tunnel negotiation protocol (link tunnel broker) that end up explicitly setting the link local addresses to negotiated values. In any case, I've added a reference to the RFC in the intro block comment for this function. As for your mention of endianness, where did I get this wrong? The "physical" address is already in network byte order. > * 19686: What ensures that `physaddr' is suitably aligned for > this cast? ACCEPT: Nothing; it just happens to be 4-byte aligned in the current implementation. We should copy to be safe. > * 19692: Similar comments as 19676. > This one isn't documented in any RFC that I remember, but it's a convention that seems to be used by various OSs. Note that I didn't invent this logic here, I just moved it into the media-table driven mechanism from the now removed ipif_set_tun_llink() function. > * 19936-19939: Seems like a bugfix; is there a CR? ACCEPT: Ironically, it was a bug for all link types except for tunnels. In the previous code, the ip module would intercept the M_IOCACK for the old SIOCSTUNPARAM ioctl and change the link-local address by peeking inside the iftun_req. It was a total hack-job. The hack is replaced by promoting tunnel addresses to the link-layer addresses they really were. When they change, a DLPI notification is generated by GLDv3 and the right thing happens here. For other types of links, this wasn't handled correctly, and this is CR 4289774. It looks like I need to work on this fix a tad more, though. I need to make sure that if the token is manually set, that we don't automatically change it from under the admin. > * 19936-19939: Is there a reason to do this before the call to > ill_set_ndmp()? Seems safer to ensure that ill_nd_lla* have > been correctly reset in case the code called from > ill_setdefaulttoken() or ipif_setlinklocal() needs to make use > of them. ACCEPT: I'll move it down. > > uts/common/inet/ip_if.h: > > * 296: Remove dead ip_extract_tunreq() declaration. ACCEPT > > * 433: Remove dead ip_sioctl_tunparam() declaration. ACCEPT > > uts/common/inet/ip/ip_multi.c: > > * 1391, 1440: I'm not getting it: what's the required relationship > between an IP interface requiring a resolver and an IP interface > supporting multicast? ACCEPT: The reason why this made sense: The only reason where it would make sense to leave all multicast groups and then rejoin them is on links where we've joined any groups. The pre-existing logic used to determine whether or not groups should be joined (i.e., send down DL_ENABMULTI_REQ) is that resolver interfaces support such DLPI messages (see ip_ll_addmulti_v6()). This makes some sense since resolver interfaces correspond to broadcast-capable links that also support link-layer multicast. A better heuristic would be whether or not the media table has a multicast mapping function... That said, the lines you're commenting on were added long ago and are no longer relevant. The checks can be removed. > > uts/common/inet/ip/ip_ndp.c: > > * 184: s/legnth/length/. More importantly, this is legacy support > for e.g. x.25, right? The comment makes it sound more general. > Has anyone actually confirmed that x.25 uses this mechanism? ACCEPT: Accept the spelling mistake. To answer your question, the code was previously there before it was "accidentally" removed by (I think) Surya. Any point-to-multipoint media could have used this mechanism, so it could be used generically. Erik mentioned X.25 in his code-review comments, but I don't know if it actually uses this. > > uts/common/inet/ip/ip_classifier.c: > > * 311: 389 because? (Yes, I know it's prime -- but so are 383 > and 397.) ACCEPT: Will add a comment, but I'm not sure what to say other than "it's a good enough prime number"... > > * 1061: Would seem more efficient to `goto done' here and remove > 1063-1064. ACCEPT > > * 1096: Would seem simpler and more effecient to hoist the > CONN_INC_REF() to here and remove 1098-1099. ACCEPT > > * 1294, 1374, 1539, 1637: What is `protocol' when we get here? > I would've expected IPPROTO_ENCAP and IPPROTO_IPV6 -- if so, why > aren't those new cases in the switch statement? Is it to > take advantage of the existing check_exempt_conflict*() logic? It's not to take advantage of that logic. The reason is two-fold: 1. We don't want to put raw sockets bound to IPPROTO_ENCAP or IPPROTO_IPV6 in the iptun fanout, so we differentiate iptun conn_t's based on IPCL_IS_IPTUN() instead of the protocol value passed in. 2. The upper protocol for a given iptun conn_t is indeterminate. One can plumb either IPv4 or IPv6 IP interfaces above a given tunnel, so there is no single correct value for conn_ulp on an IP tunnel. That said, we pick an arbitrary value of either IPPROTO_ENCAP for tunnels over IPv4 and IPPROTO_IPV6 for tunnels over IPv6 just because we have to pick a value when calling ip_proto_bind_*() in iptun.c, but that is somewhat irrelevant given (1) above. All that said, moving the IPCL_IS_IPTUN() check before the switch would make more sense given that we don't gain anything by being inside of the default case (IP tunnels can't be MAC exempt). I will do that. > > * 1868, 2072: Insert blank line to match existing style in the > switch statement. ACCEPT > > uts/common/inet/ip_classifier.h: > > * 528-537: Out of curiosity, where do these hash algorithms come > from? Have we confirmed that they are well-balanced in common > use? I don't remember how these came to be. I can provide some real-world distribution data this week based on the workloads on the punchin servers. > > uts/common/inet/ipsec_info.h: > > * 279-281: TUN_HELLO can go. ACCEPT > uts/common/fs/dev/sdev_netops.c: > > * 69: zone_check_datalink() seems to be assuming system call > context (e.g., it does copyinstr()); seems strange to call it > from this context. Hmm? zone_check_datalink() was modified to be called from the kernel (see changes to uts/common/os/zone.c). The ZONE_CHECK_DATALINK system call does the copyin/copyout calls outside of the context of this function now. > > * 218, 222: Can simply pass NULL for the `link' argument. I don't understand your comment. The dls_mgmt_get_linkinfo() fills in the link argument with the name of the link associated with linkid. Thanks, -Seb