Hi Phil,

On Tue, 2008-07-15 at 16:03 +0100, Phil Kirk wrote:
> This is a request for a review of the latest ipobs bits which have now 
> been updated to reflect the first round of review comments.
> 
> The new webrev can be found here:
> 
> http://cr.opensolaris.org/~philk/ipobs-2nd-review/
> 
> It can be found internally here including cscope:
> 
> /net/exorcist.uk/storage/builds/ipobs-only/webrev/

Great work, we're getting close. :-)  Nit: the cscope database is out of
sync with the code in the above workspace.  Otherwise, here are my
comments:

usr/src/cmd/cmd-inet/usr.sbin/snoop/snoop.c

* 776: The argument is still named "device"

* 780: named


usr/src/cmd/cmd-inet/usr.sbin/snoop/snoop_capture.c

* 238: Could you check the MAC type instead of the Iflg and strcmp()?

* 446-447: I don't understand the purpose of these two new lines.
  They can be reduced to "MAX_HDRTRAILER != 0", which is always true
  as MAX_HDRTRAILER is hard-coded as 64.


usr/src/cmd/cmd-inet/usr.sbin/snoop/snoop_ether.c

* 1602: Consider a copy here instead to prevent potential unaligned
  memory access problems.

* 1605: What is 8252?

* 1609: Comment needed, what does this check for?  I never remember
  what all of these variables are and what their relationships are:
  len, elen, origlen, blen, datalen.

* 1614,1616: if (data == NULL)

* 1660,1663: These values don't match what you have at line 87.  On
  87, the field's number space is ethertypes, and not IP version
  numbers.  I think line 87 needs to be fixed, and that the error
  results in IP address-based user-space filters to not work (i.e.,
  uses of network_type_ip and network_type_ipv6 in snoop_filter.c
  won't work).


usr/src/cmd/cmd-inet/usr.sbin/snoop/snoop_pf.c

* 140,159,178: I don't think you need a separate
  transport_protocol_table_t for each supported link-layer.  The
  essential information is identical for each one.

* 160-174: What are 0x0104 and 0x0106?  I would have expected to see
  IPV4_VERSION and IPV6_VERSION here instead, but they don't
  correspond to those values.

* 1148-1278: Please provide a utility lookup function rather than
  duplicating the same for(;;)/strcmp() loop in multiple places.
  Alternatively you could use libnvpair(3LIB) if it simplifies things,
  but it's up to you.


usr/src/lib/libdlpi/common/libdlpi.c

* 102: There's no support for walking /dev/ipnet DLPI nodes.  I
  remember having a discussion a few times where the question of
  whether or not we wanted to support this in dlpi_walk() was brought
  up, but I don't remember the resulting decision.


usr/src/uts/common/fs/dev/sdev_ipnetops.c

* 79: I'm not sure I understand the purpose of this check.  It looks
  like it should instead be an ASSERT(), but perhaps you can clarify
  where the check would fail and it wouldn't because of a bug in the
  ipnet code.

* 111: Extra line.

* 133: When do we get back VDIR?


usr/src/uts/common/inet/ip.h

* 50: Is this really needed?


usr/src/uts/common/inet/ip/ip.c

* 14766,15146,16203,16499,22679: Needless new lines.

* 15119-15193: The handling of ire is a bit messy here.  There are
  multiple, identical, calls to ire_cache_lookup(), and multiple
  places where the ire is released.  For example, if we do the lookup
  at 15147 in the general case, then we don't need to do it again at
  15193.

* 15146: No need for braces.

* 15168: This new code makes it look like ip_input() drops all packets
  that are not labelled.  That can't be right.  The problem is that
  the name of the variable "pkt_labelled" is misleading given its
  semantics.  I'm not sure you need the variable at all.  Why not
  simply move the block of code at 15133 down to 15168?

* 22695: No need for braces.

* 22767,27172: Needless lines removed.

* 30178,30199: The name of these functions are too generic given their
  ipnet specificity.

* 30186,30206: If I read ipnet.h correctly, you will not wake up from
  cv_wait() unless ips_ipnet_cb_nwalkers == 0.  As such, these while()
  loops should simply be if() statements, and you can ASSERT() that
  ips_ipnet_cb_nwalkers == 0 below the cv_wait() calls.  Or did I
  misunderstand the locking scheme?

* 30203: Blank line needed after local variable declarations.

* 30209: The variable "head" is misnamed, I think.  It's not the head
  of the list, but just the current cb pointer.


usr/src/uts/common/inet/ip/ip6.c

* 4950,5887,10588: Needless new lines.

* 6713: Needless line removed.

* 6858-6861,6864-6866: I didn't see this logic for IPv4 in ip.c.  Is
  that on purpose?

* 6860: Again, the name of pkt_labelled implies something that isn't
  true.


usr/src/uts/common/inet/ipnet.h

* 83: Comment clarification request: In the case of local
  transmission, what does this get set to?

* 111,112,117,118: You never use anything in these sockaddr's except
  for the IP addresses.  Instead of storing sockaddr structures, you
  should be storing IP addresses.  You then wouldn't need the macros
  at 113, 114, 119, and 120.

* 116-121: The netmasks aren't used anywhere in the code, please
  remove them.

* 125: ifa_status as a boolean doesn't make sense.  "To be status or
  not to be status"?  Should this be "ifa_up"?  On a related note, it
  doesn't look like ifa_status is used anywhere.  If that's not a bug,
  then it should simply be removed.

* 158: Does this really need to be a macro?  I'd much rather it be a
  function that can be traced.  Also, it looks like this belongs with
  ips_register_hook() and ips_unregister_hook(), as they're fiddling
  with the same linked list and dealing with the same callbacks.

* 173: I don't see a need to fall-back to copymsg() if dupmsg() fails.


usr/src/uts/common/inet/ipnet/ipnet.c

* General: I'm confused with the use of minor numbers per interface,
  specifically if_minor.  I think this might be more complicated than
  it needs to be.  As far as I can tell, it's used simply as a unique
  number for lookups in an AVL tree, so I don't immediately see why
  it's called if_minor.

  Further, IP interfaces already come with unique interface ids which
  you're given as hne_nic in the netinfo plumb events, and which you
  store in the ipnetif_t as if_index.  Why not simply use those as the
  unique numbers and be rid of the "minor number" allocation scheme
  you currently have?  This would also mean that looking up an
  interface by "interface id" in ipnet_if_getby_index() would not
  necessitate a walk of the entire AVL tree.

* 30: Need a more descriptive intro. block comment.  What I would find
  helpful in such a comment would be a high-level description of the
  flow of packets.

* 70: IPNET_MATCHZONE isn't used anywhere.

* 72: IPNET_MATCHADDR isn't used anywhere.

* 81: I'm a bit confused by this macro.  It's called IPNETDBG, but
  it's used in all contexts (not just DEBUG kernels).  I think it
  would be more straightforward to get rid of this macro and simply
  call cmn_err() where appropriate in the code.

* 83: ipnet_input() should be static.

* 115: I don't think you need an ipnet_rsrv(), but perhaps I don't see
  what your thinking is.  Can you clarify the intent (perhaps in a
  comment)?

* 157: The uint16_t is the SAP, but what's ETHERADDRL for?  Since
  there's no physical media, then the physical address length should
  be 0.

* 170: Same as above: since there's no media and thus no concept of
  broadcast, the address length should be 0, and not ETHERADDRL.

* 175: Please resolve this XXX.

* 180: To keep coding style consistent, merge this line with the
  structure definition below it.

* 186,187: These two fields, ipnet_mode and ipnet_pflg, are separate
  flags fields, but they're using the set of flags defined at 75-79 (a
  single flags number space).  Please define a single field,
  ipnet_flags, where all flags are set.

* 188: ipnet_sap isn't used anywhere.  I would have expected it to be
  used for filtering based on SAP in the non promiscuous case, but
  that code seems to be missing.

* 189: ipnet_dlstate would make it more clear that this is meant to
  represent DLPI state.

* 199: ips_avl_by_name is populated and maintained, but isn't used
  anywhere; please remove it and all associated code.

* 205: Need comment what cb_set is.

* 207: You're always entering this lock as a writer, so this is
  essentially a mutex lock disguised as a rw lock.

* 219: ipnet_nicevent_cb should be static.

* 219-254: Please use a single indentation style throughout.

* 247: This in-line declaration of inet_ntop() isn't necessary, as you
  should be including <inet/ip6.h>.  In any case, there is no need to
  do that since you don't use inet_ntop() anywhere.

* 250-254: All three of these declarations should be removed and
  placed in a header file.

* 272: I think it should be pointed out in a comment that the choice
  of "1" as the "nthreads" argument to taskq_create() is fundamental
  to the design.  If I'm not mistaken, this guarantees in-order
  delivery of packets to clients.

* 291,463: This needs more thinking.  I don't think it's the correct
  thing to do to lock this module into memory like this even when
  there's nothing in the system that's using it.  If there are no
  consumers, then I'd let this thing detach and unload.  This would
  mean having to implement the correct semantics for tearing down
  state, which shouldn't be hard.

* 300: error handling by this function is undefined.  The function
  returns int, but always returns 0, and its return value is always
  ignored.  Either change it to either return void, or to return error
  codes on failure.

* 307,322: Make "ret" a single local variable.

* 313,314,328,329: Continuation lines: indent by 4 spaces.

* 319,334: ipnet_register_hook()

* 334: missing "failed".

* 340: Need intro. block comment.

* 354: I don't think we care about the netmask.  We do, however need
  the broadcast address, yet you don't get it directly.  You later
  calculate the broadcast address based on the netmask, but you could
  simply get the broadcast address directly with NA_BROADCAST.

* 358: There is no need to dynamically allocate "name".

* 365: Comment nit: net_getifname() doesn't return a pointer, and
  returns a non-zero error code on failure, not NULL.

* 374: Under what condition would you find an existing interface with
  this interface index?

* 374: This would be more readable as:

        ipnetif = ipnet_if_getby_index(phyif, ns);
        if (ipnetif == NULL) {
                /* ... */
        }

* 376: Likewise, this would be more readable as:

        ipnetif = ipnet_create_if(name, phyif,
            family, ns);
        if (ipnetif == NULL) {
                /* ... */
        }

* 377: You shouldn't need to pass in the address family since
  ipnetif_t is a "physical" interface.

* 384: The handling of this error is inconsistent with the spirit of
  the comment at 367, where you say that if you can't create a
  /dev/ipnet node for an interface, then you log a message and move on
  to the next one.  For what it's worth, I think that if you don't get
  a self-consistent and complete view of the interfaces, then you
  should tear everything down and fail the entire operation including
  the attach().

* 392,396: cstyle: Need curly braces.

* 423: Since you're returning DDI_FAILURE in error paths, you should
  be returning DDI_SUCCESS here.

* 418: Can't the IFF_UP flag be cleared on the actual interface
  between lines 401 and 418?  If so, how do you ensure that ifa_status
  is accurate?

* 433: No need for braces.

* 433: Need to return DDI_SUCCESS for DDI_RESUME.

* 437: There is code in this file that makes the assumption that if
  ipnet_dip is non-NULL, then the module is attached.  Given that you
  want to maintain that this is a valid assumption, then you need to
  move this line of code down to immediately above the return at 456.

* 441: It would help to explain (in the comment) how changes happening
  to the system during or immediately after this initial "world view"
  creation are going to be reflected in the module's state.  In other
  words, have we already registered to receive netinfo callbacks, and
  will receiving callbacks during this process work?

* 463: Need to return DDI_SUCCESS for DDI_SUSPEND.

* 503: No need foor braces.

* 526: You can pass-in NULL for "where" here and get rid of the
  "where" local variable.

* 529: You can prevent this kmem_free() by delaying the allocation of
  ipnet until after you've sucessfully found the node with avl_find().

* 530: No, that's what dtrace is for. :-) On a related note, is this
  even possible?

* 542: Comment needed: why is this being done here rather than in
  ipnet_stack_init()?  My interpretation of this is that you don't
  want the ipnet hook to send you packets if there are no open
  clients, so you only register in open(), and unregister in close().

* 567: Spurious blank line.

* 571: This comparison is a bit strange.  Why not simply (ipnet_curr
  == ipnet) ?

* 588: Move up to previous line.

* 589: More readable as "kmem_free(ipnet, sizeof (*ipnet));"

* 609: Shouldn't this line be up in the FLUSHW block above?

* 611: cstyle: Need braces around the else block.

* 624: I don't think you should be treating data like a boomerang here
  and sending it back.  M_DATA should be treated like the default case
  and freed.

* 621,668,692: I wouldn't bother supporting transparent ioctls.  You
  can simply require that user-land programs using DLIOCIPNETINFO do
  so using I_STR.

* 766: As I mentioned, you don't need to include 6 bytes of phys-addr
  nor 6 bytes of broadcast address (you never filled them in
  anyway...).

* 770: There should be code here to fill in the dl_addr (the bound
  SAP).

* 822: This doesn't really matter, but since 0 is a valid SAP value
  (one that will be bound 99.999% of the time with these devices),
  setting ipnet_sap to 0 to show that the ipnet_t is unbound doesn't
  really mean much.

* 855,889: I believe these both need to be atomic (atomic_inc_32() and
  atomic_dec_32()).

* 895: Returning an error here seems to be a bit useless since you've
  already cleared the IPNETALLMULTI flag and decremented if_multicnt.
  Perhaps waiting to clear the flag and decrementing the counter until
  after this block would be best.

* 924,950: One flaw with this design is that you'll never know if the
  DL_PROMISCON_REQ failed, as ip will consume the DL_ERROR_ACK.  I'm
  not sure what you can do about that, however, short of implementing
  a proper function in ip for enabling multicast promiscuous mode that
  waits for the DLPI acknowlegment prior to returning control to the
  caller.

* 969,1064,1078,1094: atomic_inc_32()

* 989: Need an intro. block comment detailing the accept logic.

* 994: cstyle: continuation line, indent by 4 spaces, and need braces.

* 1020: I don't think the source being multicast is a valid check (it
  is certainly not a valid packet, but that's irrelevant).

* 1020: Correct me if I'm wrong, but I believe the semantics for
  multicast should be that we only see those that are sent or received
  on the interface of interest.

* 1022: Same here with the broadcast source address check.  This check
  doesn't look like it should be here.

* 1025: This logic doesn't look right to me.  If the source matches
  and DL_PROMISC_MULTI is enabled, then we don't pass the packet up.

* 1045: You could cache the netstack pointer in the ipnet_t instead of
  doing this for every packet.

* 1048: You can reduce one level of indentation by changing 1046 to:

        if (ipnetif == NULL ||
            ipnetif->if_minor != ipnet->ipnet_if->if_minor)
                continue;

  On a related note, couldn't you simply compare the ipnetif pointer
  instead of its minor number?  Like this:

        if (ipnetif == NULL || ipnetif != ipnet->ipnet_if)
                continue;

* 1052: This is not correct for at least two reasons.  The first is
  that there is nothing here implying that what you're looking at is
  an Ethernet header.  The second is that the consumers of the hook
  should not be expected to keep state on whether IP's fastpath is
  enabled or disabled.  Hook callbacks should always be passed an IP
  header, and the code that calls the hook callback function should
  ensure that since it's the only thing that knows what kind of
  link-layer header it needs to toss away.

* 1055: cstyle: indent by 4.

* 1060: Either dup or copy, I don't think you're buying anything by
  trying both.

* 1070: cstyle: indent by 4.

* 1074: The logic here doesn't look right.  If we can do a putnext(),
  then we don't do a putnext() but enqueue the packet for later
  processing instead.  Why not do a putnext() in that case?  If the
  intent is to always enqueue (maybe because we drop too many packets
  otherwise?), then why bother with the canputnext() here, and defer
  that check for the service procedure to do.

* 1091: What are you referring to in this comment?

* 1092: Need a comment explaining why we're dispatching taskq's here.
  It would also help to add a quick note to pre-emptively answer the
  question, "if you're dispatching a taskq for each packet, won't this
  scheme consume all system resources under heavy load?"

* 1103: There should be no reason to not use all arguments in this
  function.  Please remove the ARGSUSED and remove any unused
  arguments (of which "family" is one of them).

* 1119: As I mentioned in my general ipnet.c comment above, I don't
  understand why you need to allocate a minor number here when you
  already have a unique interface id passed in as the second argument.

* 1137: if_dev isn't used anywhere, please remove.

* 1193: This is really a for loop.

* 1205: What is this id (comment please)?

* 1209: cstyle: no need for braces.

* 1286: This strcmp() is being done in 4 places.  Consider a macro
  that takes the hook_nic_event_t and tells you whether or not it's an
  IPv4 event or not to make the code more readable.

* 1328: This is more readable as:

        ipnetif = ipnet_if_getby_index(hn->hne_nic, ns);
        if (ipnetif != NULL) {
                /* ... */
        }

* 1333,1336: Perhaps I need to understand more about netinfo.  How can
  we receive an unplumb event if there are still addresses in the
  list?  If the interface is unplumbed, then how will our address list
  ever get emptied and our ipnetif destroyed?

* 1369: Please resolve this XXX.

* 1382: cstyle: indent by 4.

* 1387: cstyle: indent by 4.

* 1389-1394,1408-1413: Don't you need some locking here to prevent
  half-written addresses from being read elsewhere in the code?

* 1420-1452: I think all of this code can be removed.  You don't care
  about whether an address is up or down.  The whole concept of an
  address being up or down is odd to begin with, but that's a
  different matter entirely. :-)

* 1459-1462: None of these (net_ifaddr, addr, v4addr, and v6addr) are
  used anywhere, you can remove them and the code in this function
  that plays with them.

* 1478,1479: These are duplicates of 1470,1471.

* 1487-1505: Indented by one too many tabs.

* 1485-1505: You don't need this switch statement.  This can all be
  reduced to:

        ifaddr = ipnet_match_lif(ipnetif, hn->hne_lif, family);
        if (ifaddr != NULL) {
                bcopy(hn->hne_data, &ifaddr->ifa_zone,
                    sizeof (ifaddr->ifa_zone));
        }

* 1543: ipnet_if_getby_name() isn't used anywhere, please remove it.

* 1547: Why do you need to walk the ips_avl_by_index AVL tree here
  instead of simply doing an avl_find() on the ips_avl_by_name AVL
  tree?

* 1551: If I understand the implementation correctly, there are
  separate AVL trees for every netstack, so it would be a bug if two
  interfaces from different netstacks ended up in the same tree.  This
  looks like it should be an ASSERT().

* 1565: If ips_avl_by_index were indexed by interface id instead of
  your private number space, then you wouldn't need to walk the entire
  tree here.

* 1579: Indeed.  For one, you don't need these switch statements.  You
  also don't need separate locks for v4 and v6 address lists.  Given
  that, this entire function can be reduced to:

        mutex_enter(&ipnetif->if_addr_lock);
        list = (family == AF_INET) ?
            ipnetif->if_ip4addr_list : ipnetif->if_ip6addr_list;
        for (ifaddr = list_head(&list); ifaddr != NULL;
            ifaddr = list_next(&list, ifaddr)) {
                if (lid == ifaddr->ifa_id)
                        break;
        }
        mutex_exit(&ipnetif->if_addr_lock);
        return (ifaddr);

  You can make similar reductions in a number of different places.

* 1587,1601: You're not holding a lock here, so the head could be
  removed before you start walking the list, and kaboom.  It doesn't
  matter if you refactor the code as I suggested above.  The same
  comment goes for for a number of other places where you walk these
  lists.

* 1640: According to the code, it can only fail if you're erroneously
  called it more than once.  As such, I think it would be fine to use
  VERIFY(net_release(...) == 0) where appropriate and forget the idea
  of handling errors from this function.

* 1653: Consider VERIFY(net_unregister_hook(...) == 0) here as well.
  This can only fail if there's something seriously wrong with either
  your code or the netinfo code.

* 1677-1712: You can reduce the clutter here by having a single mutex
  lock for both lists, grabing it once at the start, and releasing it
  once at the end before a single return() statement at the end of the
  function.  Same comment for ipnet_matchbaddr().

* 1684: One too many parentheses.

* 1685-1688: cstyle: the indentation in this block looks messed-up.

* 1699: Use IN6_ARE_ADDR_EQUAL().

* 1726: cstyle: indent by 4.

* 1734: There is no such thing as a broadcast address in IPv6.  What
  does this code do?

* 1757: ipnet_matchmaddr() is empty and isn't used anywhere.  I have a
  feeling that you need to implement and use it given that the
  existing multicast logic in ipnet_accept() needs work.

* 1770-1774: I don't follow this comment.  It accurately describes the
  code, but what does it mean in terms of shared vs. exclusive stacks?

* 1793: no need for braces.

* 1801: ipnet_walk_if() isn't used anywhere, please remove.


usr/src/uts/common/inet/lo/lo.c

* 133,146: Do you really need non-zero length physical addresses?

* 208: What about DDI_RESUME?

* 224: What about DDI_SUSPEND?

* 335-342: Please verify this M_FLUSH logic, I can't tell if it's
  correct.

* 375,399-422: I would toss transparent ioctls overboard as useless
  complexity.

* 470: You account for the size of a fictitious physical address and
  broadcast address, but you never fill them in.  I'd reccommend not
  including them at all in the size calculation.

* 432,444-446,482-490: There is no style-2 /dev/lo, so DL_ATTACH_REQ
  and DL_DETACH_REQ would be an error.

* 504: We don't do anything with this sap.  Is that by design?

* 510: Comment please: why does this function do nothing?

* 528: Someone could come in here and change the list while you're
  walking it.

* 529: This check is in the wrong place.  You want to drop the event
  entirely if it isn't IPNET_HOOK_LOCAL.  You don't need to do this
  check for each lostr.

* 536: No need to fall-back to copymsg() IMO.

* 538: Incrementing lo_drops should be atomic.

* 562: Security bug: it looks like a process in one zone can see
  packets from any other zone.


usr/src/uts/common/inet/tcp/tcp.c

* 18066: cstyle: need braces.

* 19680: It seems to me that for performance reasons, you should only
  do this IRE cache lookup if there are callbacks registered.  You
  check for that condition in ip.c, but not here.

* 19683: cscylt: no need for braces.


usr/src/uts/common/inet/tcp/tcp_fusion.c

* 196-205: You removed the ASSERT() here (why?) and left the block
  comment.  The comment now looks strage by itself with no context.
  Can you rectify this?


usr/src/uts/common/inet/udp/udp.c

* 6360: Performance nit: Should you be checking for the existence of
  callbacks before doing the IRE cache lookup?


usr/src/uts/common/io/neti.c

* 407,417: The ASSERT() calls here are silly.  The system will panic
  dereferencing a NULL pointer on the next line of code with a
  perfectly clear panic message.  The ASSERT() doesn't buy us
  anything, but then again, the same ASSERT() is everywhere in this
  file.


usr/src/uts/common/os/priv_defs

* 230: This implies that /dev/lo0 is a directory with devices in it.
  Suggest rewording to, "Allows a process to access /dev/lo0 and the
  devices in /dev/net while not ..."


usr/src/uts/common/sys/dlpi.h

* 61-68: This does not belong in <sys/dlpi.h> for the same reason that
  the definition of an Ethernet header does not belong there.  This
  should go in <inet/ipnet.h>.


usr/src/uts/i86pc/os/startup.c

* What was the purpose of the change in this file?

-Seb



Reply via email to