> RCM:                    Meem
 > 
 > External webrev:
 > 
 >        http://cr.grommit.com/~yun/webrev_uv_09_28
 > 
 > Internal webrev:
 > 
 > http://jurassic.sfbay/net/forwar.prc/builds/yz147064/clearview-uv-review/webrev_review/
 >  
 > 
 > Cscope:
 > 
 >        /net/forwar.prc/builds/yz147064/clearview-uv-review/

Here's my feedback on everything except network_rcm.c (which I'm still
looking at).  Despite the volume of my feedback, I found the code to be
quite good -- most of my feedback consists of nits, and even several of
the more substantive issues exist in other RCM modules.  Still, I'm a
picky bastard, and you asked for my review, so this is what you get ;-)

General comments:

        * The RCM namespace of SUNW_network/<linkid> seems a bit generic
          and hard to understand -- perhaps SUNW_network could be
          SUNW_datalink?  Also, I'm a bit concerned about the use of link
          identifiers here, since e.g. cfgadm will output them (right edge
          truncated to avoid 80-column wrap):

          cfgadm: Component system is busy, try again:
                         Resource                               Info
          --------------------------------------  ------------------
          /devices/pci at 9,700000/pci at 2/network at 0   Network interface 
          SUNW_network/1                          ce0 AGGRs: aggr0
          SUNW_network/2                          aggr0 hosts IP add
          SUNW_ip/10.8.57.204                     Plumbed IP Address
           
          What would be the impact of using link names instead?

          If (for some reason) linkids must be used at the link layer,
          would it be possible to use link names at a higher level?  I'm
          really uncomfortable with e.g. ip_rcm working with linkids.

        * Similar to the above: please be sure that link identifiers do
          not appear in diagnostic messages the system administrator could
          see.  (If possible, I'd prefer to avoid link identifiers in all
          diagnostic messages).

        * I know it's common practice in the RCM code to assert() that the
          input arguments are correct, but I cannot see the use in this.
          For instance, code like this is just tedious:

          static int
          vlan_resume(rcm_handle_t *hd, char *rsrc, id_t id, uint_t flags,
              char **errorp, rcm_info_t ** depend_info)
          {
                  /* Guard against bad arguments */
                  assert(hd != NULL);
                  assert(rsrc != NULL);
                  assert(id == (id_t)0);
                  assert(errorp != NULL);
                  assert(depend_info != NULL);
          
                  rcm_log_message(RCM_TRACE1, _("VLAN: resume(%s)\n"), rsrc);
          
                  return (RCM_SUCCESS);
          }

          Please remove all of these asserts -- the modules should be able
          to trust the framework to pass them correct arguments.

        * There is no need to internationalize DEBUG and TRACE messages
          (and you'll note that other modules like ip_rcm do not bother).
          So, please remove the _()'s from the various DEBUG and TRACE
          rcm_log_message() invocations.

        * Speaking of messages, your implementation of the aggr_log_err()
          and vlan_log_err() functions already prefix everything with
          AGGR: or VLAN:, so there's no need to include those strings in
          the calls to those functions.

        * It's really a shame that every RCM module has its own
          implementation of cache_insert(), cache_remove() and so forth.
          But that's well-established practice at this point, so I guess
          I'll just have to live with it :-O

        * A number of functions have comments stating "Call with
          cache_lock held".  I'd prefer the comment be removed and an
          assert() added to the function in its place.  (I've also noted
          the places I've found where we call them without holding
          cache_lock in the specific comments below.)

        * Many references to "succeed" should be "succeeded" (e.g.,
          "VLAN: offline succeeded").

        * Are we sure it's safe to call *_consumer_offline() while holding
          cache_lock?  Seems like it should be OK, but I'm slightly
          nervous about calling other modules (via rcm_request_offline())
          while holding locks.

        * Occasionally, the vlan_cache and aggr_cache structure variables
          use the name `probe' rather than `node' -- why?  What does
          `probe' mean in this context?

vlan_rcm.c:

        * Please replace all use of "device" with "link" or "VLAN" as
          appropriate.

        * Would be nice to provide some high-level comments which explain
          the code flow when a link with VLANs is brought online or
          offline.  I'd be happy to work with you on these.

        * The `vlan_cache' structure seems misnamed -- this appears to
          represent a link, not a VLAN.  Likewise, it seems like
          CACHE_VLAN_* should really be CACHE_LINK_*.

        * On a related note: some of the functions have confusing names.
          For instance, the oddly-named vlan_offline_vlan() is really
          semantically link_offline_vlan() (but e.g. vlan_update() is
          properly named).

        * 68: dv_vlanid seems like a confusing name, since it's not the
          VLAN id on the wire.  Seems like this should be dv_linkid
          instead (see following comment).

        * 69: dv_linkid appears to be set but never used.

        * 70: dv_vid appears to be set but never used.

        * 71: dv_force appears to be set but never used.

        * 73: Could we consistently use an enum for flags, rather than
          using a combination of uint32_t and bitfields?  (enums are
          debugger-friendly.)

        * 86-87: Move `{' onto previous line as per cstyle rules.

        * 172: Second parameter should be 0, not NULL.

        * 197: Unclear how this cache_free() would ever have any work to
          do, since if vlan_register() was never called the cache will be
          empty, and if it was called, then upon unregistration
          vlan_unregister() will have already emptied the cache.  (And if
          items were left in the cache, is it safe to remove them without
          calling rcm_unregister_interest()?)  If the cache_free() call is
          needless, then the cache_free() routine can be removed as well.

        * 257: Would be useful to print an RCM_ERROR diagnostic if
          rcm_unregister_interest() fails.

        * 268: s/Need to unregister/Unregister/

        * 311: Assuming this errno assignment is actually needed, it
          should be done after the call to mutex_unlock() to ensure
          mutex_unlock() doesn't change it.

        * 317, 398: s/on associated/of associated/

        * 338: This message seems unnecessary given line 347.
          
        * 373, 522, 736, 800, 1348, 1358, 1436: Remove extra blank line.

        * 390: Not sure I see the need to report a debug message when
          vlan_online_vlan() succeeds.

        * 405: vlan_consumer_online() failures deserve at least an
          RCM_WARNING, not RCM_DEBUG.

        * 416, 1194: Can you explain the need for the `break_on_failure'
          flag?  Why not always try to online all VLANs?

        * 422: Comment seems a bit strange -- a comment before the loop
          of "Recreate all VLANs used by `node'" seems more fitting.

        * 424-444: Indentation would be cleaner with:

                if (!(vlan->dv_flags & VLAN_OFFLINED))
                        continue;

        * 441: Would be simpler to `return (RCM_FAILURE)' here, remove
          lines 446-448, and change line 450 to `return (RCM_SUCCESS)'

        * 425-426: This function is sufficiently small that the nested
          declarations add little value; please hoist to the top of the
          function.

        * 432, 1428: s/Prompt the warning/Print a warning/

        * 443: Why is VLAN_OFFLINED cleared even if the dladm_vlan_up()
          call failed?

        * 454: Based on the callers, seems like vlan_offline_vlan() should
          also take the CACHE_* state flag to set on node->vc_state.

        * 464-466: Comment seems a bit strange -- a comment before the
          loop of "Delete all VLANs used by `node'" seems more fitting.

        * 467-481, 1323-1335: Indentation would be cleaner with:

                if (vlan->dv_implicit)
                        continue;

        * 474: Would be simpler to `return (RCM_FAILURE)' here, remove
          lines 484-486, and change line 487 to `return (RCM_SUCCESS)'

        * 513, 648, 877: Explicitly test against NULL.

        * 521: Why not directly assign to *usagep?

        * 522: Can we drop cache_lock at this point instead? 

        * 529: Why is *errorp set to NULL?

        * 572: Stray space between `**' and `depend_info'.

        * 602, 1203, 1252: Would simplify code to use alloca() instead.

        * 610: Under what situation do we get here without
          VLAN_CONSUMER_OFFLINED being set?  (When some force mode is
          enabled?)  It seems like this would mean that we're about to
          remove a link that still has consumers actively using one or
          more of its VLANs, which would be problematic.

        * 614: If rcm_notify_remove() fails part-way through, what
          restores the state of the VLANs that were removed?

        * 620, 1229, 1275: Remove unused `done:' labels.

        * 635: Needless initialization of `rv'.

        * 660: The ordering here seems strange -- shouldn't we defer the
          removal of the node until after the consumers have agreed on
          the remove?

        * 661: This call to node_free() is done without holding cache_lock.

        * 691, 699, 709, 719, 730: Given that vlan_log_err() calls
          rcm_log_message(), why do we need to explicitly call
          rcm_log_message()?  If it's just to provide the linkid, seems
          like that could be addressed by changing vlan_log_err() to take
          a linkid rather than a `node'.

        * 705: Indenting would be cleaner here if you did:

                if (strcmp(nvpair_name(nvp), RCM_NV_LINKID) != 0)
                        continue;

        * 721, 732: Why not continue and try to bring up the other VLANs?

        * 785: First part of this expression has an extra set of parens.

        * 793: bzero() here seems needless.

        * 794: Prefer snprintf().

        * 817, 819: Prefer strlcat().

        * 818, 820: Remove needless braces.

        * 855-864, 1066-1071: These would be more natural as `for' loops
          -- though to keep the formatting decent, you may need to do
          something like:

          probe = cache_head.vc_next;
          for (; probe != &cache_tail; probe = probe->vc_next) {
                /* ... */
          }

        * 857: How can `vc_resource' be NULL here?

        * 878: Needless comparison against NULL.

        * 882: Needless assignment of `vlan'.

        * 928: Needless ARGSUSED.

        * 932, 1405: No need to cast `arg'.

        * 936: No need to initialize `rsrc'.

        * 948, 1415: s/infomation/information/ -- also, perhaps it
          would be clearer as "cannot get vlan information for"?

        * 975, 1002: Remove unhelpful comments.

        * 992, 1022, 1131: Remove needless parens.

        * 1004: This doesn't seem safe: we've already inserted `probe'
          into the cache on line 987 -- and it has a reference to `rsrc'.

        * 1022: What's the harm in just always clearing CACHE_VLAN_STALE?
          Seems like `update_stale_state' isn't needed.

        * 1042-1044: `vlan_update_arg_t arg = { hd, 0 };' would be simpler.

        * 1085-1101: This seems more natural as
          for (vlan = probe->vc_vlan; vlan != NULL; vlan = next)

        * 1102: Seems like it would be simpler to introduce a `prnext'
          variable and assign `prnext = probe->vc_next' here, then remove
          lines 1110, 1117, and 1133.  It would also allow `freeit' to
          be eliminated.

        * 1164: As per previous comments, seems like vlan_log_err() should
          take a linkid rather than a node (and print a link name).  Also,
          a format like:

                VLAN: vlan2: cannot offline consumers

          ... seems preferable to the current:

                VLAN: cannot offline consumers(vlan2)

          ... but I realize the format you chose is well-established
          practice with other RCM modules.

        * 1180: Remove needless cast.  Also, it's unclear why calloc() is
          being used.

        * 1191: Remove "/* ARRGSUSED */" (the extra R wasn't causing it to
          do anything anyway ;-)'

        * 1245: No need to initialize `vlan'.

        * 1281: I don't quite understand this XXX comment; if this is a new
          VLAN, shouldn't we have already passed it along to consumers via
          vlan_notify_event(link) -> vlan_configure(link) -> vlan_up(vlan)
          -> dladm_vlan_up(vlan) -> vlan_notify_event(vlan)?  If not,
          there seems to be a hole here in notifying consumers about
          new VLANs.

        * 1295: No need for `ret'.

        * 1328: s/Notify the RCM_RESOURCE_LINK_NEW event/
                  Send RCM_RESOURCE_LINK_NEW events/

        * 1363: Why `cached_name' rather than `rsrc'?

        * 1381: s/consumers the new link/consumers of the new link/

        * 1406, 1420-1424: Seems like it would be simpler to do:

                if (vlan_attr.dv_linkid != vlan_up_argp->linkid)
                        return (DLADM_WALK_CONTINUE);

        * 1425: Shouldn't this message be at the top of the function?

        * 1461: CACHE_REFRESH is passed here, but it's ignored since
          the handle is NULL.  But given that this codepath is only
          reached via vlan_notify_event(), and that's already done a
          cache_update(), it's probably OK to just pass CACHE_NO_REFRESH,
          and then remove the hd != NULL check.

aggr_rcm.c:

        * As with vlan_rcm.c, it would be nice to provide some high-level
          comments which explain the code flow when an aggregation is
          brought online or offline.  I'd be happy to work with you on
          these.

        * Also as with vlan_rcm.c, the `aggr_cache' structure seems
          misnamed -- this represents a link using an aggregation,
          not an aggregation itself.

        * 61: Seems like dl_portinfo_t isn't really necessary, and that
          we could remove it and change da_ports to be:

                datalink_id_t              *da_portids;

        * 75: da_policy appears to be set but never used.

        * 76: da_mac appears to be set but never used.

        * 77: da_lacp_mode appears to be set but never used.

        * 78: da_lacp_timer appears to be set but never used.

        * 79: da_mac_fixed appears to be set but never used.

        * 80: da_force appears to be set but never used.

        * 97-98: Move `{' onto previous line as per cstyle rules.

        * 188, 193: Second parameter should be 0, not NULL.

        * 221-227: As with vlan_rcm.c:197, it's unclear how this would
          ever have any work to do -- and if it did have work to do, it
          seems like it would need to call rcm_unregister_interest().

        * 230: Likewise, this call to aggr_list_free() seems unnecessary
          given the call to aggr_list_free in aggr_unregister().

        * 333: Would be useful to print an RCM_ERROR diagnostic if
          rcm_unregister_interest() fails.

        * 348: s/Need to unregister/Unregister/

        * 392: Assuming this errno assignment is actually needed, it
          should be done after the call to mutex_unlock() to ensure
          mutex_unlock() doesn't change it.

        * 403, 525, 572, 1404: Given the descriptive name of the function,
          the is_only_port variable seems needless -- just inline the calls
          into the `if' statements.

        * 425, 478: These messages seem unnecessary.

        * 461, 635, 1322, 1370, 1429, 1548: Remove extra blank line.

        * 491: aggr_consumer_online() failures deserve at least an
          RCM_WARNING, not RCM_DEBUG.

        * 494, 497: Should this be "IP/VLAN" consumers?  (Or perhaps we
          should just refrain from guessing what the consumers might be.)

        * 508: Based on the callers, seems like aggr_offline_port() would
          benefit from taking a CACHE_* state flag to set on node->vc_state.

        * 514, 558: Would be slightly simpler to declare a normal variable
          rather than an array with one member -- e.g.:

                port.lp_linkid = node->vc_linkid;
                status = dladm_aggr_remove(aggr->da_aggrid, 1, &port,
                    DLADM_OPT_ACTIVE);

        * 564: Indentation would be cleaner with:

                if (!(node->vc_state & CACHE_AGGR_PORT_OFFLINED))
                        return (RCM_SUCCESS);

        * 577: What's the rationale behind setting *up to B_TRUE even
          if dladm_aggr_up() fails?

        * 588, 1536: s/Prompt the warning/Print a warning/

        * 626, 723: Explicitly test against NULL.

        * 634: Why not directly assign to *usagep?

        * 642: Why is *errorp set to NULL?

        * 655: Seems like we can drop cache_lock after the call to
          aggr_usage().

        * 685: Stray space between `**' and `depend_info'.

        * 610: Under what situation do we get here without
          CAHCE_AGGR_CONSUMER_OFFLINED being set?  As with VLANs, it 
          seems odd to remove a link that still has consumers actively
          using it.

        * 736: Don't we need to hold cache_lock here to safely access
          vc_aggr?  (If not, why is it needed at line 1401?)

        * 737: assert() seems needless; we'll crash two lines later.

        * 738: What does the name `exported' mean here?

        * 741: The ordering here seems strange -- shouldn't we defer the
          removal of the node until after the consumers have agreed on
          the remove?

        * 749: This call to node_free() is done without holding cache_lock.

        * 778, 787, 797, 806, 816: Given that aggr_log_err() calls
          rcm_log_message(), why do we need to explicitly call
          rcm_log_message()?  If it's just to provide the linkid, seems
          like that could be addressed by changing aggr_log_err() to take
          a linkid rather than a `node'.

        * 795: Indenting would be cleaner here if you did:

                if (strcmp(nvpair_name(nvp), RCM_NV_LINKID) != 0)
                        continue;

        * 852: The plural "AGGRs" seems misleading, since a port can only
          be in one aggregation.  Maybe a message like "ce0 is part of
          aggregation aggr0" would be clearer?  (Or, if we'll only output
          this message when ce0 is the last port in the aggregation, we
          could be more explicit.)

        * 862: No separators are used here.

        * 863: Why is LINKID_STR_WIDTH used here?  We output a link name,
          so it seems like it should be DLPI_LINKNAME_MAX.

        * 870: bzero() here seems needless.

        * 871: Prefer snprintf().

        * 889: Prefer strlcat().

        * 916: cache_lookup() is never called with hd == NULL.

        * 925:  How can `vc_resource' be NULL here?
        * 925-932, 1168-1172, 1175-1179: These would be more natural 
          as `for' loops.

        * 943: assert() seems needless; we'll crash on the next line.

        * 976: aggr_port_update() needs a comment (or better yet an
          assert()) that cache_lock is held.
        
        * 1005, 1122, 1215: Remove needless parens.
          
        * 1021: Remove needless comment.

        * 1049: Needless ARGSUSED.

        * 1053, 1496: No need to cast arg (and in the latter case, the
          assignment can be hoisted to line 1488).

        * 1069: s/infomation/information -- also, perhaps it would be
          clearer as "cannot get aggr information for"?

        * 1100: If this calloc() fails and `exist' is true, we end up
          leaving an aggr in aggr_head with a NULL da_ports which will
          e.g. later cause problems in aggr_is_only_port().  Same with the
          logic on lines 1115-1117.  If `exist' is false, then we end up
          freeing aggr even though it's currently inserted on the list via
          the aggr_list_insert() on line 1092.

        * 1121: Seems simpler to always clear AGGR_STALE.

        * 1128: No need to check if aggr != NULL.

        * 1130, 1551: The notion that dladm_aggr_info() partially returns
          dynamically-allocated memory is ugly (though this is not related
          to you changes ;-).

        * 1145-1147: `aggr_update_arg_t arg = { hd, 0 };' would be simpler.

        * 1188: Seems like it would be simpler to introduce a `prnext'
          variable and assign `prnext = probe->vc_next' here, then remove
          lines 1195, 1202, and 1217.  It would also allow `freeit' to
          be eliminated.

        * 1224: s/clear/delete/

        * 1244: As per previous comments, seems like aggr_log_err() could
          take a linkid rather than a node (and print a link name).  Same
          comments also apply on the format of the messages.

        * 1260: Remove needless cast.  Also, it's unclear why calloc() is
          being used.

        * 1278-1281: Would be simpler as:

          return (aggr->da_nports == 1 && aggr->da_ports->dpi_portid == portid);
 
        * 1293, 1338: `aggr' variable seems of limited use.

        * 1294, 1339: No need to initialize `rsrc'.

        * 1301, 1352: Would simplify code to use alloca() instead.

        * 1314-1323, 1362-1371: No real need for the `goto done' here,
          especially if you change the code to use alloca().

        * 1378: s/Notify the RCM_RESOURCE_LINK_NEW event/
                  Send RCM_RESOURCE_LINK_NEW events/

        * 1443: Why `cached_name' rather than `rsrc'?

        * 1501: I'm a bit confused on the semantics of `retval' -- seems
          like it provides the status of the last aggregation to be
          examined, which doesn't seem like what we want.

        * 1553: If the configuration was successful and a port can only
          be in one aggregation, why do we return DLADM_WALK_CONTINUE?

ip_rcm.c:

        * Since this is IP configuration, it seems strange (if not simply
          wrong) to have link identifiers here.  Is there a specific
          reason why link/interface names can no longer be used?

        * 822: Remove extra blank line.

        * 1470: `ifname' seems more appropriate than `link' here.

        * 1937-1940: When would `link' be NULL here?

--
meem

Reply via email to