> RCM: Meem
>
> External webrev:
>
> http://cr.grommit.com/~yun/webrev_uv_09_28
>
> Internal webrev:
>
> http://jurassic.sfbay/net/forwar.prc/builds/yz147064/clearview-uv-review/webrev_review/
>
>
> Cscope:
>
> /net/forwar.prc/builds/yz147064/clearview-uv-review/
Here's my feedback on everything except network_rcm.c (which I'm still
looking at). Despite the volume of my feedback, I found the code to be
quite good -- most of my feedback consists of nits, and even several of
the more substantive issues exist in other RCM modules. Still, I'm a
picky bastard, and you asked for my review, so this is what you get ;-)
General comments:
* The RCM namespace of SUNW_network/<linkid> seems a bit generic
and hard to understand -- perhaps SUNW_network could be
SUNW_datalink? Also, I'm a bit concerned about the use of link
identifiers here, since e.g. cfgadm will output them (right edge
truncated to avoid 80-column wrap):
cfgadm: Component system is busy, try again:
Resource Info
-------------------------------------- ------------------
/devices/pci at 9,700000/pci at 2/network at 0 Network interface
SUNW_network/1 ce0 AGGRs: aggr0
SUNW_network/2 aggr0 hosts IP add
SUNW_ip/10.8.57.204 Plumbed IP Address
What would be the impact of using link names instead?
If (for some reason) linkids must be used at the link layer,
would it be possible to use link names at a higher level? I'm
really uncomfortable with e.g. ip_rcm working with linkids.
* Similar to the above: please be sure that link identifiers do
not appear in diagnostic messages the system administrator could
see. (If possible, I'd prefer to avoid link identifiers in all
diagnostic messages).
* I know it's common practice in the RCM code to assert() that the
input arguments are correct, but I cannot see the use in this.
For instance, code like this is just tedious:
static int
vlan_resume(rcm_handle_t *hd, char *rsrc, id_t id, uint_t flags,
char **errorp, rcm_info_t ** depend_info)
{
/* Guard against bad arguments */
assert(hd != NULL);
assert(rsrc != NULL);
assert(id == (id_t)0);
assert(errorp != NULL);
assert(depend_info != NULL);
rcm_log_message(RCM_TRACE1, _("VLAN: resume(%s)\n"), rsrc);
return (RCM_SUCCESS);
}
Please remove all of these asserts -- the modules should be able
to trust the framework to pass them correct arguments.
* There is no need to internationalize DEBUG and TRACE messages
(and you'll note that other modules like ip_rcm do not bother).
So, please remove the _()'s from the various DEBUG and TRACE
rcm_log_message() invocations.
* Speaking of messages, your implementation of the aggr_log_err()
and vlan_log_err() functions already prefix everything with
AGGR: or VLAN:, so there's no need to include those strings in
the calls to those functions.
* It's really a shame that every RCM module has its own
implementation of cache_insert(), cache_remove() and so forth.
But that's well-established practice at this point, so I guess
I'll just have to live with it :-O
* A number of functions have comments stating "Call with
cache_lock held". I'd prefer the comment be removed and an
assert() added to the function in its place. (I've also noted
the places I've found where we call them without holding
cache_lock in the specific comments below.)
* Many references to "succeed" should be "succeeded" (e.g.,
"VLAN: offline succeeded").
* Are we sure it's safe to call *_consumer_offline() while holding
cache_lock? Seems like it should be OK, but I'm slightly
nervous about calling other modules (via rcm_request_offline())
while holding locks.
* Occasionally, the vlan_cache and aggr_cache structure variables
use the name `probe' rather than `node' -- why? What does
`probe' mean in this context?
vlan_rcm.c:
* Please replace all use of "device" with "link" or "VLAN" as
appropriate.
* Would be nice to provide some high-level comments which explain
the code flow when a link with VLANs is brought online or
offline. I'd be happy to work with you on these.
* The `vlan_cache' structure seems misnamed -- this appears to
represent a link, not a VLAN. Likewise, it seems like
CACHE_VLAN_* should really be CACHE_LINK_*.
* On a related note: some of the functions have confusing names.
For instance, the oddly-named vlan_offline_vlan() is really
semantically link_offline_vlan() (but e.g. vlan_update() is
properly named).
* 68: dv_vlanid seems like a confusing name, since it's not the
VLAN id on the wire. Seems like this should be dv_linkid
instead (see following comment).
* 69: dv_linkid appears to be set but never used.
* 70: dv_vid appears to be set but never used.
* 71: dv_force appears to be set but never used.
* 73: Could we consistently use an enum for flags, rather than
using a combination of uint32_t and bitfields? (enums are
debugger-friendly.)
* 86-87: Move `{' onto previous line as per cstyle rules.
* 172: Second parameter should be 0, not NULL.
* 197: Unclear how this cache_free() would ever have any work to
do, since if vlan_register() was never called the cache will be
empty, and if it was called, then upon unregistration
vlan_unregister() will have already emptied the cache. (And if
items were left in the cache, is it safe to remove them without
calling rcm_unregister_interest()?) If the cache_free() call is
needless, then the cache_free() routine can be removed as well.
* 257: Would be useful to print an RCM_ERROR diagnostic if
rcm_unregister_interest() fails.
* 268: s/Need to unregister/Unregister/
* 311: Assuming this errno assignment is actually needed, it
should be done after the call to mutex_unlock() to ensure
mutex_unlock() doesn't change it.
* 317, 398: s/on associated/of associated/
* 338: This message seems unnecessary given line 347.
* 373, 522, 736, 800, 1348, 1358, 1436: Remove extra blank line.
* 390: Not sure I see the need to report a debug message when
vlan_online_vlan() succeeds.
* 405: vlan_consumer_online() failures deserve at least an
RCM_WARNING, not RCM_DEBUG.
* 416, 1194: Can you explain the need for the `break_on_failure'
flag? Why not always try to online all VLANs?
* 422: Comment seems a bit strange -- a comment before the loop
of "Recreate all VLANs used by `node'" seems more fitting.
* 424-444: Indentation would be cleaner with:
if (!(vlan->dv_flags & VLAN_OFFLINED))
continue;
* 441: Would be simpler to `return (RCM_FAILURE)' here, remove
lines 446-448, and change line 450 to `return (RCM_SUCCESS)'
* 425-426: This function is sufficiently small that the nested
declarations add little value; please hoist to the top of the
function.
* 432, 1428: s/Prompt the warning/Print a warning/
* 443: Why is VLAN_OFFLINED cleared even if the dladm_vlan_up()
call failed?
* 454: Based on the callers, seems like vlan_offline_vlan() should
also take the CACHE_* state flag to set on node->vc_state.
* 464-466: Comment seems a bit strange -- a comment before the
loop of "Delete all VLANs used by `node'" seems more fitting.
* 467-481, 1323-1335: Indentation would be cleaner with:
if (vlan->dv_implicit)
continue;
* 474: Would be simpler to `return (RCM_FAILURE)' here, remove
lines 484-486, and change line 487 to `return (RCM_SUCCESS)'
* 513, 648, 877: Explicitly test against NULL.
* 521: Why not directly assign to *usagep?
* 522: Can we drop cache_lock at this point instead?
* 529: Why is *errorp set to NULL?
* 572: Stray space between `**' and `depend_info'.
* 602, 1203, 1252: Would simplify code to use alloca() instead.
* 610: Under what situation do we get here without
VLAN_CONSUMER_OFFLINED being set? (When some force mode is
enabled?) It seems like this would mean that we're about to
remove a link that still has consumers actively using one or
more of its VLANs, which would be problematic.
* 614: If rcm_notify_remove() fails part-way through, what
restores the state of the VLANs that were removed?
* 620, 1229, 1275: Remove unused `done:' labels.
* 635: Needless initialization of `rv'.
* 660: The ordering here seems strange -- shouldn't we defer the
removal of the node until after the consumers have agreed on
the remove?
* 661: This call to node_free() is done without holding cache_lock.
* 691, 699, 709, 719, 730: Given that vlan_log_err() calls
rcm_log_message(), why do we need to explicitly call
rcm_log_message()? If it's just to provide the linkid, seems
like that could be addressed by changing vlan_log_err() to take
a linkid rather than a `node'.
* 705: Indenting would be cleaner here if you did:
if (strcmp(nvpair_name(nvp), RCM_NV_LINKID) != 0)
continue;
* 721, 732: Why not continue and try to bring up the other VLANs?
* 785: First part of this expression has an extra set of parens.
* 793: bzero() here seems needless.
* 794: Prefer snprintf().
* 817, 819: Prefer strlcat().
* 818, 820: Remove needless braces.
* 855-864, 1066-1071: These would be more natural as `for' loops
-- though to keep the formatting decent, you may need to do
something like:
probe = cache_head.vc_next;
for (; probe != &cache_tail; probe = probe->vc_next) {
/* ... */
}
* 857: How can `vc_resource' be NULL here?
* 878: Needless comparison against NULL.
* 882: Needless assignment of `vlan'.
* 928: Needless ARGSUSED.
* 932, 1405: No need to cast `arg'.
* 936: No need to initialize `rsrc'.
* 948, 1415: s/infomation/information/ -- also, perhaps it
would be clearer as "cannot get vlan information for"?
* 975, 1002: Remove unhelpful comments.
* 992, 1022, 1131: Remove needless parens.
* 1004: This doesn't seem safe: we've already inserted `probe'
into the cache on line 987 -- and it has a reference to `rsrc'.
* 1022: What's the harm in just always clearing CACHE_VLAN_STALE?
Seems like `update_stale_state' isn't needed.
* 1042-1044: `vlan_update_arg_t arg = { hd, 0 };' would be simpler.
* 1085-1101: This seems more natural as
for (vlan = probe->vc_vlan; vlan != NULL; vlan = next)
* 1102: Seems like it would be simpler to introduce a `prnext'
variable and assign `prnext = probe->vc_next' here, then remove
lines 1110, 1117, and 1133. It would also allow `freeit' to
be eliminated.
* 1164: As per previous comments, seems like vlan_log_err() should
take a linkid rather than a node (and print a link name). Also,
a format like:
VLAN: vlan2: cannot offline consumers
... seems preferable to the current:
VLAN: cannot offline consumers(vlan2)
... but I realize the format you chose is well-established
practice with other RCM modules.
* 1180: Remove needless cast. Also, it's unclear why calloc() is
being used.
* 1191: Remove "/* ARRGSUSED */" (the extra R wasn't causing it to
do anything anyway ;-)'
* 1245: No need to initialize `vlan'.
* 1281: I don't quite understand this XXX comment; if this is a new
VLAN, shouldn't we have already passed it along to consumers via
vlan_notify_event(link) -> vlan_configure(link) -> vlan_up(vlan)
-> dladm_vlan_up(vlan) -> vlan_notify_event(vlan)? If not,
there seems to be a hole here in notifying consumers about
new VLANs.
* 1295: No need for `ret'.
* 1328: s/Notify the RCM_RESOURCE_LINK_NEW event/
Send RCM_RESOURCE_LINK_NEW events/
* 1363: Why `cached_name' rather than `rsrc'?
* 1381: s/consumers the new link/consumers of the new link/
* 1406, 1420-1424: Seems like it would be simpler to do:
if (vlan_attr.dv_linkid != vlan_up_argp->linkid)
return (DLADM_WALK_CONTINUE);
* 1425: Shouldn't this message be at the top of the function?
* 1461: CACHE_REFRESH is passed here, but it's ignored since
the handle is NULL. But given that this codepath is only
reached via vlan_notify_event(), and that's already done a
cache_update(), it's probably OK to just pass CACHE_NO_REFRESH,
and then remove the hd != NULL check.
aggr_rcm.c:
* As with vlan_rcm.c, it would be nice to provide some high-level
comments which explain the code flow when an aggregation is
brought online or offline. I'd be happy to work with you on
these.
* Also as with vlan_rcm.c, the `aggr_cache' structure seems
misnamed -- this represents a link using an aggregation,
not an aggregation itself.
* 61: Seems like dl_portinfo_t isn't really necessary, and that
we could remove it and change da_ports to be:
datalink_id_t *da_portids;
* 75: da_policy appears to be set but never used.
* 76: da_mac appears to be set but never used.
* 77: da_lacp_mode appears to be set but never used.
* 78: da_lacp_timer appears to be set but never used.
* 79: da_mac_fixed appears to be set but never used.
* 80: da_force appears to be set but never used.
* 97-98: Move `{' onto previous line as per cstyle rules.
* 188, 193: Second parameter should be 0, not NULL.
* 221-227: As with vlan_rcm.c:197, it's unclear how this would
ever have any work to do -- and if it did have work to do, it
seems like it would need to call rcm_unregister_interest().
* 230: Likewise, this call to aggr_list_free() seems unnecessary
given the call to aggr_list_free in aggr_unregister().
* 333: Would be useful to print an RCM_ERROR diagnostic if
rcm_unregister_interest() fails.
* 348: s/Need to unregister/Unregister/
* 392: Assuming this errno assignment is actually needed, it
should be done after the call to mutex_unlock() to ensure
mutex_unlock() doesn't change it.
* 403, 525, 572, 1404: Given the descriptive name of the function,
the is_only_port variable seems needless -- just inline the calls
into the `if' statements.
* 425, 478: These messages seem unnecessary.
* 461, 635, 1322, 1370, 1429, 1548: Remove extra blank line.
* 491: aggr_consumer_online() failures deserve at least an
RCM_WARNING, not RCM_DEBUG.
* 494, 497: Should this be "IP/VLAN" consumers? (Or perhaps we
should just refrain from guessing what the consumers might be.)
* 508: Based on the callers, seems like aggr_offline_port() would
benefit from taking a CACHE_* state flag to set on node->vc_state.
* 514, 558: Would be slightly simpler to declare a normal variable
rather than an array with one member -- e.g.:
port.lp_linkid = node->vc_linkid;
status = dladm_aggr_remove(aggr->da_aggrid, 1, &port,
DLADM_OPT_ACTIVE);
* 564: Indentation would be cleaner with:
if (!(node->vc_state & CACHE_AGGR_PORT_OFFLINED))
return (RCM_SUCCESS);
* 577: What's the rationale behind setting *up to B_TRUE even
if dladm_aggr_up() fails?
* 588, 1536: s/Prompt the warning/Print a warning/
* 626, 723: Explicitly test against NULL.
* 634: Why not directly assign to *usagep?
* 642: Why is *errorp set to NULL?
* 655: Seems like we can drop cache_lock after the call to
aggr_usage().
* 685: Stray space between `**' and `depend_info'.
* 610: Under what situation do we get here without
CAHCE_AGGR_CONSUMER_OFFLINED being set? As with VLANs, it
seems odd to remove a link that still has consumers actively
using it.
* 736: Don't we need to hold cache_lock here to safely access
vc_aggr? (If not, why is it needed at line 1401?)
* 737: assert() seems needless; we'll crash two lines later.
* 738: What does the name `exported' mean here?
* 741: The ordering here seems strange -- shouldn't we defer the
removal of the node until after the consumers have agreed on
the remove?
* 749: This call to node_free() is done without holding cache_lock.
* 778, 787, 797, 806, 816: Given that aggr_log_err() calls
rcm_log_message(), why do we need to explicitly call
rcm_log_message()? If it's just to provide the linkid, seems
like that could be addressed by changing aggr_log_err() to take
a linkid rather than a `node'.
* 795: Indenting would be cleaner here if you did:
if (strcmp(nvpair_name(nvp), RCM_NV_LINKID) != 0)
continue;
* 852: The plural "AGGRs" seems misleading, since a port can only
be in one aggregation. Maybe a message like "ce0 is part of
aggregation aggr0" would be clearer? (Or, if we'll only output
this message when ce0 is the last port in the aggregation, we
could be more explicit.)
* 862: No separators are used here.
* 863: Why is LINKID_STR_WIDTH used here? We output a link name,
so it seems like it should be DLPI_LINKNAME_MAX.
* 870: bzero() here seems needless.
* 871: Prefer snprintf().
* 889: Prefer strlcat().
* 916: cache_lookup() is never called with hd == NULL.
* 925: How can `vc_resource' be NULL here?
* 925-932, 1168-1172, 1175-1179: These would be more natural
as `for' loops.
* 943: assert() seems needless; we'll crash on the next line.
* 976: aggr_port_update() needs a comment (or better yet an
assert()) that cache_lock is held.
* 1005, 1122, 1215: Remove needless parens.
* 1021: Remove needless comment.
* 1049: Needless ARGSUSED.
* 1053, 1496: No need to cast arg (and in the latter case, the
assignment can be hoisted to line 1488).
* 1069: s/infomation/information -- also, perhaps it would be
clearer as "cannot get aggr information for"?
* 1100: If this calloc() fails and `exist' is true, we end up
leaving an aggr in aggr_head with a NULL da_ports which will
e.g. later cause problems in aggr_is_only_port(). Same with the
logic on lines 1115-1117. If `exist' is false, then we end up
freeing aggr even though it's currently inserted on the list via
the aggr_list_insert() on line 1092.
* 1121: Seems simpler to always clear AGGR_STALE.
* 1128: No need to check if aggr != NULL.
* 1130, 1551: The notion that dladm_aggr_info() partially returns
dynamically-allocated memory is ugly (though this is not related
to you changes ;-).
* 1145-1147: `aggr_update_arg_t arg = { hd, 0 };' would be simpler.
* 1188: Seems like it would be simpler to introduce a `prnext'
variable and assign `prnext = probe->vc_next' here, then remove
lines 1195, 1202, and 1217. It would also allow `freeit' to
be eliminated.
* 1224: s/clear/delete/
* 1244: As per previous comments, seems like aggr_log_err() could
take a linkid rather than a node (and print a link name). Same
comments also apply on the format of the messages.
* 1260: Remove needless cast. Also, it's unclear why calloc() is
being used.
* 1278-1281: Would be simpler as:
return (aggr->da_nports == 1 && aggr->da_ports->dpi_portid == portid);
* 1293, 1338: `aggr' variable seems of limited use.
* 1294, 1339: No need to initialize `rsrc'.
* 1301, 1352: Would simplify code to use alloca() instead.
* 1314-1323, 1362-1371: No real need for the `goto done' here,
especially if you change the code to use alloca().
* 1378: s/Notify the RCM_RESOURCE_LINK_NEW event/
Send RCM_RESOURCE_LINK_NEW events/
* 1443: Why `cached_name' rather than `rsrc'?
* 1501: I'm a bit confused on the semantics of `retval' -- seems
like it provides the status of the last aggregation to be
examined, which doesn't seem like what we want.
* 1553: If the configuration was successful and a port can only
be in one aggregation, why do we return DLADM_WALK_CONTINUE?
ip_rcm.c:
* Since this is IP configuration, it seems strange (if not simply
wrong) to have link identifiers here. Is there a specific
reason why link/interface names can no longer be used?
* 822: Remove extra blank line.
* 1470: `ifname' seems more appropriate than `link' here.
* 1937-1940: When would `link' be NULL here?
--
meem