> > > > General comments: > > > > > > > Also, I'm a bit concerned about the use of link > > > > identifiers here, since e.g. cfgadm will output them (right > > edge > > > > truncated to avoid 80-column wrap): > > > > > > > > cfgadm: Component system is busy, try again: > > > > Resource Info > > > > -------------------------------------- ------------------ > > > > /devices/pci at 9,700000/pci at 2/network at 0 Network > > interface > > > > SUNW_network/1 ce0 AGGRs: aggr0 > > > > SUNW_network/2 aggr0 hosts IP add > > > > SUNW_ip/10.8.57.204 Plumbed IP Address > > > > > > > > What would be the impact of using link names instead? > > > > > > > > > > Note that all datalink configuration are based on linkids, so a > > > rename-link should not affect the datalink datalink configuration > > > offline/online/configure. But if we use link names, there would be > > > races between the RCM operation and a "dladm rename-link" operation, > > > and would cause unexpected result. > > > > What specific race are you concerned about? Given that the network, VLAN, > > and aggr RCM modules rebuild the resource name cache before processing a > > notification, wouldn't renames be transparently handled? Are you worried > > about a rename-link running in parallel with a cfgadm operation? That > > seems like a recipe for confusion no matter what, and also pretty unlikely > > given that it would require the datalinks to be unplumbed. > > > True, I agree a "dladm rename-link" operation in parallel with a DR > operation is rare, but it could happen. Note that it does not matter whether > the link is plumbed or not. For example, assuming an aggregation aggr1 which > is created over bge0, bge0 can be renamed when we tries to offline/online > bge0, even when aggr1 is not plumbed.
Do you mean "even when aggr1 is plumbed"? > If we don't care about the above cases at all, I guess we could change the > resource name from the linkid to the linkname. I suspect the code would be > much more messy though, as we often need to do the <linkname, linkid> > translation. > > Regarding the RCM_RESOURCE_LINK_NEW event (mostly used in the configuration > process after the hardware is DRed in), it currently also carries linkids. > Because this configuration processing is mostly asynchronous, the chance of > the race would be greater than than the offline/online processing. Understood. I'm OK with leaving this as-is given that the resource names are already pretty cryptic (e.g. /devices/pci at 9,700000/pci at 2/network at 0) and that the Info column from cfgadm provides human-consumable information, but we may need to revisit this in the future based on operational experience from the field. > > For debug information, could we output the resource name instead? (I > > realize that this may contain the linkid, depending on the outcome of > > the discussion above). > > > I've tried to outcome the resource name as much as possible. But some of the > debug information cannot be changed. For example, in aggr_offline_port(): > > rcm_log_message(RCM_TRACE2, "AGGR: delete aggregation %u\n", > aggr->da_aggrid); I presume we don't have the information at this point to map it back to a linkname (has that info already been deleted)? > > > > * Are we sure it's safe to call *_consumer_offline() while > > holding > > > > cache_lock? Seems like it should be OK, but I'm slightly > > > > nervous about calling other modules (via > > rcm_request_offline()) > > > > while holding locks. > > > > > > > Yes. It should be fine to hold locks. Note that it is not safe to hold > > the > > > lock when calling rcm_notify_event(RCM_RESOURCE_LINK_NEW), as it could > > > result another RCM_RESOURCE_LINK_NEW event. > > > > Seems like that would only be problematic if the first > > RCM_RESOURCE_LINK_NEW led to synchronous processing of the second one. > > Could you elaborate on the issue? > > > I don't think we ever call rcm_notify_event(RCM_RESOURCE_LINK_NEW) with the > locks held. Did you see any code does that? No, but I'm questioning your original comment that it would be unsafe to hold the lock when calling rcm_notify_event(RCM_RESOURCE_LINK_NEW). Seems like this wouldn't be a problem even if we did hold the lock because the handling of the notification is performed asynchronously. > > > > aggr_rcm.c: > > > > > > > * 221-227: As with vlan_rcm.c:197, it's unclear how this would > > > > ever have any work to do -- and if it did have work to do, > > it > > > > seems like it would need to call rcm_unregister_interest(). > > > > > > > The aggr_unregister() function does not seem to be called at all. So > > that I > > > am not sure whether 221-227 could have work to do. But at this point, > > since > > > the framework does not provide the rcm_handle, it is not possible to > > call > > > rcm_unregister_interest() here. > > > > As above, please add a comment (and file the bug). So was the bug filed? -- meem
