> >  > > General comments:
 > >  > 
 > >  > >           Also, I'm a bit concerned about the use of link
 > >  > >           identifiers here, since e.g. cfgadm will output them (right 
 > > edge
 > >  > >           truncated to avoid 80-column wrap):
 > >  > > 
 > >  > >           cfgadm: Component system is busy, try again:
 > >  > >                          Resource                               Info
 > >  > >           --------------------------------------  ------------------
 > >  > >           /devices/pci at 9,700000/pci at 2/network at 0   Network 
 > > interface 
 > >  > >           SUNW_network/1                          ce0 AGGRs: aggr0
 > >  > >           SUNW_network/2                          aggr0 hosts IP add
 > >  > >           SUNW_ip/10.8.57.204                     Plumbed IP Address
 > >  > >            
 > >  > >           What would be the impact of using link names instead?
 > >  > > 
 > >  >
 > >  > Note that all datalink configuration are based on linkids, so a
 > >  > rename-link should not affect the datalink datalink configuration
 > >  > offline/online/configure. But if we use link names, there would be
 > >  > races between the RCM operation and a "dladm rename-link" operation,
 > >  > and would cause unexpected result.
 > > 
 > > What specific race are you concerned about?  Given that the network, VLAN,
 > > and aggr RCM modules rebuild the resource name cache before processing a
 > > notification, wouldn't renames be transparently handled?  Are you worried
 > > about a rename-link running in parallel with a cfgadm operation?  That
 > > seems like a recipe for confusion no matter what, and also pretty unlikely
 > > given that it would require the datalinks to be unplumbed.
 > > 
 > True, I agree a "dladm rename-link" operation in parallel with a DR
 > operation is rare, but it could happen. Note that it does not matter whether
 > the link is plumbed or not. For example, assuming an aggregation aggr1 which
 >   is created over bge0, bge0 can be renamed when we tries to offline/online
 > bge0, even when aggr1 is not plumbed.

Do you mean "even when aggr1 is plumbed"?

 > If we don't care about the above cases at all, I guess we could change the
 > resource name from the linkid to the linkname. I suspect the code would be
 > much more messy though, as we often need to do the <linkname, linkid>
 > translation.
 > 
 > Regarding the RCM_RESOURCE_LINK_NEW event (mostly used in the configuration
 > process after the hardware is DRed in), it currently also carries linkids.
 > Because this configuration processing is mostly asynchronous, the chance of
 > the race would be greater than than the offline/online processing.

Understood.  I'm OK with leaving this as-is given that the resource names
are already pretty cryptic (e.g. /devices/pci at 9,700000/pci at 2/network at 0)
and that the Info column from cfgadm provides human-consumable
information, but we may need to revisit this in the future based on
operational experience from the field.

 > > For debug information, could we output the resource name instead?  (I
 > > realize that this may contain the linkid, depending on the outcome of
 > > the discussion above).
 > > 
 > I've tried to outcome the resource name as much as possible. But some of the
 > debug information cannot be changed. For example, in aggr_offline_port():
 > 
 >      rcm_log_message(RCM_TRACE2, "AGGR: delete aggregation %u\n",
 >          aggr->da_aggrid);

I presume we don't have the information at this point to map it back to a
linkname (has that info already been deleted)?

 > >  > >         * Are we sure it's safe to call *_consumer_offline() while 
 > > holding
 > >  > >           cache_lock?  Seems like it should be OK, but I'm slightly
 > >  > >           nervous about calling other modules (via 
 > > rcm_request_offline())
 > >  > >           while holding locks.
 > >  > > 
 > >  > Yes. It should be fine to hold locks. Note that it is not safe to hold 
 > > the 
 > >  > lock when calling rcm_notify_event(RCM_RESOURCE_LINK_NEW), as it could 
 > >  > result another RCM_RESOURCE_LINK_NEW event.
 > > 
 > > Seems like that would only be problematic if the first
 > > RCM_RESOURCE_LINK_NEW led to synchronous processing of the second one.
 > > Could you elaborate on the issue?
 > > 
 > I don't think we ever call rcm_notify_event(RCM_RESOURCE_LINK_NEW) with the
 > locks held. Did you see any code does that?

No, but I'm questioning your original comment that it would be unsafe to
hold the lock when calling rcm_notify_event(RCM_RESOURCE_LINK_NEW).  Seems
like this wouldn't be a problem even if we did hold the lock because the
handling of the notification is performed asynchronously.

 > >  > > aggr_rcm.c:
 > >  >
 > >  > >         * 221-227: As with vlan_rcm.c:197, it's unclear how this would
 > >  > >           ever have any work to do -- and if it did have work to do, 
 > > it
 > >  > >           seems like it would need to call rcm_unregister_interest().
 > >  > > 
 > >  > The aggr_unregister() function does not seem to be called at all. So 
 > > that I 
 > >  > am not sure whether 221-227 could have work to do. But at this point, 
 > > since 
 > >  > the framework does not provide the rcm_handle, it is not possible to 
 > > call 
 > >  > rcm_unregister_interest() here.
 > > 
 > > As above, please add a comment (and file the bug).

So was the bug filed?

-- 
meem

Reply via email to