Peter Memishian wrote:
>  > >  > > General comments:
>  > >  > 
>  > >  > >           Also, I'm a bit concerned about the use of link
>  > >  > >           identifiers here, since e.g. cfgadm will output them 
> (right edge
>  > >  > >           truncated to avoid 80-column wrap):
>  > >  > > 
>  > >  > >           cfgadm: Component system is busy, try again:
>  > >  > >                          Resource                               Info
>  > >  > >           --------------------------------------  ------------------
>  > >  > >           /devices/pci at 9,700000/pci at 2/network at 0   Network 
> interface 
>  > >  > >           SUNW_network/1                          ce0 AGGRs: aggr0
>  > >  > >           SUNW_network/2                          aggr0 hosts IP add
>  > >  > >           SUNW_ip/10.8.57.204                     Plumbed IP Address
>  > >  > >            
>  > >  > >           What would be the impact of using link names instead?
>  > >  > > 
>  > >  >
>  > >  > Note that all datalink configuration are based on linkids, so a
>  > >  > rename-link should not affect the datalink datalink configuration
>  > >  > offline/online/configure. But if we use link names, there would be
>  > >  > races between the RCM operation and a "dladm rename-link" operation,
>  > >  > and would cause unexpected result.
>  > > 
>  > > What specific race are you concerned about?  Given that the network, 
> VLAN,
>  > > and aggr RCM modules rebuild the resource name cache before processing a
>  > > notification, wouldn't renames be transparently handled?  Are you worried
>  > > about a rename-link running in parallel with a cfgadm operation?  That
>  > > seems like a recipe for confusion no matter what, and also pretty 
> unlikely
>  > > given that it would require the datalinks to be unplumbed.
>  > > 
>  > True, I agree a "dladm rename-link" operation in parallel with a DR
>  > operation is rare, but it could happen. Note that it does not matter 
> whether
>  > the link is plumbed or not. For example, assuming an aggregation aggr1 
> which
>  >   is created over bge0, bge0 can be renamed when we tries to offline/online
>  > bge0, even when aggr1 is not plumbed.
> 
> Do you mean "even when aggr1 is plumbed"?
> 
Ah, right.

>  > If we don't care about the above cases at all, I guess we could change the
>  > resource name from the linkid to the linkname. I suspect the code would be
>  > much more messy though, as we often need to do the <linkname, linkid>
>  > translation.
>  > 
>  > Regarding the RCM_RESOURCE_LINK_NEW event (mostly used in the configuration
>  > process after the hardware is DRed in), it currently also carries linkids.
>  > Because this configuration processing is mostly asynchronous, the chance of
>  > the race would be greater than than the offline/online processing.
> 
> Understood.  I'm OK with leaving this as-is given that the resource names
> are already pretty cryptic (e.g. /devices/pci at 9,700000/pci at 2/network at 
> 0)
> and that the Info column from cfgadm provides human-consumable
> information, but we may need to revisit this in the future based on
> operational experience from the field.
> 
Okay.

>  > > For debug information, could we output the resource name instead?  (I
>  > > realize that this may contain the linkid, depending on the outcome of
>  > > the discussion above).
>  > > 
>  > I've tried to outcome the resource name as much as possible. But some of 
> the
>  > debug information cannot be changed. For example, in aggr_offline_port():
>  > 
>  >    rcm_log_message(RCM_TRACE2, "AGGR: delete aggregation %u\n",
>  >        aggr->da_aggrid);
> 
> I presume we don't have the information at this point to map it back to a
> linkname (has that info already been deleted)?
> 
The <link name, linkid> mapping is still stored in the dlmgmtd daemon, and we 
could get 
the link name if we really need it. But I am not convinced that we need it, as 
the message 
is really for diagnose purpose, calling libdladm APIs to convert linkid to 
linkname seems 
overkilled.

>  > >  > >         * Are we sure it's safe to call *_consumer_offline() while 
> holding
>  > >  > >           cache_lock?  Seems like it should be OK, but I'm slightly
>  > >  > >           nervous about calling other modules (via 
> rcm_request_offline())
>  > >  > >           while holding locks.
>  > >  > > 
>  > >  > Yes. It should be fine to hold locks. Note that it is not safe to 
> hold the 
>  > >  > lock when calling rcm_notify_event(RCM_RESOURCE_LINK_NEW), as it 
> could 
>  > >  > result another RCM_RESOURCE_LINK_NEW event.
>  > > 
>  > > Seems like that would only be problematic if the first
>  > > RCM_RESOURCE_LINK_NEW led to synchronous processing of the second one.
>  > > Could you elaborate on the issue?
>  > > 
>  > I don't think we ever call rcm_notify_event(RCM_RESOURCE_LINK_NEW) with the
>  > locks held. Did you see any code does that?
> 
> No, but I'm questioning your original comment that it would be unsafe to
> hold the lock when calling rcm_notify_event(RCM_RESOURCE_LINK_NEW).  Seems
> like this wouldn't be a problem even if we did hold the lock because the
> handling of the notification is performed asynchronously.
> 
The notification could be synchronous if it is called within the rcm_daemon. 
See 
rcm_direct_call().

>  > >  > > aggr_rcm.c:
>  > >  >
>  > >  > >         * 221-227: As with vlan_rcm.c:197, it's unclear how this 
> would
>  > >  > >           ever have any work to do -- and if it did have work to 
> do, it
>  > >  > >           seems like it would need to call 
> rcm_unregister_interest().
>  > >  > > 
>  > >  > The aggr_unregister() function does not seem to be called at all. So 
> that I 
>  > >  > am not sure whether 221-227 could have work to do. But at this point, 
> since 
>  > >  > the framework does not provide the rcm_handle, it is not possible to 
> call 
>  > >  > rcm_unregister_interest() here.
>  > > 
>  > > As above, please add a comment (and file the bug).
> 
> So was the bug filed?
> 
6642490.

Thanks
- Cathy

Reply via email to