SMF Experts,
First, apologies for the length of this email. Hopefully you'll find it
straightforward.
As part of final testing for Clearview UV, we've been chasing an upgrade
problem that we now believe stems from SMF dependency handling.
Specifically, we upgraded via packaging from a current Nevada system to a
system running UV. After the upgrade completed, we checked the
repository[1] and verified that the new network/datalink-management
service had been imported[2] *prior* to rebooting the upgraded system:
# svccfg
svc:> repository /a/etc/svc/repository.db
svc:> select network/datalink-management
svc:/network/datalink-management> listprop
general framework
general/entity_stability astring Unstable
general/single_instance boolean true
dependents framework
dependents/device-system fmri svc:/system/device/local
dependents/install-discovery fmri svc:/system/install-discovery
-> dependents/network-physical fmri svc:/network/physical
...
Note in particular that svc:/network/physical is listed as a dependent.
Likewise, looking at the network/physical service shows that it depends
on network/datalink-management:
svc:/network/datalink-management> select network/physical
svc:/network/physical> listprop
loopback dependency
loopback/entities fmri svc:/network/loopback
loopback/grouping astring require_all
loopback/restart_on astring none
loopback/type astring service
tnctl_network-physical dependency
tnctl_network-physical/entities fmri svc:/network/tnctl
tnctl_network-physical/external boolean true
tnctl_network-physical/grouping astring optional_all
tnctl_network-physical/restart_on astring none
tnctl_network-physical/type astring service
-> network-physical dependency
-> network-physical/entities fmri svc:/network/datalink-management
network-physical/external boolean true
-> network-physical/grouping astring require_all
network-physical/restart_on astring none
network-physical/type astring service
general framework
general/entity_stability astring Unstable
Also prior to rebooting, we added some debug messages to the net-physical
startup script to record what svcs -d and svcs -D reported for
network/physical and network/datalink-management. When we rebooted, those
messages revealed that while net-physical was executing, the dependency
on datalink-management was missing[3]:
svcs -d network/physical
STATE STIME FMRI
online 2:02:30 svc:/network/tnctl:default
online 2:02:30 svc:/network/tnctl:default
online 2:02:30 svc:/network/loopback:default
online 2:02:30 svc:/network/loopback:default
Likewise, the debugging messages indicated that the datalink-management
service had no dependents:
svcs -D network/datalink-management
STATE STIME FMRI
... and "svcs -a" showed datalink-management with no reported state ("-"):
STATE STIME FMRI
...
uninitialized 2:02:38 svc:/network/talk:default
uninitialized 2:02:38 svc:/network/slp:default
uninitialized 2:02:38 svc:/network/telnet:default
-> - svc:/network/datalink-management:default
Once boot completed and we logged in, things appeared to have corrected
themselves, though the STIME values make it clear that something is amiss:
# svcs -d network/physical
STATE STIME FMRI
online 2:02:30 svc:/network/tnctl:default
online 2:02:30 svc:/network/tnctl:default
online 2:02:30 svc:/network/loopback:default
online 2:02:30 svc:/network/loopback:default
online 2:02:41 svc:/network/datalink-management:default
online 2:02:41 svc:/network/datalink-management:default
# svcs -D network/datalink-management
STATE STIME FMRI
disabled 2:02:28 svc:/network/physical:nwam
online 2:03:05 svc:/system/device/local:default
online 2:04:48 svc:/network/physical:default
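For anyone who wants to reproduce the comparison between the boot-time and
post-boot views, the debugging described above can be sketched roughly as
follows. This is illustrative only, not the actual code added to
net-physical: the function name log_dependency_view, the SVCS and DEBUG_LOG
variables, and the choice of /dev/msglog as the destination are all
assumptions.

```shell
# Hypothetical debug helper for a startup script such as net-physical.
# SVCS and DEBUG_LOG are assumed, overridable names (not from the real script).
SVCS=${SVCS:-/usr/bin/svcs}
DEBUG_LOG=${DEBUG_LOG:-/dev/msglog}

log_dependency_view() {
	{
		# What SMF currently thinks network/physical depends on.
		echo "svcs -d network/physical"
		"$SVCS" -d network/physical
		# What SMF currently thinks depends on datalink-management.
		echo "svcs -D network/datalink-management"
		"$SVCS" -D network/datalink-management
	} >> "$DEBUG_LOG" 2>&1
}
```

Calling log_dependency_view near the top of the script captures the view SMF
has at that point in boot, which can then be compared against the same
commands run from a login shell.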
I'm not familiar with the internals of SMF, but it seems as if the
dependency information early in boot is being extracted from a snapshot --
perhaps until manifest-import? If so, this is certainly a problem for us,
as we need to ensure that network/datalink-management is online prior to
running a certain part of the net-physical script (hence the dependency).
One workaround that occurred to me was to keep the dependency, but also
explicitly do a "svcadm enable -ts network/datalink-management" from the
net-physical script to ensure it's online by the time we need it.
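To make the proposed workaround concrete, here is a minimal sketch of what
the call in net-physical might look like. The function name
ensure_datalink_management and the SVCADM and DLMGMT_FMRI variables are
hypothetical, as is the error handling; only the "svcadm enable -ts"
invocation itself comes from the proposal above.

```shell
# Hypothetical wrapper for the workaround; names below are assumptions.
SVCADM=${SVCADM:-/usr/sbin/svcadm}
DLMGMT_FMRI=svc:/network/datalink-management:default

ensure_datalink_management() {
	# -t: enable temporarily (the setting does not persist across reboot)
	# -s: wait for the instance to come online before returning
	if ! "$SVCADM" enable -ts "$DLMGMT_FMRI"; then
		echo "net-physical: could not enable $DLMGMT_FMRI" >&2
		return 1
	fi
}
```

The script would call ensure_datalink_management just before the part that
needs the service online. Whether a synchronous enable is safe this early in
boot, while the dependency graph may still be incomplete, is exactly the
question for the SMF experts.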
In any case, we are eager to hear your thoughts on all of this -- and
especially on the workaround.
Thanks!
[1] To be sure we were examining the right repository, we modified a
property in the network/datalink-management service description;
after rebooting the system, the changed value was still present.
[2] Recall that we always import this service (rather than using
manifest-import) since the service needs to run early in boot,
before manifest-import runs.
[3] I presume the duplicate dependency entries are harmless and are
a separate bug.
--
meem