I’ve deployed Cfengine 3.0.5p1 across 800 hosts. I have an issue with the
Cfengine daemons on only one box, where it appears I am hitting a bug. On this
machine, they spin a single core at 100% user-space CPU utilization. Here are
the details:
$ /var/cfengine/bin/cf-agent -v
....
...
cf3 ------------------------------------------------------------------------
cf3 # Extended system discovery is only available in version Nova and above
cf3 Additional hard class defined as: 32_bit
cf3 Additional hard class defined as: sunos_5_10
cf3 Additional hard class defined as: sunos_i86pc
cf3 Additional hard class defined as: sunos_i86pc_5_10
cf3 Additional hard class defined as: i386
cf3 Additional hard class defined as: i86pc
cf3 GNU autoconf class from compile time: compiled_on_solaris2_10
cf3 Address given by nameserver: 172.17.134.80
cf3 Interface 1: lo0
cf3 Interface 2: e1000g0
cf3 Adding alias loghost..
cf3 !! Cannot discover hardware IP, using DNS value
^C
So at the “Cannot discover hardware IP” point, it hangs and spins the CPU to
100%. The prstat -Lm output below shows the single cf-agent LWP at 100% USR:
$ prstat -Lm
  PID USERNAME  USR SYS TRP TFL DFL LCK SLP LAT VCX ICX SCL SIG PROCESS/LWPID
16398 root      100 0.0 0.0 0.0 0.0 0.0 0.0 0.3   0 190   0   0 cf-agent/1
Putting cf-agent into super debug mode (-ddd), I see the following.
Broken host:
$ /var/cfengine/bin/cf-agent -ddd
....
....
GetVariable(sys,ipv4_1[172_17_134_80]) type=(to be determined)
IsExpandable(ipv4_1[172_17_134_80]) - syntax verify
Found 0 variables in (ipv4_1[172_17_134_80])
Looking for sys.ipv4_1[172_17_134_80]
Searching for scope context sys
Found scope reference sys
GetVariable(sys,ipv4_1[172_17_134_80]): using scope 'sys' for variable 'ipv4_1[172_17_134_80]'
At that point, cf-agent hangs. For comparison, this is what I see on a working
host.
Working host:
GetVariable(sys,ipv4_1[172_17_134_81]) type=(to be determined)
IsExpandable(ipv4_1[172_17_134_81]) - syntax verify
Found 0 variables in (ipv4_1[172_17_134_81])
Looking for sys.ipv4_1[172_17_134_81]
Searching for scope context sys
Found scope reference sys
GetVariable(sys,ipv4_1[172_17_134_81]): using scope 'sys' for variable 'ipv4_1[172_17_134_81]'
No such variable found sys.ipv4_1[172_17_134_81]
AddVariableHash(sys.ipv4_1[172_17_134_81]=172 (string) rtype=s)
Searching for scope context sys
Found scope reference sys
CopyRvalItem(s)
ScanScalar([172])
DeleteRvalItem(l)
DeleteRval NULL
DeleteRvalItem(l)
DeleteRval NULL
Added Variable ipv4_1[172_17_134_81] at hash address 60 in scope sys with value (omitted)
Trying to locate my IPv6 address
Unappending Trying to locate my IPv6 address
Unix_cf_popen(/sbin/ifconfig -a)
Unix_cf_pclose(pp)
cf_pwait - Waiting for process 12411
Looking for environment from cf-monitor...
Unappending Looking for environment from cf-monitor...
Searching for scope context mon
Found scope reference mon
No variable matched
NewScalar(mon,env_time,Sat Nov 20 00:28:23 2010)
So the broken host never gets to the “No such variable found
sys.ipv4_1[172_17_134_80]” statement.
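The comparison above can be made mechanical. A minimal sketch, assuming both -ddd traces have been saved to files and a POSIX awk is available (on Solaris 10 that means nawk or /usr/xpg4/bin/awk rather than the old /usr/bin/awk): first_divergence is a hypothetical helper, not part of Cfengine, that masks IPv4 addresses in either dotted or underscored form (as in 172_17_134_80) so the two hosts' traces line up, then prints the first line of the working trace that the broken trace does not match.

```shell
#!/bin/sh
# Hypothetical helper (not part of Cfengine): report where two cf-agent -ddd
# traces first diverge once host-specific IPv4 addresses are masked.
# Usage: first_divergence BROKEN_TRACE WORKING_TRACE
# Prints "LINE: TEXT" for the first line of WORKING_TRACE that differs from,
# or runs past the end of, BROKEN_TRACE; prints nothing if every line of
# WORKING_TRACE matches. Assumes BROKEN_TRACE is non-empty.
first_divergence() {
  awk '
    # Mask dotted (172.17.134.80) and underscored (172_17_134_80) addresses.
    { gsub(/[0-9]+[._][0-9]+[._][0-9]+[._][0-9]+/, "ADDR") }
    # First file (broken trace): remember every masked line.
    NR == FNR { b[FNR] = $0; nb = FNR; next }
    # Second file (working trace): stop at the first mismatch or extra line.
    FNR > nb || $0 != b[FNR] { print FNR ": " $0; exit }
  ' "$1" "$2"
}
```

On the excerpts above, this should point at the working host's "No such variable found sys.ipv4_1[ADDR]" line. On full traces, other host-specific values (timestamps, PIDs) would likely need masking too before the first reported line is meaningful.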
So I know this is a problem with Cfengine parsing the network interfaces. The
only catch is that I cannot see any difference at all between the working and
non-working machines.
Broken machine’s ifconfig output:
$ ifconfig -a
lo0: flags=2001000849<UP,LOOPBACK,RUNNING,MULTICAST,IPv4,VIRTUAL> mtu 8232 index 1
        inet 127.0.0.1 netmask ff000000
e1000g0: flags=1040843<UP,BROADCAST,RUNNING,MULTICAST,DEPRECATED,IPv4> mtu 1500 index 2
        inet 172.17.134.80 netmask ffffff00 broadcast 172.17.134.255
        groupname primary
        ether 0:14:4f:9e:cf:fe
e1000g0:1: flags=1000843<UP,BROADCAST,RUNNING,MULTICAST,IPv4> mtu 1500 index 2
        inet 0.0.0.0 netmask ff000000 broadcast 0.255.255.255
e1000g1: flags=69000842<BROADCAST,RUNNING,MULTICAST,IPv4,NOFAILOVER,STANDBY,INACTIVE> mtu 0 index 3
        inet 0.0.0.0 netmask 0
        groupname primary
        ether 0:14:4f:9e:cf:ff
Working machine’s ifconfig output:
$ ifconfig -a
lo0: flags=2001000849<UP,LOOPBACK,RUNNING,MULTICAST,IPv4,VIRTUAL> mtu 8232 index 1
        inet 127.0.0.1 netmask ff000000
e1000g0: flags=1040843<UP,BROADCAST,RUNNING,MULTICAST,DEPRECATED,IPv4> mtu 1500 index 2
        inet 172.17.134.81 netmask ffffff00 broadcast 172.17.134.255
        groupname primary
        ether 0:14:4f:83:31:ac
e1000g0:1: flags=1000843<UP,BROADCAST,RUNNING,MULTICAST,IPv4> mtu 1500 index 2
        inet 0.0.0.0 netmask ff000000 broadcast 0.255.255.255
e1000g1: flags=69000842<BROADCAST,RUNNING,MULTICAST,IPv4,NOFAILOVER,STANDBY,INACTIVE> mtu 0 index 3
        inet 0.0.0.0 netmask 0
        groupname primary
        ether 0:14:4f:83:31:ad
So, other than the inet address of e1000g0 and the Ethernet (MAC) addresses,
the output is identical. If I unplumb e1000g0:1 and e1000g1 on the broken
machine, the Cfengine daemons work again.
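That side-by-side check can also be scripted. A minimal sketch, assuming each host's ifconfig -a output has been saved to a file and mktemp(1) is available: the hypothetical ifconfig_shape helper masks the values after "inet" and "ether", and same_ifconfig_shape diffs what remains, so any difference in flags, MTU, index, or IPMP group name would show up. It also masks 0.0.0.0 and 127.0.0.1, so it only checks the interface layout, not the addresses themselves.

```shell
#!/bin/sh
# Hypothetical helpers for comparing two saved "ifconfig -a" outputs.

# ifconfig_shape FILE: print FILE with per-host values masked
# (IPv4 addresses after "inet" and MAC addresses after "ether").
ifconfig_shape() {
  sed -e 's/inet [0-9.]*/inet ADDR/' \
      -e 's/ether [0-9a-fA-F:]*/ether MAC/' "$1"
}

# same_ifconfig_shape FILE_A FILE_B: diff the masked outputs.
# Exit status follows diff(1): 0 if identical, 1 if different, >1 on error.
same_ifconfig_shape() {
  tmp=$(mktemp) || return 2
  ifconfig_shape "$1" > "$tmp"
  # "-" makes diff read the second, masked file from standard input.
  ifconfig_shape "$2" | diff "$tmp" -
  rc=$?
  rm -f "$tmp"
  return "$rc"
}
```

For example, with the two outputs saved as broken.ifc and working.ifc (hypothetical names), "same_ifconfig_shape broken.ifc working.ifc && echo same" would be expected to print only "same" for the listings above.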
Has anyone run into this bug before, or can anyone suggest anything?
Thanks!
Mike
_______________________________________________
Help-cfengine mailing list
[email protected]
https://cfengine.org/mailman/listinfo/help-cfengine