Cfengine 3.0.5p1 daemons spinning CPU to 100% on 1 host out of 800

Mike Svoboda Fri, 19 Nov 2010 16:34:02 -0800

I’ve deployed Cfengine 3.0.5p1 across 800 hosts.  Only one box has a problem: 
the Cfengine daemons there appear to be hitting a bug, and they spin a single 
core at 100% user-space CPU utilization.  Here are the details.



$ /var/cfengine/bin/cf-agent -v
....
...
cf3 ------------------------------------------------------------------------
cf3 # Extended system discovery is only available in version Nova and above
cf3 Additional hard class defined as: 32_bit
cf3 Additional hard class defined as: sunos_5_10
cf3 Additional hard class defined as: sunos_i86pc
cf3 Additional hard class defined as: sunos_i86pc_5_10
cf3 Additional hard class defined as: i386
cf3 Additional hard class defined as: i86pc
cf3 GNU autoconf class from compile time: compiled_on_solaris2_10
cf3 Address given by nameserver: 172.17.134.80
cf3 Interface 1: lo0
cf3 Interface 2: e1000g0
cf3 Adding alias loghost..
cf3  !! Cannot discover hardware IP, using DNS value
^C


So at the “Cannot discover hardware IP” point, it hangs and spins the CPU at 
100%.  Looking at the prstat -Lm output below:


$ prstat -Lm
   PID USERNAME USR SYS TRP TFL DFL LCK SLP LAT VCX ICX SCL SIG PROCESS/LWPID
 16398 root     100 0.0 0.0 0.0 0.0 0.0 0.0 0.3   0 190   0   0 cf-agent/1


Putting cf-agent into super debug mode, I see this....

Broken host:
$ /var/cfengine/bin/cf-agent -ddd
....
....
GetVariable(sys,ipv4_1[172_17_134_80]) type=(to be determined)
IsExpandable(ipv4_1[172_17_134_80]) - syntax verify
Found 0 variables in (ipv4_1[172_17_134_80])
Looking for sys.ipv4_1[172_17_134_80]
Searching for scope context sys
Found scope reference sys
GetVariable(sys,ipv4_1[172_17_134_80]): using scope 'sys' for variable 'ipv4_1[172_17_134_80]'



At this point, cf-agent hangs.  On a working host, this is what I see at the 
same point in the debug output.

Working host:
GetVariable(sys,ipv4_1[172_17_134_81]) type=(to be determined)
IsExpandable(ipv4_1[172_17_134_81]) - syntax verify
Found 0 variables in (ipv4_1[172_17_134_81])
Looking for sys.ipv4_1[172_17_134_81]
Searching for scope context sys
Found scope reference sys
GetVariable(sys,ipv4_1[172_17_134_81]): using scope 'sys' for variable 'ipv4_1[172_17_134_81]'
No such variable found sys.ipv4_1[172_17_134_81]
AddVariableHash(sys.ipv4_1[172_17_134_81]=172 (string) rtype=s)
Searching for scope context sys
Found scope reference sys
CopyRvalItem(s)
ScanScalar([172])
DeleteRvalItem(l)
DeleteRval NULL
DeleteRvalItem(l)
DeleteRval NULL
Added Variable ipv4_1[172_17_134_81] at hash address 60 in scope sys with value (omitted)
Trying to locate my IPv6 address
Unappending Trying to locate my IPv6 address
Unix_cf_popen(/sbin/ifconfig -a)
Unix_cf_pclose(pp)
cf_pwait - Waiting for process 12411
Looking for environment from cf-monitor...
Unappending Looking for environment from cf-monitor...
Searching for scope context mon
Found scope reference mon
No variable matched
NewScalar(mon,env_time,Sat Nov 20 00:28:23 2010)


So the broken host never gets to the “No such variable found 
sys.ipv4_1[172_17_134_80]” statement.
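
A minimal sketch of this comparison in Python, assuming the debug output from 
each host has been saved and read in as a list of lines.  The helper names 
normalize and first_divergence are made up for illustration and are not part 
of Cfengine.  IP addresses are masked so that host-specific values do not 
count as differences.

```python
import re

# Matches an IPv4 address written with dots or underscores,
# e.g. 172.17.134.80 or 172_17_134_80 (the form used in sys.ipv4_1[...]).
IP_RE = re.compile(r"\d{1,3}([._])\d{1,3}\1\d{1,3}\1\d{1,3}")


def normalize(line):
    """Strip whitespace and mask addresses so logs from different hosts compare equal."""
    return IP_RE.sub("<ip>", line.strip())


def first_divergence(broken, working):
    """Return (matching_count, working_line_at_divergence).

    matching_count is how many leading non-blank lines agree after masking.
    working_line_at_divergence is the working host's line at the first point
    of disagreement, or None if the working log ends there.
    """
    b = [normalize(l) for l in broken if l.strip()]
    w = [normalize(l) for l in working if l.strip()]
    n = 0
    while n < len(b) and n < len(w) and b[n] == w[n]:
        n += 1
    return n, (w[n] if n < len(w) else None)
```

Given the matching portions of the two excerpts above, the second return value 
would be the working host's "No such variable found" line, with the address 
masked.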

So I know this is a problem with how Cfengine parses the network interfaces.  
The only thing is, I cannot see any difference at all between the working and 
non-working machines.


Broken machine’s ifconfig output:
$ ifconfig -a
lo0: flags=2001000849<UP,LOOPBACK,RUNNING,MULTICAST,IPv4,VIRTUAL> mtu 8232 index 1
        inet 127.0.0.1 netmask ff000000
e1000g0: flags=1040843<UP,BROADCAST,RUNNING,MULTICAST,DEPRECATED,IPv4> mtu 1500 index 2
        inet 172.17.134.80 netmask ffffff00 broadcast 172.17.134.255
        groupname primary
        ether 0:14:4f:9e:cf:fe
e1000g0:1: flags=1000843<UP,BROADCAST,RUNNING,MULTICAST,IPv4> mtu 1500 index 2
        inet 0.0.0.0 netmask ff000000 broadcast 0.255.255.255
e1000g1: flags=69000842<BROADCAST,RUNNING,MULTICAST,IPv4,NOFAILOVER,STANDBY,INACTIVE> mtu 0 index 3
        inet 0.0.0.0 netmask 0
        groupname primary
        ether 0:14:4f:9e:cf:ff



Working machine’s ifconfig output:
$ ifconfig -a
lo0: flags=2001000849<UP,LOOPBACK,RUNNING,MULTICAST,IPv4,VIRTUAL> mtu 8232 index 1
        inet 127.0.0.1 netmask ff000000
e1000g0: flags=1040843<UP,BROADCAST,RUNNING,MULTICAST,DEPRECATED,IPv4> mtu 1500 index 2
        inet 172.17.134.81 netmask ffffff00 broadcast 172.17.134.255
        groupname primary
        ether 0:14:4f:83:31:ac
e1000g0:1: flags=1000843<UP,BROADCAST,RUNNING,MULTICAST,IPv4> mtu 1500 index 2
        inet 0.0.0.0 netmask ff000000 broadcast 0.255.255.255
e1000g1: flags=69000842<BROADCAST,RUNNING,MULTICAST,IPv4,NOFAILOVER,STANDBY,INACTIVE> mtu 0 index 3
        inet 0.0.0.0 netmask 0
        groupname primary
        ether 0:14:4f:83:31:ad



So other than the inet address of e1000g0 and the ethernet addresses, the 
output is exactly the same.  If I unplumb the interfaces e1000g0:1 and e1000g1 
on the broken machine, the Cfengine daemons operate again.
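
To double-check that comparison mechanically, here is a small Python sketch 
that diffs two saved ifconfig -a outputs and tests whether every changed line 
is an inet or ether line.  The function names are hypothetical, and the check 
assumes the output format shown above.

```python
import difflib


def changed_lines(broken_text, working_text):
    """Return the '- ' and '+ ' lines from a line-by-line diff of two outputs."""
    diff = difflib.ndiff(broken_text.splitlines(), working_text.splitlines())
    return [d for d in diff if d[:2] in ("- ", "+ ")]


def only_addresses_differ(broken_text, working_text):
    """True if every changed line (if any) is an 'inet' or 'ether' line."""
    return all(
        d[2:].strip().startswith(("inet ", "ether "))
        for d in changed_lines(broken_text, working_text)
    )
```

A False result would point at a flag, mtu, or interface-name difference that 
is easy to miss by eye.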


Has anyone run into this bug before, or can anyone suggest anything?

Thanks!
Mike

_______________________________________________
Help-cfengine mailing list
https://cfengine.org/mailman/listinfo/help-cfengine
