In the past week, I've twice run into what I believe to be two pretty serious 
issues.  The first time, another admin cleared the problem before capturing any 
debugging output (other than emails from Cfengine and cron), but this second 
time I've kept the machine in the broken state for debugging purposes.

In both cases the central policy host is running 3.3.2 and the remote node is 
running 3.2.0.  Both are self-compiled to allow for a custom WORKDIR.

First, a policy file was only partially transferred and the partial copy 
overwrote the existing file.  Obviously, this causes validation of my policy to 
then fail, which is the root of the second issue; I'll get to that in just a 
moment.  This is what was captured in the outputs directory by cf-execd:

# cat ../outputs/previous
Failed send
 !!! System reports error for recv: "Resource temporarily unavailable"
I: Made in version 'not specified' of '/var/cache/cfengine/inputs/promises.cf' near line 118
I: Comment: Update local policy cache from master policy server
 !! Transmission refused or failed statting /etc/cfengine/masterfiles/global/any/var/cache/cfengine/inputs/eod.cf
Got: CFD_FALSE
 !! Transmission refused or failed statting /etc/cfengine/masterfiles/global/any/var/cache/cfengine/inputs/dns.cf
Got:
 !! Transmission refused or failed statting /etc/cfengine/masterfiles/global/any/var/cache/cfengine/inputs/cobbler.cf
Got:
 !! Transmission refused or failed statting /etc/cfengine/masterfiles/global/any/var/cache/cfengine/inputs/ldap.cf
Got:
 !! Transmission refused or failed statting /etc/cfengine/masterfiles/global/any/var/cache/cfengine/inputs/cfengine_stdlib.cf
Got:
 !! Transmission refused or failed statting /etc/cfengine/masterfiles/global/any/var/cache/cfengine/inputs/cfupgrade.cf
Got:
 !! Transmission refused or failed statting /etc/cfengine/masterfiles/global/any/var/cache/cfengine/modules
Got:
#

The actual file that was corrupted is not listed above but is found by 
cf-promises.

# cf-promises
cf3> /var/cache/cfengine/inputs/globals.cf:1,2: Something defined outside of a block or missing punctuation in input, near token 't'
cf3> /var/cache/cfengine/inputs/globals.cf:1,2: syntax error, near token 't'
#

My cached globals.cf got truncated about a third of the way through the file.  
There is plenty of disk space, the cache is on local disk, there are no 
indications of hardware failure, and no other applications appear to be 
affected.
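
For what it's worth, one common way to keep a short read from clobbering a 
good file is to write the incoming data to a temporary file in the same 
directory, verify it, and only then rename it over the old copy.  The Python 
below is a minimal sketch of that idea, not a description of how cf-agent's 
copy is actually implemented.  The names sha256_of and atomic_install, and the 
idea of passing in an expected digest, are hypothetical.

```python
import hashlib
import os
import tempfile


def sha256_of(path):
    """Return the hex SHA-256 digest of a file's contents."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(65536), b""):
            digest.update(chunk)
    return digest.hexdigest()


def atomic_install(source, destination, expected_sha256):
    """Copy source over destination only if the complete copy verifies.

    Hypothetical helper: the data is first written to a temporary file in
    the destination directory.  The existing destination is replaced with
    os.replace() only after the digest matches, so a truncated copy leaves
    the old file untouched.
    """
    dest_dir = os.path.dirname(os.path.abspath(destination))
    fd, tmp_path = tempfile.mkstemp(dir=dest_dir, prefix=".incoming-")
    try:
        with os.fdopen(fd, "wb") as tmp, open(source, "rb") as src:
            for chunk in iter(lambda: src.read(65536), b""):
                tmp.write(chunk)
            tmp.flush()
            os.fsync(tmp.fileno())
        if sha256_of(tmp_path) != expected_sha256:
            return False  # truncated or corrupted copy: keep the old file
        os.replace(tmp_path, destination)
        return True
    finally:
        # After a successful os.replace() the temporary name is gone;
        # on any failure, clean up the partial file.
        if os.path.exists(tmp_path):
            os.remove(tmp_path)
```

With something like this in place, a failed transfer would return False and 
the previously cached globals.cf would still validate.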

Normally, such a policy validation failure would be fixed by cf-agent falling 
back to failsafe.cf (assuming that file is not also corrupt).  But for 
whatever reason, cf-agent did not execute the failsafe.  This is my second 
serious issue.  Cf-agent does not segfault or otherwise crash, and it reports 
no errors other than the invalid inputs.  Normally on errors like this, one 
would see some lines like this:

##
Fatal cfengine error: Too many errors
cf-agent was not able to get confirmation of promises from cf-promises, so going to failsafe
##

but I only got the first line and then a normal exit.
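
Until the cause is found, the expected fallback can be approximated from 
outside cf-agent.  The Python below is only a sketch under stated assumptions: 
it assumes cf-promises exits non-zero when validation fails, and 
run_with_failsafe and its run parameter are hypothetical names.  The only 
CFEngine feature relied on is the -f option of cf-promises and cf-agent.

```python
import subprocess


def run_with_failsafe(inputs_dir, run=subprocess.call):
    """Validate promises.cf and fall back to failsafe.cf if it is broken.

    Hypothetical wrapper.  `run` takes an argument list and returns an exit
    status; it is injectable so the logic can be exercised on a machine
    without CFEngine installed.
    """
    promises = inputs_dir + "/promises.cf"
    failsafe = inputs_dir + "/failsafe.cf"
    # Assumption: a zero exit status from cf-promises means the policy parsed.
    if run(["cf-promises", "-f", promises]) == 0:
        return run(["cf-agent", "-f", promises])
    # Validation failed: do not trust the cached policy, run the failsafe.
    return run(["cf-agent", "-f", failsafe])
```

Called as run_with_failsafe("/var/cache/cfengine/inputs") from cron, this 
would at least make the fallback explicit while the real problem is debugged.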

Debug mode didn't really offer me much more to go on either.


##
Fatal cfengine error: Too many errors

GetVariable(control_agent,track_value) type=(to be determined)
IsExpandable(track_value) - syntax verify
Found 0 variables in (track_value)
Looking for control_agent.track_value
Searching for scope context control_agent
Found scope reference control_agent
GetVariable(control_agent,track_value): using scope 'control_agent' for variable 'track_value'
No such variable found control_agent.track_value


GetVariable(control_common,version) type=(to be determined)
IsExpandable(version) - syntax verify
Found 0 variables in (version)
Looking for control_common.version
Searching for scope context control_common
Found scope reference control_common
GetVariable(control_common,version): using scope 'control_common' for variable 'version'
No such variable found control_common.version

Outcome of version (not specified): No checks were scheduled
GenericDeInitialize()
CloseAllDB()
Closed 0 open DB handles
##

At this point, whatever troubleshooting I've done has managed to twiddle enough 
bits that the failsafe is kicking in now, so I'm unable to capture any more 
debug data. 
-- 
/* Wes Hardin */
UNIX/Linux Systems Administrator, IT Engineering Support
Maxim Integrated Products | Innovation Delivered® | www.maxim-ic.com

_______________________________________________
Help-cfengine mailing list
Help-cfengine@cfengine.org
https://cfengine.org/mailman/listinfo/help-cfengine
