In the past week, I've had two separate occurrences of what I believe to be two pretty serious issues. Another admin cleared the problem the first time it happened, before any debugging data was captured (other than emails from Cfengine and cron), but this second time I've kept the machine in the broken state for debugging purposes.
In both cases the central policy host is running 3.3.2 and the remote node is running 3.2.0. Both are self-compiled to allow for a custom WORKDIR.

First, a policy file was only partially transferred, and the partial copy overwrote the existing file. Obviously, this causes validation of my policy to fail, which is the root of the second issue. I'll get to that in just a moment. Cfengine is usually pretty good about

This is what was captured in the outputs directory by cf-execd:

# cat ../outputs/previous
Failed send
 !!! System reports error for recv: "Resource temporarily unavailable"
I: Made in version 'not specified' of '/var/cache/cfengine/inputs/promises.cf' near line 118
I: Comment: Update local policy cache from master policy server
 !! Transmission refused or failed statting /etc/cfengine/masterfiles/global/any/var/cache/cfengine/inputs/eod.cf
Got: CFD_FALSE
 !! Transmission refused or failed statting /etc/cfengine/masterfiles/global/any/var/cache/cfengine/inputs/dns.cf
Got:
 !! Transmission refused or failed statting /etc/cfengine/masterfiles/global/any/var/cache/cfengine/inputs/cobbler.cf
Got:
 !! Transmission refused or failed statting /etc/cfengine/masterfiles/global/any/var/cache/cfengine/inputs/ldap.cf
Got:
 !! Transmission refused or failed statting /etc/cfengine/masterfiles/global/any/var/cache/cfengine/inputs/cfengine_stdlib.cf
Got:
 !! Transmission refused or failed statting /etc/cfengine/masterfiles/global/any/var/cache/cfengine/inputs/cfupgrade.cf
Got:
 !! Transmission refused or failed statting /etc/cfengine/masterfiles/global/any/var/cache/cfengine/modules
Got:
#

The actual file that was corrupted is not listed above but is found by cf-promises.

# cf-promises
cf3> /var/cache/cfengine/inputs/globals.cf:1,2: Something defined outside of a block or missing punctuation in input, near token 't'
cf3> /var/cache/cfengine/inputs/globals.cf:1,2: syntax error, near token 't'
#

My cached globals.cf got truncated about 1/3 of the way through the file.
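For anyone who wants to check their own caches for the same kind of truncation, here is a minimal sketch in Python (it is not part of Cfengine). It compares a cached policy file against its master copy and reports whether the cached file is a strict prefix of the master, which is what a partial transfer would leave behind. The function names, and the assumption that both copies are readable from the same host, are mine and purely illustrative.

```python
import hashlib
from pathlib import Path


def file_digest(path):
    """Return the SHA-256 hex digest of a file, read in chunks."""
    h = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(65536), b""):
            h.update(chunk)
    return h.hexdigest()


def check_cached_copy(master, cached):
    """Classify a cached file relative to its master copy.

    Returns "ok" if the contents match, "truncated" if the cached file
    is a strict prefix of the master, and "different" for any other
    mismatch (including a missing cached file).
    """
    master, cached = Path(master), Path(cached)
    if not cached.exists():
        return "different"
    if file_digest(master) == file_digest(cached):
        return "ok"
    cached_bytes = cached.read_bytes()
    # Read only as many bytes of the master as the cached copy holds.
    with open(master, "rb") as f:
        prefix = f.read(len(cached_bytes))
    if len(cached_bytes) < master.stat().st_size and prefix == cached_bytes:
        return "truncated"
    return "different"
```

Called with the master copy of globals.cf as the first argument and /var/cache/cfengine/inputs/globals.cf as the second, this should return "truncated" for the state described above.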
There is plenty of disk space, the cache is on local disk, there are no indications of hardware failure, and no other applications appear to be affected.

Normally, such a policy validation failure would be fixed by cf-agent falling back to the failsafe.cf (assuming that file is also not corrupt). But for whatever reason, cf-agent did not execute the failsafe. This is my second serious issue. Cf-agent does not segfault or otherwise crash; it reports no errors other than the invalid inputs. Normally on errors like this, one would see some lines like this:

##
Fatal cfengine error: Too many errors
cf-agent was not able to get confirmation of promises from cf-promises, so going to failsafe
##

but I only got the first line and then a normal exit. Debug mode didn't really offer me much more to go on either.

##
Fatal cfengine error: Too many errors
GetVariable(control_agent,track_value) type=(to be determined)
IsExpandable(track_value) - syntax verify
Found 0 variables in (track_value)
Looking for control_agent.track_value
Searching for scope context control_agent
Found scope reference control_agent
GetVariable(control_agent,track_value): using scope 'control_agent' for variable 'track_value'
No such variable found control_agent.track_value
GetVariable(control_common,version) type=(to be determined)
IsExpandable(version) - syntax verify
Found 0 variables in (version)
Looking for control_common.version
Searching for scope context control_common
Found scope reference control_common
GetVariable(control_common,version): using scope 'control_common' for variable 'version'
No such variable found control_common.version
Outcome of version (not specified): No checks were scheduled
GenericDeInitialize()
CloseAllDB()
Closed 0 open DB handles
##

At this point, whatever troubleshooting I've done has managed to twiddle enough bits that the failsafe is kicking in now, so I'm unable to capture any more debug data.
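To make the expected behavior concrete, here is a rough Python sketch of the fallback I would expect to see. It is only an illustration of the decision, not how cf-agent implements it internally. It assumes that cf-promises exits non-zero when validation fails; the function names and the injectable run parameter are hypothetical and exist only so the logic can be exercised without Cfengine installed.

```python
import subprocess


def run_command(cmd):
    """Run a command and return its exit status."""
    return subprocess.call(cmd)


def agent_with_fallback(inputs_dir, run=run_command):
    """Validate promises.cf, then run either it or failsafe.cf.

    Returns the name of the policy file handed to cf-agent.
    """
    promises = f"{inputs_dir}/promises.cf"
    failsafe = f"{inputs_dir}/failsafe.cf"
    # Assumption: a non-zero exit status means the policy did not validate.
    if run(["cf-promises", "-f", promises]) == 0:
        run(["cf-agent", "-f", promises])
        return "promises.cf"
    # Validation failed, so fall back to the failsafe policy.
    run(["cf-agent", "-f", failsafe])
    return "failsafe.cf"
```

In the broken state described above, the first branch fails as expected, but the equivalent of the second branch never ran.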
--
/* Wes Hardin */
UNIX/Linux Systems Administrator, IT Engineering Support
Maxim Integrated Products | Innovation Delivered® | www.maxim-ic.com