Hi,

I'm running CFEngine 3.0.2, community edition, on a grid of around 600 systems. We use it to distribute /etc/passwd et al. and have run into a couple of issues. The most persistent is that the cf-serverd process segfaults at random intervals, and quite often (several times per day). Load does not appear to be a direct factor, since I can force an update and the service will usually "survive".
I did try recompiling with "CFLAGS=-D_FORTIFY_SOURCE=0", which has improved things but not enough. Currently I'm running a cron job to check whether the service is running, as a stopgap until I can try a new version. Does this tally with anyone else's experience? I can send the configuration if that is of use.

The second issue occurred yesterday on a client system that was under heavy load. The cf-agent service failed with the following errors:

May 19 05:30:59 n123 cf-agent: Can't stat new file /etc/passwd.cfnew
May 19 05:30:59 n123 cf-agent: !!! System error for stat: "No such file or directory"
May 19 05:30:59 n123 cf-agent: Unknown user root
May 19 05:30:59 n123 last message repeated 15 times
May 19 05:31:25 n123 sshd[6098]: fatal: Privilege separation user sshd does not exist

The fact that cf-agent can't find the newly downloaded file is a bit worrying but, by itself, survivable. However, it appears that the password file was itself deleted during this run, as we were unable to log in to the system and some services on the host were impacted.

Now, quite aside from the merits or otherwise of using CFEngine to distribute identity information in this way, it seems that there might be cases where a file can be removed even though the updated (.cfnew) file is not available (I distribute other, non-identity-related updates using a similar mechanism). I have not been able to reproduce this case at all, and the system healed itself once it returned to an idle state.

I am already looking into alternatives to distributing /etc/passwd, shadow and company throughout the grid while still using file-based authentication. But I think there is a more general problem here with the copy_from method I use, which is why I've written to the list.

Any guidance greatly appreciated.

Regards,
Malcolm.