Forum: Cfengine Help
Subject: Cfagent hangs on 'copy'
Author: danci1973
Link to topic: https://cfengine.com/forum/read.php?3,18586,18586#msg-18586

Hi,

I'm having a bit of trouble with cfagent hanging when copying files from the server.

We have used cfengine for years now to manage close to 1000 machines, and it works pretty well. The machines are mostly rather old (our own Linux 2.4 based distro), but there are also newer ones running some other Linux distro (mostly OpenSuSE) with 2.6. We use cfengine-2.1.11, which admittedly is a bit old, but so far it has worked flawlessly. We'd like to avoid upgrading all machines, if possible...

Right now I'm introducing a couple of new OpenSuSE 11.2 based machines and I've run into some problems. I can run 'cfagent --DHr00 --no-splay --inform' and it starts fine. It copies some files and then it just suddenly hangs...

If I run cfagent with '-d2', this is where it stops:

    ExpandVarstring(cfengine.lmq.mercator.si)
    ExpandVarstring(/var/lib/cfengine/lmq_bins/clean_old_lmq.sh)
    ExpandVarstring(/usr/local/sbin/clean_old_lmq.sh)
    Checking copy from cfengine.lmq.mercator.si:/var/lib/cfengine/lmq_bins/clean_old_lmq.sh to /usr/local/sbin/clean_old_lmq.sh
    ExpandVarstring(cfengine.lmq.mercator.si)
    Server connection to cfengine.lmq.mercator.si already open on 3
    Authentic connection verified
    cf_rstat(/var/lib/cfengine/lmq_bins/clean_old_lmq.sh)
    GetCachedStatData(/var/lib/cfengine/lmq_bins/clean_old_lmq.sh)
    Did not find in cache
    Transaction Send
    Attempting to send 73 bytes
    SendSocketStream, sent 73
    RecvSocketStream(8)
    (Concatenated 8 from stream)
    Transaction Receive []
    RecvSocketStream(69)
    (Concatenated 69 from stream)
    Mode = 493,0
    OK: type=0 mode=755 lmode=0 uid=0 gid=0 size=302 atime=1286007134 mtime=1249381041 ino=4097100 nlnk=1, dev=2304
    RecvSocketStream(8)
    (Concatenated 8 from stream)
    Transaction Receive []
    RecvSocketStream(3)
    (Concatenated 3 from stream)
    Linkbuffer: OK:
    Directory for /usr/local/sbin/clean_old_lmq.sh exists. Okay
    CheckImage (source=/var/lib/cfengine/lmq_bins/clean_old_lmq.sh destination=/usr/local/sbin/clean_old_lmq.sh)
    cf_rstat(/var/lib/cfengine/lmq_bins/clean_old_lmq.sh)
    GetCachedStatData(/var/lib/cfengine/lmq_bins/clean_old_lmq.sh)
    Found in cache
    ImageCopy(/var/lib/cfengine/lmq_bins/clean_old_lmq.sh,/usr/local/sbin/clean_old_lmq.sh,+700,-7077)
    ExpandVarstring(cfengine.lmq.mercator.si)
    IgnoredOrExcluded(/var/lib/cfengine/lmq_bins/clean_old_lmq.sh)
    file /usr/local/sbin/clean_old_lmq.sh class any.Hr00 was not excluded
    cfengine:vlmq-0001:: /usr/local/sbin/clean_old_lmq.sh wasn't at destination (copying)
    cfengine:vlmq-0001:: Copying from cfengine.lmq.mercator.si:/var/lib/cfengine/lmq_bins/clean_old_lmq.sh
    CopyReg(/var/lib/cfengine/lmq_bins/clean_old_lmq.sh,/usr/local/sbin/clean_old_lmq.sh)
    This is a remote copy from server: cfengine.lmq.mercator.si
    Transaction Send
    Attempting to send 60 bytes
    SendSocketStream, sent 60
    RecvSocketStream(302)

And it just hangs there indefinitely, or until I hit CTRL-C, in which case it says this:

    Ccfengine:vlmq-0001:: Received signal 2 (SIGINT) while doing
    cfengine:vlmq-0001:: Logical start time Sat Oct 2 10:49:32 2010
    cfengine:vlmq-0001:: This sub-task started really at Sat Oct 2 10:49:32 2010

If I run the same command with 'strace', I get this:

    sendto(3, "t 65\0\0\0\0SYNCH 1286009580 STAT /v"..., 73, 0, NULL, 0) = 73
    recvfrom(3, "t 69\0\0\0\0", 8, 0, NULL, NULL) = 8
    recvfrom(3, "OK: 0 493 0 0 0 302 1286007134 1"..., 69, 0, NULL, NULL) = 69
    recvfrom(3, "t 3\0\0\0\0\0", 8, 0, NULL, NULL) = 8
    recvfrom(3, "OK:", 3, 0, NULL, NULL) = 3
    lstat("/usr/local/sbin", {st_mode=S_IFDIR|0755, st_size=4096, ...}) = 0
    stat("/usr", {st_mode=S_IFDIR|0755, st_size=4096, ...}) = 0
    stat("/usr/local", {st_mode=S_IFDIR|0755, st_size=4096, ...}) = 0
    stat("/usr/local/sbin", {st_mode=S_IFDIR|0755, st_size=4096, ...}) = 0
    lstat("/usr/local/sbin/clean_old_lmq.sh", 0x7fff0c58d0e0) = -1 ENOENT (No such file or directory)
    fstat(1, {st_mode=S_IFCHR|0620, st_rdev=makedev(136, 0), ...}) = 0
    mmap(NULL, 4096, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0x7f1524a61000
    write(1, "cfengine:vlmq-0001:: Copying fro"..., 103cfengine:vlmq-0001:: Copying from cfengine.lmq.mercator.si:/var/lib/cfengine/lmq_bins/clean_old_lmq.sh
    ) = 103
    unlink("/usr/local/sbin/clean_old_lmq.sh.cfnew") = 0
    open("/usr/local/sbin/clean_old_lmq.sh.cfnew", O_WRONLY|O_CREAT|O_EXCL|O_TRUNC, 0600) = 4
    sendto(3, "t 52\0\0\0\0GET 2048 /var/lib/cfengi"..., 60, 0, NULL, 0) = 60
    recvfrom(3,

The file 'clean_old_lmq.sh' itself is OK - if I copy it manually, cfagent just hangs at some other point... I can also delete the file on some other machine and it gets copied without a problem...

It looks like a network problem, but there are no problems using 'rsync' or 'scp'.

I also noticed that after a cfagent hang, the server refuses new connections to 'cfservd' from this machine. I have to restart 'cfservd' to get it going again.

Any ideas?

Danilo
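
A note on reading the strace output above: each request and reply appears to be preceded by an 8-byte header of the form "t <length>" padded with NUL bytes, and the byte counts add up (8 + 65 = 73 for the STAT request, 8 + 52 = 60 for the GET request). The final recvfrom lines up with RecvSocketStream(302) in the debug output, which matches the size=302 reported by the STAT reply. Below is a minimal Python sketch of that header layout. It is inferred only from the trace, not from the cfengine source, and the names HEADER_LEN and parse_header are hypothetical.

```python
from typing import Tuple

# Assumption: header size inferred from the 8-byte reads in the strace output.
HEADER_LEN = 8


def parse_header(raw: bytes) -> Tuple[str, int]:
    """Split a header such as b't 65\\0\\0\\0\\0' into (status, payload_length)."""
    if len(raw) != HEADER_LEN:
        raise ValueError("expected %d header bytes, got %d" % (HEADER_LEN, len(raw)))
    # Drop the NUL padding, then split "t 65" at the first space.
    text = raw.rstrip(b"\0").decode("ascii")
    status, _, length = text.partition(" ")
    return status, int(length)


# Headers copied from the strace output in the post.
for header in (b"t 65\0\0\0\0", b"t 69\0\0\0\0", b"t 3\0\0\0\0\0", b"t 52\0\0\0\0"):
    status, length = parse_header(header)
    # Total bytes on the wire = header + payload.
    print(status, length, HEADER_LEN + length)
```

Checked against the trace, the totals for the two requests come out as 73 and 60, the same values that sendto returned.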