Forum: Cfengine Help
Subject: Cfagent hangs on 'copy'
Author: danci1973
Link to topic: https://cfengine.com/forum/read.php?3,18586,18586#msg-18586

Hi,

I'm having some trouble with cfagent hanging when it copies files from the 
server.

We have used cfengine for years to manage close to 1000 machines, and it works 
pretty well. Most machines are rather old (our own Linux 2.4-based distro), 
but there are also newer ones running other Linux distros (mostly OpenSuSE) 
with 2.6 kernels.

We use cfengine-2.1.11, which admittedly is a bit old, but so far it has worked 
flawlessly. We'd like to avoid upgrading all machines, if possible...

Right now I'm introducing a couple of new OpenSuSE 11.2 based machines and I've 
run into a problem: I can run 'cfagent --DHr00 --no-splay --inform' and it 
starts fine. It copies some files and then suddenly hangs...

If I run cfagent with '-d2', this is where it stops:


ExpandVarstring(cfengine.lmq.mercator.si)
ExpandVarstring(/var/lib/cfengine/lmq_bins/clean_old_lmq.sh)
ExpandVarstring(/usr/local/sbin/clean_old_lmq.sh)
Checking copy from cfengine.lmq.mercator.si:/var/lib/cfengine/lmq_bins/clean_old_lmq.sh to /usr/local/sbin/clean_old_lmq.sh
ExpandVarstring(cfengine.lmq.mercator.si)
Server connection to cfengine.lmq.mercator.si already open on 3
Authentic connection verified
cf_rstat(/var/lib/cfengine/lmq_bins/clean_old_lmq.sh)
GetCachedStatData(/var/lib/cfengine/lmq_bins/clean_old_lmq.sh)
Did not find in cache
Transaction Send
Attempting to send 73 bytes
SendSocketStream, sent 73
RecvSocketStream(8)
    (Concatenated 8 from stream)
Transaction Receive []
RecvSocketStream(69)
    (Concatenated 69 from stream)
Mode = 493,0
OK: type=0
 mode=755
 lmode=0
 uid=0
 gid=0
 size=302
 atime=1286007134
 mtime=1249381041 ino=4097100 nlnk=1, dev=2304
RecvSocketStream(8)
    (Concatenated 8 from stream)
Transaction Receive []
RecvSocketStream(3)
    (Concatenated 3 from stream)
Linkbuffer: OK:
Directory for /usr/local/sbin/clean_old_lmq.sh exists. Okay
CheckImage (source=/var/lib/cfengine/lmq_bins/clean_old_lmq.sh destination=/usr/local/sbin/clean_old_lmq.sh)
cf_rstat(/var/lib/cfengine/lmq_bins/clean_old_lmq.sh)
GetCachedStatData(/var/lib/cfengine/lmq_bins/clean_old_lmq.sh)
Found in cache
ImageCopy(/var/lib/cfengine/lmq_bins/clean_old_lmq.sh,/usr/local/sbin/clean_old_lmq.sh,+700,-7077)
ExpandVarstring(cfengine.lmq.mercator.si)
IgnoredOrExcluded(/var/lib/cfengine/lmq_bins/clean_old_lmq.sh)
file /usr/local/sbin/clean_old_lmq.sh class any.Hr00 was not excluded
cfengine:vlmq-0001:: /usr/local/sbin/clean_old_lmq.sh wasn't at destination (copying)
cfengine:vlmq-0001:: Copying from cfengine.lmq.mercator.si:/var/lib/cfengine/lmq_bins/clean_old_lmq.sh
CopyReg(/var/lib/cfengine/lmq_bins/clean_old_lmq.sh,/usr/local/sbin/clean_old_lmq.sh)
This is a remote copy from server: cfengine.lmq.mercator.si
Transaction Send
Attempting to send 60 bytes
SendSocketStream, sent 60
RecvSocketStream(302)
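
A side note on reading the debug output above: 'Mode = 493,0' and the later 'mode=755' appear to be the same value, printed first in decimal and then in octal, and atime/mtime look like plain Unix timestamps. A minimal Python sketch to check this (the helper name describe_stat is made up for illustration and is not part of cfengine):

```python
from datetime import datetime, timezone


def describe_stat(mode_decimal, atime, mtime):
    """Render numeric stat fields from the debug log in a readable form."""
    return {
        # 493 in decimal is 0o755 in octal, i.e. rwxr-xr-x
        "mode_octal": oct(mode_decimal),
        # Interpret the timestamps as seconds since the Unix epoch, in UTC
        "atime_utc": datetime.fromtimestamp(atime, tz=timezone.utc).isoformat(),
        "mtime_utc": datetime.fromtimestamp(mtime, tz=timezone.utc).isoformat(),
    }


if __name__ == "__main__":
    # Values copied from the 'cfagent -d2' output above
    for key, value in describe_stat(493, 1286007134, 1249381041).items():
        print(f"{key}: {value}")
```

So the stat reply itself looks sane: the server reports a 302-byte file with mode 755, and the hang only starts once the file contents are requested.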


It just hangs there indefinitely, or until I hit CTRL-C, in which case it 
says this:


Ccfengine:vlmq-0001:: Received signal 2 (SIGINT) while doing 
cfengine:vlmq-0001:: Logical start time Sat Oct  2 10:49:32 2010
cfengine:vlmq-0001:: This sub-task started really at Sat Oct  2 10:49:32 2010
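
The last debug line, 'RecvSocketStream(302)', suggests cfagent is blocked waiting for exactly 302 bytes that never arrive. The following Python sketch only illustrates that kind of "read exactly N bytes" loop and how a timeout changes the outcome. It is an assumption about the general behaviour, not cfengine's actual code, and recv_exact is a hypothetical helper:

```python
import socket


def recv_exact(sock, nbytes, timeout=None):
    """Read exactly nbytes from sock; with timeout=None this can block forever."""
    sock.settimeout(timeout)
    chunks = []
    remaining = nbytes
    while remaining > 0:
        # Raises socket.timeout if no data arrives within the timeout
        chunk = sock.recv(remaining)
        if not chunk:
            raise ConnectionError("peer closed the connection early")
        chunks.append(chunk)
        remaining -= len(chunk)
    return b"".join(chunks)


if __name__ == "__main__":
    a, b = socket.socketpair()
    b.sendall(b"x" * 100)  # the peer sends fewer bytes than the reader expects
    try:
        recv_exact(a, 302, timeout=1.0)
    except socket.timeout:
        print("timed out waiting for the remaining bytes")
    finally:
        a.close()
        b.close()
```

Without the timeout, the reader in this sketch would sit in recv() indefinitely, which is what the hang looks like from the outside.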


If I run the same command with 'strace', I get this:


sendto(3, "t 65\0\0\0\0SYNCH 1286009580 STAT /v"..., 73, 0, NULL, 0) = 73
recvfrom(3, "t 69\0\0\0\0", 8, 0, NULL, NULL) = 8
recvfrom(3, "OK: 0 493 0 0 0 302 1286007134 1"..., 69, 0, NULL, NULL) = 69
recvfrom(3, "t 3\0\0\0\0\0", 8, 0, NULL, NULL) = 8
recvfrom(3, "OK:", 3, 0, NULL, NULL)    = 3
lstat("/usr/local/sbin", {st_mode=S_IFDIR|0755, st_size=4096, ...}) = 0
stat("/usr", {st_mode=S_IFDIR|0755, st_size=4096, ...}) = 0
stat("/usr/local", {st_mode=S_IFDIR|0755, st_size=4096, ...}) = 0
stat("/usr/local/sbin", {st_mode=S_IFDIR|0755, st_size=4096, ...}) = 0
lstat("/usr/local/sbin/clean_old_lmq.sh", 0x7fff0c58d0e0) = -1 ENOENT (No such file or directory)
fstat(1, {st_mode=S_IFCHR|0620, st_rdev=makedev(136, 0), ...}) = 0
mmap(NULL, 4096, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0x7f1524a61000
write(1, "cfengine:vlmq-0001:: Copying fro"..., 103cfengine:vlmq-0001:: Copying 
from cfengine.lmq.mercator.si:/var/lib/cfengine/lmq_bins/clean_old_lmq.sh
) = 103
unlink("/usr/local/sbin/clean_old_lmq.sh.cfnew") = 0
open("/usr/local/sbin/clean_old_lmq.sh.cfnew", O_WRONLY|O_CREAT|O_EXCL|O_TRUNC, 0600) = 4
sendto(3, "t 52\0\0\0\0GET 2048 /var/lib/cfengi"..., 60, 0, NULL, 0) = 60
recvfrom(3,
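
For anyone reading the strace output: each message seems to start with an 8-byte, NUL-padded header such as "t 65" followed by a payload of that length (8 + 65 = 73 bytes sent, 8 + 52 = 60 for the GET). The sketch below decodes that framing in Python. The layout is inferred only from the trace above, the leading 't' is treated as an opaque status character, and parse_header / build_frame are hypothetical names, not cfengine functions:

```python
HEADER_LEN = 8  # header size as observed in the strace output


def parse_header(header: bytes):
    """Split an 8-byte NUL-padded header into (status, payload_length)."""
    if len(header) != HEADER_LEN:
        raise ValueError("expected exactly 8 header bytes")
    text = header.rstrip(b"\0").decode("ascii")
    status, length = text.split(" ", 1)
    return status, int(length)


def build_frame(payload: bytes, status: str = "t") -> bytes:
    """Prefix a payload with a header in the assumed layout."""
    header = f"{status} {len(payload)}".encode("ascii")
    if len(header) > HEADER_LEN:
        raise ValueError("payload too long for an 8-byte header")
    return header.ljust(HEADER_LEN, b"\0") + payload


if __name__ == "__main__":
    status, length = parse_header(b"t 65\0\0\0\0")
    # Header plus payload matches the 73 bytes passed to sendto()
    print(status, length, HEADER_LEN + length)
```

In this trace, the read that follows the GET asks for 302 bytes directly, which matches the size=302 from the stat reply, and that read is the call that never returns.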


The file 'clean_old_lmq.sh' itself is OK - I can copy it manually and then 
cfagent will just hang at some other point... I can also delete the file on 
some other machine and it will get copied without a problem...

It looks like a network problem, but there are no problems using 'rsync' or 
'scp'.

I also noticed that after a cfagent hang, the server refuses new connections 
to 'cfservd' from this machine. I have to restart 'cfservd' to get it working 
again.

Any ideas?

 Danilo
