Hi. I run amanda 2.4.1p1 on a RedHat Linux/Intel machine as the tape host.
The tape host, spinoza, is itself backed up, and it also runs samba,
so it handles backing up the PCs.
I had a working configuration, and then I added a lot of PCs to the
disklist file and amanda stopped working, partly anyway. I am not
worried about freya, spiff, tucker, owlglass, or umtanum. I am very
worried about spinoza.
I am testing amcheck by running the inetd from a terminal with the -d
switch, and I see
spinoza!root 506# inetd -d
ADD : amandaidx proto=tcp, wait.max=0.40, user.group=backup.(null)
builtin=0 server=/usr/local/libexec/amindexd
ADD : amidxtape proto=tcp, wait.max=0.40, user.group=backup.(null)
builtin=0 server=/usr/local/libexec/amidxtaped
ADD : amanda proto=udp, wait.max=1.40, user.group=backup.(null)
builtin=0 server=/g/rcs/sw/libexec/amandad
launching: amanda
pid 2792: exec /g/rcs/sw/libexec/amandad
uid: 500 gid: 257
groups: 257
However, if I repeat this after waiting 30 seconds, inetd doesn't launch
amandad again.
spinoza!backup 19# amcheck -c daily
Amanda Backup Client Hosts Check
--------------------------------
send req failed: Connection refused
amcheck-clients: send ack failed: Connection refused
(brought to you by Amanda 2.4.1p1)
spinoza!backup 20# amcheck -c daily
Amanda Backup Client Hosts Check
--------------------------------
send req failed: Connection refused
send req failed: Connection refused
protocol packet receive: Connection refused
ERROR: owlglass: [access as lists not allowed from [EMAIL PROTECTED]]
protocol packet receive: Connection refused
protocol packet receive: Connection refused
protocol packet receive: Connection refused
protocol packet receive: Connection refused
protocol packet receive: Connection refused
protocol packet receive: Connection refused
protocol packet receive: Connection refused
WARNING: spinoza: selfcheck request timed out. Host down?
WARNING: beagle: selfcheck request timed out. Host down?
WARNING: calvin: selfcheck request timed out. Host down?
WARNING: hobbes: selfcheck request timed out. Host down?
WARNING: kant: selfcheck request timed out. Host down?
WARNING: calcium: selfcheck request timed out. Host down?
WARNING: myosin: selfcheck request timed out. Host down?
WARNING: fhd: selfcheck request timed out. Host down?
WARNING: freya: selfcheck request timed out. Host down?
WARNING: spiff: selfcheck request timed out. Host down?
WARNING: cowiche: selfcheck request timed out. Host down?
WARNING: graham: selfcheck request timed out. Host down?
WARNING: jensen: selfcheck request timed out. Host down?
WARNING: tucker: selfcheck request timed out. Host down?
WARNING: umtanum: selfcheck request timed out. Host down?
Client check: 25 hosts checked in 29.935 seconds, 16 problems found.
(brought to you by Amanda 2.4.1p1)
spinoza!backup 21#
My hypothesis is that there is some sort of time dependency in amandad.
When I finished writing the above, I reran the amcheck command again and
I got:
spinoza!backup 21# amcheck -c daily
Amanda Backup Client Hosts Check
--------------------------------
send req failed: Connection refused
send req failed: Connection refused
protocol packet receive: Connection refused
ERROR: owlglass: [access as lists not allowed from [EMAIL PROTECTED]]
send req failed: Connection refused
protocol packet receive: Connection refused
send req failed: Connection refused
send req failed: Connection refused
protocol packet receive: Connection refused
WARNING: spinoza: selfcheck request timed out. Host down?
WARNING: freya: selfcheck request timed out. Host down?
WARNING: spiff: selfcheck request timed out. Host down?
WARNING: tucker: selfcheck request timed out. Host down?
WARNING: umtanum: selfcheck request timed out. Host down?
Client check: 25 hosts checked in 30.459 seconds, 6 problems found.
(brought to you by Amanda 2.4.1p1)
spinoza!backup 22#
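To compare runs like the two above without reading them by eye, the timed-out hosts can be pulled out of the amcheck output and diffed. This is only a minimal sketch: the function names `timed_out_hosts` and `recovered_hosts` are made up for illustration, and the regular expression assumes the WARNING line format shown in the transcripts above.

```python
import re

# Matches lines such as:
#   WARNING: spinoza: selfcheck request timed out. Host down?
# (format assumed from the amcheck transcripts in this message)
TIMEOUT_RE = re.compile(r"^WARNING:\s+(\S+):\s+selfcheck request timed out")


def timed_out_hosts(amcheck_output):
    """Return the hosts reported as timed out, in order of appearance."""
    hosts = []
    for line in amcheck_output.splitlines():
        match = TIMEOUT_RE.match(line.strip())
        if match:
            hosts.append(match.group(1))
    return hosts


def recovered_hosts(earlier_output, later_output):
    """Hosts that timed out in the earlier run but not in the later one."""
    later = set(timed_out_hosts(later_output))
    return [h for h in timed_out_hosts(earlier_output) if h not in later]
```

Feeding it the saved output of two consecutive `amcheck -c daily` runs shows which hosts stopped timing out between them; a host such as spinoza that appears in both lists is the one to look at first.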
Now, I can look at the daemon:
spinoza!root 506# inetd -d
ADD : amandaidx proto=tcp, wait.max=0.40, user.group=backup.(null)
builtin=0 server=/usr/local/libexec/amindexd
ADD : amidxtape proto=tcp, wait.max=0.40, user.group=backup.(null)
builtin=0 server=/usr/local/libexec/amidxtaped
ADD : amanda proto=udp, wait.max=1.40, user.group=backup.(null)
builtin=0 server=/g/rcs/sw/libexec/amandad
launching: amanda
pid 2792: exec /g/rcs/sw/libexec/amandad
uid: 500 gid: 257
groups: 257
And I can do a ps -l on the PID 2792
spinoza!root 283# ps -l 2792
F S UID PID PPID C PRI NI ADDR SZ WCHAN TTY TIME CMD
100 S 500 2792 2789 0 60 0 - 459 do_sel ? 0:00 amandad
spinoza!root 284#
I think what is happening is that the amandad is starting and then
hanging for some reason - perhaps an automounter problem (we've been
having a lot of those, lately)? For giggles and grins, I tried killing
PID 2792 and then I reran amcheck again and it seems to work better....
spinoza!root 506# inetd -d
ADD : amandaidx proto=tcp, wait.max=0.40, user.group=backup.(null)
builtin=0 server=/usr/local/libexec/amindexd
ADD : amidxtape proto=tcp, wait.max=0.40, user.group=backup.(null)
builtin=0 server=/usr/local/libexec/amidxtaped
ADD : amanda proto=udp, wait.max=1.40, user.group=backup.(null)
builtin=0 server=/g/rcs/sw/libexec/amandad
launching: amanda
pid 2792: exec /g/rcs/sw/libexec/amandad
uid: 500 gid: 257
groups: 257
pid 2792, exit status f
restored amanda, fd 6
launching: amanda
pid 2863: exec /g/rcs/sw/libexec/amandad
uid: 500 gid: 257
groups: 257
pid 2863, exit status 0
restored amanda, fd 6
No... it ran fine for 2 runs when I moved a good fraction of the PC
backups to another samba server, but then when I restored them and reran
amcheck it timed out again. Another hypothesis is that the amcheck is
expecting a response from the amandad - if the amandad can't get its
work done in time, amcheck assumes it is down and stops. Then, when the
amandad does get its work done, it has nobody to talk to so it just sits
there and sulks. Because the amandad hasn't returned, the inetd won't
let go of the fd so it rejects any further inbound connections on
amanda's UDP port. Does that make sense?
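One way to test this hypothesis is to tell apart a port that nothing is listening on from a port that is bound but not answering. The sketch below is only an illustration under stated assumptions: `probe_udp` is a hypothetical helper, the payload is arbitrary bytes rather than a real Amanda protocol packet (so amandad may well ignore it), and 10080 is assumed to be the UDP port the amanda service is registered on. On Linux, a port with no listener typically produces "Connection refused" on a connected UDP socket, while a port held by a hung process typically just times out.

```python
import socket


def probe_udp(host, port, timeout=5.0, payload=b"ping"):
    """Classify a UDP port as 'reply', 'refused', or 'timeout'.

    The payload is arbitrary (NOT an Amanda packet), so 'timeout' only
    means nothing answered and no ICMP error came back.
    """
    sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    try:
        sock.settimeout(timeout)
        # connect() on a UDP socket lets the kernel deliver ICMP
        # "port unreachable" errors to us as ConnectionRefusedError.
        sock.connect((host, port))
        sock.send(payload)
        sock.recv(4096)
        return "reply"
    except ConnectionRefusedError:
        return "refused"
    except socket.timeout:
        return "timeout"
    finally:
        sock.close()
```

Running something like `probe_udp("spinoza", 10080)` while the stuck amandad (PID 2792 above) is still alive, and again after killing it, would show whether the hung daemon correlates with refusals or with silent timeouts. The `wait.max=1.40` in the inetd debug output suggests amanda is configured as a `wait` service, which would be consistent with inetd not servicing the port again until the child exits.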
Any advice?
Jeff
I could really use some advice here.
--
Jeff Silverman, sysadmin for the Research Computing Systems (RCS)
University of Washington, School of Engineering, Electrical Engineering Dept.
Box 352500, Seattle, WA, 98125-2500 FAX: (206) 221-5264 Phone (206) 543-9378
[EMAIL PROTECTED] http://rcs.ee.washington.edu/~jeffs