On Sat, May 04, 2002 at 08:02:58PM -0400, Michael Richardson wrote:
> I backup 7 local systems with Amanda.
>
> Three Linux boxes (1 Debian/i386, 1 RH/i386, 1 RH/Netwinder), and
> four NetBSD/i386 boxes. There is a NetBSD/ipf firewall between the backup
> server (NetBSD/i386) and some of the boxes. Some of the backups also occur
> over IPsec (yes, even though they are all "local").
>
> Two boxes on the same wire as backup server (plus the server itself)
> work flawlessly. The IPsec connected ones work fine.
>
> The three behind the firewall fail frequently, but not 100% of the time.
> I setup backups for just those hosts, and watch with tcpdump. I've built with
> the appropriate port ranges, but I never seen firewall failures, yet I get
> failures.

Speak to me, brother! I've been posting about a similar problem here but
I've got no responses. Do you get messages like these in the report:

  serv1 /boot lev 0 FAILED [Request to serv1 timed out.]
  serv1 / lev 0 FAILED [Request to serv1 timed out.]

My remote backups (remote meaning the machines on the other side of the
firewall) fail nearly all the time. My boxes are all Linux with large /
and small /boot partitions. Sometimes L0 backups of /boot work, and once
or twice I got an L0 of / to work (on one client), but generally all that
works is an L1 of /boot, which is of course tiny.

> Coincidentally, the machines that fail are all less than 300Mhz systems,
> (233Mhz, 350Mhz, 200Mhz), while the machines that work are 650Mhz+. The
> backup server itself, however is a K5-133 running NetBSD/i386, and a lot of
> SCSI spindles. (Yeah, it needs to be replaced)

I've a different situation - my failing machines are 2 x 1.2 GHz and
1 x 250 MHz. However, my firewall is quite a slow box - I can't reach it
now to say exactly how slow. I suspect that the firewall can't handle the
load, although I have clients using NFS accessing servers through it.
However, NFS as a protocol is good at error recovery, so that probably
explains why NFS works through it.

> My impression is that the failures are because the backup time estimates
> take too long and the backup server gives up on them. One the clients, I
> don't see any errors in the /tmp/amanda output - it looks normal to me.

At the end of amandad.debug on a failing client I see

amandad: sending REP packet:
----
Amanda 2.4 REP HANDLE 002-F8B30708 SEQ 1020382783
OPTIONS maxdumps=1;
/ 0 SIZE 6929200
/boot 0 SIZE 3600
----

amandad: dgram_recv: timeout after 10 seconds
amandad: waiting for ack: timeout, retrying
amandad: dgram_recv: timeout after 10 seconds
amandad: waiting for ack: timeout, retrying
amandad: dgram_recv: timeout after 10 seconds
amandad: waiting for ack: timeout, retrying
amandad: dgram_recv: timeout after 10 seconds
amandad: waiting for ack: timeout, retrying
amandad: dgram_recv: timeout after 10 seconds
amandad: waiting for ack: timeout, giving up!

which is presumably related to the timeout in the mail reports.

> I've been through the documentation and the FAQs, and I've watched
> tcpdump's of the traffic going through... nothing obvious.

Like you, I've RTFM and STFW but to no avail. I didn't get to the tcpdump
stage yet, mind you.

Kindest regards,

Niall O Broin
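
P.S. To compare several clients, the ack-timeout pattern in the amandad.debug excerpt above could be tallied with a small script. What follows is only a rough sketch in Python, not anything shipped with Amanda: the function name summarize_ack_timeouts is made up for illustration, and the message strings are copied from the excerpt, so other Amanda versions may word them differently.

```python
import re

# Message formats copied from the amandad.debug excerpt in this mail.
# Assumption: other Amanda versions may phrase these lines differently.
SENT = re.compile(r"sending (\w+) packet")
RETRY = re.compile(r"waiting for ack: timeout, retrying")
GIVE_UP = re.compile(r"waiting for ack: timeout, giving up")


def summarize_ack_timeouts(text):
    """Hypothetical helper: summarize the last packet sent in a debug log.

    Returns (packet_type, retries, gave_up), where packet_type is the type
    named in the last "sending ... packet" line (or None if there is none),
    retries counts the "retrying" lines after it, and gave_up says whether
    a "giving up" line followed it.
    """
    packet_type = None
    retries = 0
    gave_up = False
    for line in text.splitlines():
        match = SENT.search(line)
        if match:
            # A new packet starts a fresh retry count.
            packet_type = match.group(1)
            retries = 0
            gave_up = False
        elif RETRY.search(line):
            retries += 1
        elif GIVE_UP.search(line):
            gave_up = True
    return packet_type, retries, gave_up
```

Reading a client's amandad.debug into a string and passing it to summarize_ack_timeouts would show whether the size estimate (the REP packet) went out but was never acknowledged, which is what the excerpt above suggests for my failing client.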
