-----BEGIN PGP SIGNED MESSAGE-----
>>>>> "Niall" == Niall O Broin <[EMAIL PROTECTED]> writes:
>> I back up 7 local systems with Amanda.
>>
>> Three Linux boxes (1 Debian/i386, 1 RH/i386, 1 RH/Netwinder), and four
>> NetBSD/i386 boxes. There is a NetBSD/ipf firewall between the backup
>> server (NetBSD/i386) and some of the boxes. Some of the backups also
>> occur over IPsec (yes, even though they are all "local").
>>
>> Two boxes on the same wire as backup server (plus the server itself)
>> work flawlessly. The IPsec connected ones work fine.
>>
>> The three behind the firewall fail frequently, but not 100% of the
>> time. I set up backups for just those hosts and watched with
>> tcpdump. I've built with the appropriate port ranges, and I have never
>> seen firewall failures, yet I get failures.
Niall> Speak to me brother ! I've been posting about a similar problem
Niall> here but I've got no responses. Do you get messages like these in
Niall> the report:
Niall> serv1 /boot lev 0 FAILED [Request to serv1 timed out.]
Niall> serv1 / lev 0 FAILED [Request to serv1 timed out.]
Bingo. What is the firewall?
Niall> I've a different situation - my failing machines are 2 X 1.2 GHz
Niall> and 1 x 250MHz. However, my firewall is quite a slow box - I can't
Niall> reach it now to say exactly. I suspect that the firewall can't
Niall> handle the load, although I have clients using NFS accessing
Niall> servers through it. However, NFS as a protocol is good at error
Niall> recovery so that's probably the answer.
The firewall is a 233MHz PII. The load on it is negligible. It has a 3Mb
bridged ethernet ADSL in front of it which is pretty much busy all the time.
>> My impression is that the failures are because the backup time
>> estimates take too long and the backup server gives up on them. On
>> the clients, I don't see any errors in the /tmp/amanda output - it
>> looks normal to me.
Niall> At the end of amandad.debug on a failing client I see
Niall> amandad: sending REP packet: ---- Amanda 2.4 REP HANDLE
Niall> 002-F8B30708 SEQ 1020382783 OPTIONS maxdumps=1; / 0 SIZE 6929200
Niall> /boot 0 SIZE 3600 ----
Niall> amandad: dgram_recv: timeout after 10 seconds
Niall> amandad: waiting for ack: timeout, retrying
Niall> amandad: dgram_recv: timeout after 10
Yeah... I get that as well. But not always.
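For anyone following along, the pattern in that debug output (send a datagram, wait 10 seconds for an ack, retry) looks roughly like the sketch below. This is only an illustration of the general send-and-wait-for-ack technique over UDP, not Amanda's actual code; the function name, retry count and packet contents are made up for the example.

```python
import socket

def send_with_ack(sock, peer, packet, timeout=10.0, retries=3):
    """Send `packet` to `peer` and wait for any reply datagram.

    Hypothetical helper for illustration only. Returns the reply bytes,
    or None if every attempt timed out (the "timed out" case in the report).
    """
    sock.settimeout(timeout)
    for _attempt in range(retries):
        sock.sendto(packet, peer)
        try:
            data, _addr = sock.recvfrom(65535)
            return data              # got an ack (or some reply)
        except socket.timeout:
            continue                 # "waiting for ack: timeout, retrying"
    return None                      # gave up after `retries` attempts
```

If a firewall in the middle silently drops either the packet or the ack, both ends just see timeouts like the ones quoted above.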
One possibility is that the firewall's state entry for the UDP exchange is
failing (for example, expiring before the reply arrives). If so, I would
expect to see something in the firewall logs, and I'd expect to see the
port 10080 packet on one side of the firewall and not on the other.
I will test by turning off stateful inspection on the UDP stream and see
what happens.
If this is the case, then Amanda perhaps needs to do keepalives.
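To make the keepalive idea concrete, here is a minimal sketch. It is not anything Amanda does today, and the function name, payload and interval are hypothetical. The point is only that small datagrams sent periodically from the same socket (same source port) and to the same peer as the original request would keep refreshing the firewall's state entry while the estimates run.

```python
import socket
import time

def send_keepalives(sock, peer, count, interval=5.0, payload=b"KEEPALIVE"):
    """Send `count` small datagrams to `peer`, one every `interval` seconds.

    Hypothetical illustration only. `sock` should be the same UDP socket
    that carried the original request, so that the firewall matches these
    packets against the existing state entry rather than creating a new one.
    Returns the number of datagrams sent.
    """
    sent = 0
    for i in range(count):
        sock.sendto(payload, peer)
        sent += 1
        if i < count - 1:
            time.sleep(interval)     # no need to sleep after the last one
    return sent
```

The interval would have to be shorter than whatever UDP idle timeout the firewall uses, and the receiving side would have to ignore the extra packets.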
Niall> Like you, I've RTFM and STFW but to no avail. I didn't get to the
Niall> tcpdump stage yet, mind you.
Thank you for the reply.
] ON HUMILITY: to err is human. To moo, bovine. | firewalls [
] Michael Richardson, Sandelman Software Works, Ottawa, ON |net architect[
] [EMAIL PROTECTED] http://www.sandelman.ottawa.on.ca/ |device driver[
] panic("Just another NetBSD/notebook using, kernel hacking, security guy"); [
-----BEGIN PGP SIGNATURE-----
Version: 2.6.3ia
Charset: latin1
Comment: Finger me for keys
iQCVAwUBPNVtjoqHRg3pndX9AQGllgP/S+j0m0tDguBmF2mXQo4CIZB3Lgr2/4r9
Cqagt4YVlQ0P1QJsxvfPGoulHk06nKjl01lZl85IHokVQ5jtIbAy/b92WuEXXFuN
SO7R5Oq1t2LonlKUYG3oMRuCGp+4a2dkK3//o9ZzWWBakJwX0Ei3/BswjlImZpxB
b1VITDBrfFk=
=VyLf
-----END PGP SIGNATURE-----