-----BEGIN PGP SIGNED MESSAGE-----

>>>>> "Niall" == Niall O Broin <[EMAIL PROTECTED]> writes:
    >> I back up 7 local systems with Amanda.
    >> 
    >> Three Linux boxes (1 Debian/i386, 1 RH/i386, 1 RH/Netwinder), and four
    >> NetBSD/i386 boxes. There is a NetBSD/ipf firewall between the backup
    >> server (NetBSD/i386) and some of the boxes. Some of the backups also
    >> occur over IPsec (yes, even though they are all "local").
    >> 
    >> Two boxes on the same wire as backup server (plus the server itself)
    >> work flawlessly. The IPsec connected ones work fine.
    >> 
    >> The three behind the firewall fail frequently, but not 100% of the
    >> time.  I set up backups for just those hosts and watched with
    >> tcpdump. I've built with the appropriate port ranges, and I have never
    >> seen firewall failures, yet I get failures.

    Niall> Speak to me brother ! I've been posting about a similar problem
    Niall> here but I've got no responses. Do you get messages like these in
    Niall> the report:

    Niall> serv1 /boot lev 0 FAILED [Request to serv1 timed out.]
    Niall> serv1 /     lev 0 FAILED [Request to serv1 timed out.]

  Bingo. What is the firewall?
  
    Niall> I've a different situation - my failing machines are 2 X 1.2 GHz
    Niall> and 1 x 250MHz. However, my firewall is quite a slow box - I can't
    Niall> reach it now to say exactly. I suspect that the firewall can't
    Niall> handle the load, although I have clients using NFS accessing
    Niall> servers through it. However, NFS as a protocol is good at error
    Niall> recovery so that's probably the answer.

  The firewall is a 233MHz PII. The load on it is negligible. It has a 3Mb/s
bridged ethernet ADSL in front of it which is pretty much busy all the time.

    >> My impression is that the failures are because the backup time
    >> estimates take too long and the backup server gives up on them. On
    >> the clients, I don't see any errors in the /tmp/amanda output - it
    >> looks normal to me.

    Niall> At the end of amandad.debug on a failing client I see

    Niall> amandad: sending REP packet:
    Niall> ----
    Niall> Amanda 2.4 REP HANDLE 002-F8B30708 SEQ 1020382783
    Niall> OPTIONS maxdumps=1;
    Niall> / 0 SIZE 6929200
    Niall> /boot 0 SIZE 3600
    Niall> ----

    Niall> amandad: dgram_recv: timeout after 10 seconds
    Niall> amandad: waiting for ack: timeout, retrying
    Niall> amandad: dgram_recv: timeout after 10

  Yeah... I get that as well. But not always.

  One possibility is that the firewall's state entry for the UDP exchange is
expiring or being lost. I would expect to see something in the firewall logs
about this, and I'd expect to see the port 10080 packet on one side of the
firewall and not on the other.

  I will test by turning off stateful inspection on the UDP stream and see
what happens.

  If this is the case, then Amanda perhaps needs to do keepalives.
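
  To make the keepalive idea concrete, here is a rough Python sketch of the
check I have in mind. It is not Amanda code. It takes the timestamps of the
packets seen in one UDP exchange (say, pulled out of a tcpdump capture) and
reports the idle gaps longer than the firewall's UDP state timeout. The
function names and the 60 second timeout below are made up for illustration;
I have not checked what timeout ipf actually uses.

```python
from typing import List, Tuple


def expired_gaps(packet_times: List[float],
                 state_timeout: float) -> List[Tuple[float, float]]:
    """Return (start, end) pairs where the flow sat idle longer than
    state_timeout seconds, i.e. where a stateful firewall could have
    discarded its entry for the flow before the next packet arrived."""
    times = sorted(packet_times)
    return [(a, b) for a, b in zip(times, times[1:]) if b - a > state_timeout]


def keepalive_interval(state_timeout: float,
                       safety_factor: float = 0.5) -> float:
    """Suggest how often to send a keepalive so the state entry is
    refreshed well before it expires. safety_factor is an assumption."""
    return state_timeout * safety_factor


if __name__ == "__main__":
    # Hypothetical capture: request and ack at t=0 and t=5, then the
    # estimate reply only arrives at t=300 after a long size calculation.
    seen = [0.0, 5.0, 300.0, 310.0]
    print(expired_gaps(seen, state_timeout=60.0))
    print(keepalive_interval(60.0))
```

  If the gap between the request and the REP packet turns up in that list,
the reply would be arriving after the state is gone, which would fit the
"sometimes fails" pattern: short estimates get through, long ones do not.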

    Niall> Like you, I've RTFM and STFW but to no avail. I didn't get to the
    Niall> tcpdump stage yet, mind you.

  Thank you for the reply.

]       ON HUMILITY: to err is human. To moo, bovine.           |  firewalls  [
]   Michael Richardson, Sandelman Software Works, Ottawa, ON    |net architect[
] [EMAIL PROTECTED] http://www.sandelman.ottawa.on.ca/ |device driver[
] panic("Just another NetBSD/notebook using, kernel hacking, security guy");  [

-----BEGIN PGP SIGNATURE-----
Version: 2.6.3ia
Charset: latin1
Comment: Finger me for keys

iQCVAwUBPNVtjoqHRg3pndX9AQGllgP/S+j0m0tDguBmF2mXQo4CIZB3Lgr2/4r9
Cqagt4YVlQ0P1QJsxvfPGoulHk06nKjl01lZl85IHokVQ5jtIbAy/b92WuEXXFuN
SO7R5Oq1t2LonlKUYG3oMRuCGp+4a2dkK3//o9ZzWWBakJwX0Ei3/BswjlImZpxB
b1VITDBrfFk=
=VyLf
-----END PGP SIGNATURE-----
