OK, maybe I did not explain correctly what I found out, and keepalive is
possibly not the right term for it.
This is my scenario:
Amanda server ----------> Cisco router doing NAT ----------> Server to back up (A)
I ran tcpdump on the Amanda server and another one on server A. What I
see is:
The Amanda server sends a UDP packet to server A, which ACKs it and starts
sendsize; no more UDP packets are sent while sendsize runs. When sendsize
terminates, server A sends a UDP packet with the results to the Amanda server
and the backup begins. Inevitably, if sendsize takes longer than 6 minutes the
backup fails; now, with nat udp timeout 3600, the backup always works.
In client-src/amandad.c the client answers with an ACK to every P_REQ it
receives. There is also this interesting comment:
/*
* Under normal conditions, the master will resend the REQ packet
* to be sure we are still alive. It expects an ACK back right away.
*
* XXX- Arguably we should parse and security check the new packet,
* only sending an ACK if it passes and the request is identical to
* the original one. However, that's too much work for now. :-)
*
* It should suffice to ACK whenever the sender is identical.
*/
Now, this keeps the connection alive, and it also works over NAT until
there is a pause longer than 6 minutes between P_REQ and ACK. Setting a
bigger ACK_TIMEOUT in client-src/amandad.c doesn't help, because the timeout
takes place on the router.
The problem is that UDP is stateless (connectionless), as you know. Modern
routers do something called connection tracking to permit UDP exchanges over NAT
(http://www.cs.princeton.edu/~jns/security/iptables/iptables_conntrack.html).
So when the Amanda server sends the first UDP packet to server A, the NAT
router saves this information in its tables (until the entry times out), and
when server A acknowledges the request, the router is able to forward the
reply to the right server.
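
To make the timeout behaviour concrete, here is a minimal toy model of such a
translation table. It is only a sketch: the class and function names are made
up for illustration, the 600-second estimate is an arbitrary example, and a
real router's conntrack logic is more involved.

```python
class NatTable:
    """Toy model of a NAT router's UDP translation table (illustrative only)."""

    def __init__(self, idle_timeout):
        self.idle_timeout = idle_timeout  # seconds an entry survives without traffic
        self.last_seen = {}               # flow -> time of the last packet on that flow

    def request(self, flow, now):
        # A packet from the Amanda server creates or refreshes the entry.
        self.last_seen[flow] = now

    def reply(self, flow, now):
        # A packet from server A is forwarded only if a live entry exists.
        seen = self.last_seen.get(flow)
        if seen is None or now - seen > self.idle_timeout:
            self.last_seen.pop(flow, None)  # entry expired: the reply is dropped
            return False
        self.last_seen[flow] = now          # traffic refreshes the entry
        return True


def estimate_reply_delivered(idle_timeout, sendsize_seconds):
    """Return True if the sendsize result still gets through the router."""
    flow = ("amanda-server", "server-A")  # hypothetical flow label
    nat = NatTable(idle_timeout)
    nat.request(flow, now=0)              # initial P_REQ
    nat.reply(flow, now=1)                # immediate ACK from the client
    return nat.reply(flow, now=sendsize_seconds)


print(estimate_reply_delivered(360, 600))   # False: entry expired after 6 minutes
print(estimate_reply_delivered(3600, 600))  # True: entry is still alive
```

With a 6 minute timeout and no traffic during a 10 minute estimate, the result
packet has no table entry left to match, which is the failure seen in the
tcpdump traces.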
To allow safe backups over NAT using UDP, the server and the client should
exchange P_REQ and ACK every 2 minutes or so, just to keep the connection
tracking timeout happy!
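
As a rough check of that idea, the sketch below computes the longest silence
on the flow while the estimate runs, with and without a periodic P_REQ/ACK
exchange. The function names and the 900-second estimate are assumptions for
illustration; this is not how Amanda itself schedules its packets.

```python
def longest_silence(estimate_seconds, keepalive_interval=None):
    """Longest gap (seconds) with no packet on the flow while the estimate runs.

    Time 0 is the initial P_REQ/ACK exchange and the estimate result is sent
    at estimate_seconds. If keepalive_interval is given, an extra P_REQ/ACK
    exchange is assumed every keepalive_interval seconds in between.
    """
    events = [0, estimate_seconds]
    if keepalive_interval:
        t = keepalive_interval
        while t < estimate_seconds:
            events.append(t)
            t += keepalive_interval
    events.sort()
    return max(later - earlier for earlier, later in zip(events, events[1:]))


def survives_nat(estimate_seconds, nat_timeout, keepalive_interval=None):
    # The table entry survives if the flow is never silent longer than the timeout.
    return longest_silence(estimate_seconds, keepalive_interval) <= nat_timeout


# 15 minute estimate behind a router with a 6 minute UDP timeout
print(survives_nat(900, 360))        # False: 900 s of silence
print(survives_nat(900, 360, 120))   # True: never more than 120 s of silence
```

Any interval comfortably below the router's timeout would do; 2 minutes simply
leaves a wide margin under the 6 minutes observed here.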
----- Original Message -----
From: "Lee Fellows" <[EMAIL PROTECTED]>
To: <[EMAIL PROTECTED]>
Sent: Wednesday, July 03, 2002 7:02 PM
Subject: Re: syncronize backups
> On Wed, 2002-07-03 at 10:47, Scaglione Ermanno wrote:
> > thanks but I found the real problem: there is a cisco router doing NAT
> > between the amanda server and the two server. I found that it had a nat udp
> > timeout of 6 minutes (CISCO default). What happens is that amanda gets the
> > estimates from the first server that answers and starts the backup on that
> > server, then the link becomes too busy, the UDP connection with the second
> > server times out on the router and the second backup doesn't work ......
> > Well we could consider this an amanda bug maybe, the problem is that amanda
> > doesn't do keepalive while sendsize runs, and sendsize itself doesn't do
> > keepalive. I have been told that most CISCO routers doing NAT have this 6
> > minutes nat udp timeout thus if the estimate takes more than 6 minutes the
> > backup will probably fails if there is such a router in between .....
> >
> > ----- Original Message -----
> > From: "Chris Marble" <[EMAIL PROTECTED]>
> > To: "Scaglione Ermanno" <[EMAIL PROTECTED]>
> > Sent: Wednesday, July 03, 2002 4:39 AM
> > Subject: Re: syncronize backups
> >
> >
> > > Scaglione Ermanno wrote:
> > > >
> > > > I have a strange problem with amanda, I am backing up 6 servers with the
> > > > same disklist and 2 of them alternatively fails with a timeout in sendsize.
> > > > Server A works for a couple of days and server B doesn't in the same days,
> > > > then server B works and server A doesn't. I suppose the reason is that both
> > > > server are behind a slow link. How can I tell Amanda not to backup
> > > > simultaneously the two servers using 1 disklist?
> > >
> > > Hmm, I was going to tell you to lie to Amanda and tell it that all the disks
> > > are on the same spindle. But then I realized that the spindle parameter is
> > > on a per-machine basis. You could set an earlier starttime for those
> > > servers and set inparallel to 1 just for a while.
> > > --
> > > [EMAIL PROTECTED] - HMC UNIX Systems Manager
> > >
> >
> >
>
>
> Interesting. Do I understand your suggestion correctly to imply that
> keepalive be done on udp sockets. If so, might I suggest that this
> is not the solution. Its been several years since I programmed tcp
> and udp clients and servers, but if memory serves me correctly,
> keepalive is only available in the tcp-family of protocols. Keepalive
> was intended to maintain an otherwise idle connection for some period
> of time. Udp by definition is not a connected protocol and such an
> option would be meaningless for it.
>
>
>
>