Yes, an Amanda 3.3.3 server can properly backup an Amanda 2.5 client. I have a number of such in my universe right now.
I need to reinforce something here: Just because your iptables/ufw firewall allows port 10080 UDP to reach into your client and server, that doesn't allow for the TCP connection amdump is going to use to actually transfer the data. Amcheck doesn't check for TCP connectivity so it will claim the configuration is correct. On my Centos 5.11 client, I do this: [root@copper ~]# lsmod | grep amanda ip_conntrack_amanda 8901 0 ip_conntrack 53665 3 ip_conntrack_amanda,ip_conntrack_netbios_ns,xt_state [root@copper sysconfig]# grep amanda /etc/sysconfig/iptables-config IPTABLES_MODULES="ip_conntrack_netbios_ns ip_conntrack_amanda" If you don't have ip_conntrack_amanda in your list of loaded kernel modules, the amdump is probably going to fail. Copper is one of my mission-critical internal servers with public exposure, so it has a very tight iptables firewall configured. I had issues with it failing to backup until I forcibly configured it to load ip_conntrack_amanda. It would be a very useful addition to amcheck if it could somehow test that the client and server can actually create an amdump TCP connection, because it would illuminate the need to configure ip_conntrack_amanda or lf_conntrack_amanda. > -----Original Message----- > From: Seann [mailto:[email protected]] > Sent: Thursday, September 24, 2015 06:35 PM > To: Debra S Baddorf <[email protected]>; Joi L. Ellis > <[email protected]> > Cc: [email protected] > Subject: Re: Upgrade woes and eternal hanging of dumps > > On 9/21/2015 11:36 AM, Debra S Baddorf wrote: > > YES! I agree with the first and third of these tidbits. I just > couldn't remember them. I've had issues with both of them. Including the > tricky firewall timeout part, in Idea Three. > > > > Here's hoping you have a network person who can add some skills or ideas > at that level. Or, just don't do client estimates, as in the first > suggested fix. > > > > I think we had to allow trusted clients to initiate their OWN > connections back to the server (via a firewall rule), so that they could > still talk to the server even after that server-created conversation had > timed out. That might count as fix #3, but it takes firewall skills. > That might be a slightly different problem/situation (it sounds a little > different) but I think it's in this same category, somewhere. Network > savvy people, can you translate my "generic English" description into what > we actually did? > > Deb Baddorf, Fermilab > > > > > > On Sep 21, 2015, at 10:25 AM, Joi L. Ellis <[email protected]> > wrote: > > > >> I've just read through the long thread prompted by this particular > post. I'd like to offer a few points I didn't see mentioned before... > >> > >> Idea one: You upgraded from 2.5 to 3.3. 2.5 amdump only spoke UDP with > a 'bsd' auth protocol, so that was the only action available. Thus, > inetd.conf didn't specify an -auth=bsd parameter. 3.3 defaults to - > auth=bsdtcp if you don't provide it. Does your new configuration specify > that those clients must be reached with -auth=bsd from the new server, > rather than the server's new default of -auth=bsdctp? > >> > >> Idea two: If any of the involved machines are running iptables or ufw > firewalls, verify the new configuration is still loading the correct > kernel modules. At one point the /etc/default/ufw.conf file named kernel > modules incorrectly after an upgrade, and/or the nf_conntrack_amanda > module itself went missing. (Some kernels change the name of this module, > usually it's the first two characters.) The symptom here is that amcheck > thinks everything is fine, yet the actual amdump process fails because the > UDP control conversation between the server and the client is allowed, but > the TCP data stream amdump uses with -auth=bsdtcp is blocked. > >> > >> Idea Three: I run an Amanda 3.3.3 server, and I have experienced a > similar problem to your own. I've tried posting about it here in the past > and got null response, so I gave up asking for help and figured out my own > workarounds. > >> > >> My amanda server is behind a corporate firewall, and some of the > clients are in the DMZ, outside the firewall... and they are running > amanda 2.5 due to the age of the client hosts. I've had repeated issues > with the corporate firewall interfering with the planner. > >> > >> The issue I see is that the amanda server planner fires off a UDP > "connection" to the client, asking the client to provide estimates. The > client does so... BUT. That blasted firewall has created a dynamic NAT > rule that will allow the client to send back its UDP response. IF the > client's response doesn't appear before the NAT rule expires, the planner > falls into a permanent wait state, waiting for a UDP response that will > never arrive because the firewall has blocked it. The client has no idea > it failed, and its logs look entirely normal. > >> > >> If you dig into the server's logs, you will probably find TIMEOUT > errors in the logs from the planner. I don't have any recent logs that > illustrate this error, so I can't quote an example. > >> > >> I worked around this in two ways (varies with the client situation:) > >> > >> *) tell amanda to not use the client to create the estimate at all > >> *) adjust the NAT timeout rules on the firewall to extend the > timeout. As I recall, it was initially set to 120 seconds. We moved it > up to 300 seconds at one point, but then began to experience issues with > the firewall filling memory tables because rules weren't timing out fast > enough. > >> > >> As I see it, the planner makes the (unsafe) assumption that IF its > initial request-an-estimate packets traveled properly, the response will > always do so. If there is a firewall involved, the response might get > lost, yet the planner will sit there forever, twiddling its thumbs and not > backing up anything, until it receives the missing estimates package back > from the client. > >> > >> To summarize, I suspect that the move from 2.5's UDP-only communication > style to 3.3's default TCP-only style has broken something in your > environment that you've overlooked. Either the server, the clients (or > both) or a firewall (either an external network firewall, or a kernel > firewall on one of the hosts involved) are breaking your planner. I've > experienced very similar symptoms after version upgades. > >> > >> (And yes, I've seen my issues disappear when jobs are run manually, > >> yet still fail when run over night. Manual tests don't trigger the > >> firewall issues because the windows I have open the to the client and > >> server keep the darn firewall from timing out the dynamic NAT rules.) > >> > >> > >>> -----Original Message----- > >>> From: [email protected] > >>> [mailto:[email protected]] > >>> On Behalf Of Seann > >>> Sent: Monday, August 17, 2015 02:34 PM > >>> To: [email protected] > >>> Subject: Upgrade woes and eternal hanging of dumps > >>> > >>> All, > >>> > >>> I am looking for a little direction on a problem that has cropped up > >>> for me recently. > >>> > >>> I have a backup set, that was created using Amanda 2.5 (default on > >>> CentOS > >>> 5.11) and ran very well, both manually and from the cron job I had > >>> set for it. > >>> It has approximately 13 hosts to backup, from as simple as backing > >>> up a single directory, to backing up the full system, and it ran > >>> with no issues on CentOS 5.11. > >>> The basic setup is using hard drives as the backup media, > >>> compressing the backups to save space, using server compression, > >>> these also use GNU-TAR as the archive format. > >>> > >>> Fast forward to today, I have the system upgraded to CentOS 7, which > >>> also upgraded to Amanda 3.3.3-13, and after some configuration file > >>> re-writing, I got most of the backups to work. > >>> Two systems, one backing up the web directory, the other backing up > >>> the full disk, fail constantly. > >>> When these two disklist statements are removed, the backup runs, and > >>> takes approximately 2 and a half hours to run on the 8 other hosts > >>> (the other 3 hosts are currently offline and not in scope). > >>> > >>> When the CRON job kicks off at midnight, it runs for over 12 hours > >>> (I have the etimeout set to one day, as the planner kept dying > >>> saying to timed out). > >>> This is the same basic error that I get with the two above mentioned > >>> failing backups. > >>> > >>> When the hung backup job is running, I see the dumpers and main dump > >>> process running on the backup server, but nothing in the logs > >>> outside of the "We started the backup job" type of log messages. > >>> On all of the hosts, I don't see the client running, nor to I see > >>> any TAR processes running. > >>> There are also no clues in the logs on which host the server is > >>> waiting on, and checking all the hosts in scope show they are all in > >>> the same state, that is they have sent the estimate to the backup > >>> server and are waiting on the next phase. > >>> > >>> > >>> Any help on this would be appreciated, and also is there a better > >>> way of making sense of the logs (such as using something like > >>> Graylog2?), and on reporting for issues with Amanda 3.3? > >>> > >>> > >>> Regards, > >>> Seann > It has been a while since I updated the few folk who have been following > this thread. > > I haven't had much time to tinker with this, as it is on my home network, > and my day job got in the way. > > To answer the questions from Joi: > 1. Yes, I swapped the auth in the global section to use bsd for all of my > hosts, unless overridden in the disk definition. > 2. 70% of the clients don't run a firewall, and those that do, allow the > UDP ports for Amanda 10080/udp by default. > 3. All my clients are behind the same firewall as the Amanda server, so no > firewall outside of the host firewalls are in play with this. > > Reading a few of the other threads on the list, there was the mention of > xinetd being depreciated, and since my setup wasn't working in the first > place, I flipped everything over to ssh. > > I used puppet to push SSH keys to the proper user's on the client > machines, went through and ssh'd to each host by FQDN and accepted the > host keys, and ensured the users worked on the client machines. > > When running amcheck on them now, what I get is this: > > WARNING: server1.tsukinokage.net: selfcheck request failed: EOF on read > from server1.tsukinokage.net. > > To save some reading space, I have the log output near the end of the > thread. > > Searching for anything on this has been nearly impossible, namely as to > what the hel the 'EOF on read" is related to, and troubleshooting steps > > > Ultimately I am stuck in a chicken and the egg situation, where I need > to back up directories on some servers, to the backup server, prior to > upgrading their operating systems from CentOs 5, to CentOs 7, and > redeploying their configurations, but I can't right now because the 2.5 > clients aren't working. > > This weekend, now that I am not on-call, I might tinker with compiling, > from scratch, a 3.3x client on some of those servers, in order to test > if it really is that, while not said outright, Amanda 3.3.3 is not > backwards compatible. > > The clients log shows: > amandad: debug 1 pid 12389 ruid 33 euid 33: start at Thu Sep 24 18:08:33 > 2015 > amandad: version 2.5.0p2 > amandad: build: VERSION="Amanda-2.5.0p2" > amandad: BUILT_DATE="Thu Feb 23 08:03:44 EST 2012" > amandad: BUILT_MACH="Linux builder10.centos.org 2.6.18-53.el5 #1 > SMP Mon Nov 12 02:14:55 EST 2007 x86_64 x86_64 x86_64 GNU/Linux" > amandad: CC="gcc" > amandad: CONFIGURE_COMMAND="'./configure' > '--build=x86_64-redhat-linux-gnu' '--host=x86_64-redhat-linux-gnu' > '--target=x86_64-redhat-linux-gnu' '--program-prefix=' '--prefix=/usr' > '--exec-prefix=/usr' '--bindir=/usr/bin' '--sbindir=/usr/sbin' > '--sysconfdir=/etc' '--datadir=/usr/share' '--includedir=/usr/include' > '--libdir=/usr/lib64' '--libexecdir=/usr/lib64/amanda' > '--localstatedir=/var/lib' '--sharedstatedir=/usr/com' > '--mandir=/usr/share/man' '--infodir=/usr/share/info' '--enable-shared' > '--disable-static' '--disable-dependency-tracking' > '--with-index-server=amandahost' '--with-tape-server=amandahost' > '--with-config=DailySet1' > '--with-gnutar-listdir=/var/lib/amanda/gnutar-lists' > '--with-smbclient=/usr/bin/smbclient' > '--with-dumperdir=/usr/lib64/amanda/dumperdir' '--with-amandahosts' > '--with-user=amanda' '--with-group=disk' '--with-tmpdir=/var/log/amanda' > '--with-gnutar=/bin/tar' '--with-ssh-security'" > amandad: paths: bindir="/usr/bin" sbindir="/usr/sbin" > amandad: libexecdir="/usr/lib64/amanda" mandir="/usr/share/man" > amandad: AMANDA_TMPDIR="/var/log/amanda" > amandad: AMANDA_DBGDIR="/var/log/amanda" CONFIG_DIR="/etc/amanda" > amandad: DEV_PREFIX="/dev/" RDEV_PREFIX="/dev/r" > amandad: DUMP="/sbin/dump" RESTORE="/sbin/restore" VDUMP=UNDEF > amandad: VRESTORE=UNDEF XFSDUMP=UNDEF XFSRESTORE=UNDEF VXDUMP=UNDEF > amandad: VXRESTORE=UNDEF SAMBA_CLIENT="/usr/bin/smbclient" > amandad: GNUTAR="/bin/tar" COMPRESS_PATH="/bin/gzip" > amandad: UNCOMPRESS_PATH="/bin/gzip" LPRCMD="/usr/bin/lpr" > amandad: MAILER="/usr/bin/Mail" > amandad: listed_incr_dir="/var/lib/amanda/gnutar-lists" > amandad: defs: DEFAULT_SERVER="amandahost" DEFAULT_CONFIG="DailySet1" > amandad: DEFAULT_TAPE_SERVER="amandahost" > amandad: DEFAULT_TAPE_DEVICE="null:" HAVE_MMAP HAVE_SYSVSHM > amandad: LOCKING=POSIX_FCNTL SETPGRP_VOID DEBUG_CODE > amandad: AMANDA_DEBUG_DAYS=4 BSD_SECURITY RSH_SECURITY > USE_AMANDAHOSTS > amandad: CLIENT_LOGIN="amanda" FORCE_USERID HAVE_GZIP > amandad: COMPRESS_SUFFIX=".gz" COMPRESS_FAST_OPT="--fast" > amandad: COMPRESS_BEST_OPT="--best" UNCOMPRESS_OPT="-dc" > amandad: time 0.000: accept recv REQ pkt: > <<<<< > SERVICE noop > OPTIONS features=ffffffff9efefbffffffffff1f; > >>>>> > amandad: time 0.000: creating new service: /usr/lib64/amanda/noop > OPTIONS features=ffffffff9efefbffffffffff1f; > > amandad: time 0.000: sending ACK pkt: > <<<<< > >>>>> > amandad: time 0.001: sending REP pkt: > <<<<< > OPTIONS features=fffffeff9ffeffff07; > >>>>> > amandad: time 0.040: received ACK pkt: > <<<<< > >>>>> > amandad: time 30.038: pid 12389 finish time Thu Sep 24 18:09:03 2015 > > The server will check itself, and come back with no problems. > The amandad log file on the server, for the client check to localhost > shows: > > > Thu Sep 24 18:14:58 2015: thd-0x7f0087613400: amandad: pid 27610 ruid 33 > euid 33 version 3.3.3: start at Thu Sep 24 18:14:58 2015 > Thu Sep 24 18:14:58 2015: thd-0x7f0087613400: amandad: > security_getdriver(name=ssh) returns 0x7f00865ac660 > Thu Sep 24 18:14:58 2015: thd-0x7f0087613400: amandad: version 3.3.3 > Thu Sep 24 18:14:58 2015: thd-0x7f0087613400: amandad: build: > VERSION="Amanda-3.3.3" > Thu Sep 24 18:14:58 2015: thd-0x7f0087613400: amandad: BUILT_DATE="Tue > Jun 10 01:33:40 UTC 2014" BUILT_MACH="" > Thu Sep 24 18:14:58 2015: thd-0x7f0087613400: amandad: BUILT_REV="5099" > BUILT_BRANCH="community_3_3_3" CC="gcc" > Thu Sep 24 18:14:58 2015: thd-0x7f0087613400: amandad: paths: > bindir="/usr/bin" sbindir="/usr/sbin" > Thu Sep 24 18:14:58 2015: thd-0x7f0087613400: amandad: > libexecdir="/usr/lib64" amlibexecdir="/usr/lib64/amanda" > Thu Sep 24 18:14:58 2015: thd-0x7f0087613400: amandad: > mandir="/usr/share/man" AMANDA_TMPDIR="/var/log/amanda" > Thu Sep 24 18:14:58 2015: thd-0x7f0087613400: amandad: > AMANDA_DBGDIR="/var/log/amanda" CONFIG_DIR="/etc/amanda" > Thu Sep 24 18:14:58 2015: thd-0x7f0087613400: amandad: > DEV_PREFIX="/dev/" RDEV_PREFIX="/dev/" DUMP=UNDEF > Thu Sep 24 18:14:58 2015: thd-0x7f0087613400: amandad: RESTORE=UNDEF > VDUMP=UNDEF VRESTORE=UNDEF XFSDUMP=UNDEF > Thu Sep 24 18:14:58 2015: thd-0x7f0087613400: amandad: XFSRESTORE=UNDEF > VXDUMP=UNDEF VXRESTORE=UNDEF > Thu Sep 24 18:14:58 2015: thd-0x7f0087613400: amandad: > SAMBA_CLIENT="/usr/bin/smbclient" GNUTAR="/bin/tar" > Thu Sep 24 18:14:58 2015: thd-0x7f0087613400: amandad: > COMPRESS_PATH="/usr/bin/gzip" > Thu Sep 24 18:14:58 2015: thd-0x7f0087613400: amandad: > UNCOMPRESS_PATH="/usr/bin/gzip" LPRCMD=UNDEF MAILER=UNDEF > Thu Sep 24 18:14:58 2015: thd-0x7f0087613400: amandad: > listed_incr_dir="/var/lib/amanda/gnutar-lists" > Thu Sep 24 18:14:58 2015: thd-0x7f0087613400: amandad: defs: > DEFAULT_SERVER="amandahost" DEFAULT_CONFIG="DailySet1" > Thu Sep 24 18:14:58 2015: thd-0x7f0087613400: amandad: > DEFAULT_TAPE_SERVER="amandahost" DEFAULT_TAPE_DEVICE="" > Thu Sep 24 18:14:58 2015: thd-0x7f0087613400: amandad: NEED_STRSTR > AMFLOCK_POSIX AMFLOCK_FLOCK AMFLOCK_LOCKF > Thu Sep 24 18:14:58 2015: thd-0x7f0087613400: amandad: AMFLOCK_LNLOCK > SETPGRP_VOID AMANDA_DEBUG_DAYS=4 BSD_SECURITY > Thu Sep 24 18:14:58 2015: thd-0x7f0087613400: amandad: KRB5_SECURITY > RSH_SECURITY USE_AMANDAHOSTS > Thu Sep 24 18:14:58 2015: thd-0x7f0087613400: amandad: > CLIENT_LOGIN="amandabackup" CHECK_USERID HAVE_GZIP > Thu Sep 24 18:14:58 2015: thd-0x7f0087613400: amandad: > COMPRESS_SUFFIX=".gz" COMPRESS_FAST_OPT="--fast" > Thu Sep 24 18:14:58 2015: thd-0x7f0087613400: amandad: > COMPRESS_BEST_OPT="--best" UNCOMPRESS_OPT="-dc" > Thu Sep 24 18:14:58 2015: thd-0x7f0087613400: amandad: parsing > 192.168.10.19 > Thu Sep 24 18:14:58 2015: thd-0x7f0087613400: amandad: > security_handleinit(handle=0x7f008761ded0, driver=0x7f00865ac660 (SSH)) > Thu Sep 24 18:14:58 2015: thd-0x7f0087613400: amandad: > security_streaminit(stream=0x7f008761e0e0, driver=0x7f00865ac660 (SSH)) > Thu Sep 24 18:14:58 2015: thd-0x7f0087613400: amandad: authenticated > peer name is 'amanda.tsukinokage.net' > Thu Sep 24 18:14:58 2015: thd-0x7f0087613400: amandad: accept recv REQ > pkt: > <<<<< > SERVICE noop > OPTIONS features=ffffffff9efefbffffffffff1f; > >>>>> > Thu Sep 24 18:14:58 2015: thd-0x7f0087613400: amandad: creating new > service: noop > OPTIONS features=ffffffff9efefbffffffffff1f; > > Thu Sep 24 18:14:58 2015: thd-0x7f0087613400: amandad: sending ACK pkt: > <<<<< > >>>>> > Thu Sep 24 18:14:58 2015: thd-0x7f0087613400: amandad: tcpm_send_token: > data is still flowing > Thu Sep 24 18:14:59 2015: thd-0x7f0087613400: amandad: sending REP pkt: > <<<<< > OPTIONS features=ffffffff9efefbffffffffff1f; > >>>>> > Thu Sep 24 18:14:59 2015: thd-0x7f0087613400: amandad: received ACK pkt: > <<<<< > >>>>> > Thu Sep 24 18:14:59 2015: thd-0x7f0087613400: amandad: > security_close(handle=0x7f008761ded0, driver=0x7f00865ac660 (SSH)) > Thu Sep 24 18:14:59 2015: thd-0x7f0087613400: amandad: > security_stream_close(0x7f008761e0e0) > Thu Sep 24 18:14:59 2015: thd-0x7f0087613400: amandad: > security_handleinit(handle=0x7f008761ded0, driver=0x7f00865ac660 (SSH)) > Thu Sep 24 18:14:59 2015: thd-0x7f0087613400: amandad: > security_streaminit(stream=0x7f00876263e0, driver=0x7f00865ac660 (SSH)) > Thu Sep 24 18:14:59 2015: thd-0x7f0087613400: amandad: authenticated > peer name is 'amanda.tsukinokage.net' > Thu Sep 24 18:14:59 2015: thd-0x7f0087613400: amandad: accept recv REQ > pkt: > <<<<< > SERVICE selfcheck > OPTIONS > features=ffffffff9efefbffffffffff1f;maxdumps=1;hostname=amanda.tsukinokage > .net;config=Tsukinokage-daily; > <dle> > <program>GNUTAR</program> > <estimate>CLIENT </estimate> > <disk>/var/database</disk> > <auth>ssh</auth> > <compress>SERVER-BEST</compress> > <record>YES</record> > <index>YES</index> > <datapath>AMANDA</datapath> > </dle> > <dle> > <program>GNUTAR</program> > <estimate>CLIENT </estimate> > <disk>/etc</disk> > <auth>ssh</auth> > <compress>SERVER-BEST</compress> > <record>YES</record> > <index>YES</index> > <datapath>AMANDA</datapath> > </dle> > >>>>> > Thu Sep 24 18:14:59 2015: thd-0x7f0087613400: amandad: creating new > service: selfcheck > OPTIONS > features=ffffffff9efefbffffffffff1f;maxdumps=1;hostname=amanda.tsukinokage > .net;config=Tsukinokage-daily; > <dle> > <program>GNUTAR</program> > <estimate>CLIENT </estimate> > <disk>/var/database</disk> > <auth>ssh</auth> > <compress>SERVER-BEST</compress> > <record>YES</record> > <index>YES</index> > <datapath>AMANDA</datapath> > </dle> > <dle> > <program>GNUTAR</program> > <estimate>CLIENT </estimate> > <disk>/etc</disk> > <auth>ssh</auth> > <compress>SERVER-BEST</compress> > <record>YES</record> > <index>YES</index> > <datapath>AMANDA</datapath> > </dle> > > Thu Sep 24 18:14:59 2015: thd-0x7f0087613400: amandad: sending ACK pkt: > <<<<< > >>>>> > Thu Sep 24 18:14:59 2015: thd-0x7f0087613400: amandad: sending REP pkt: > <<<<< > OK version 3.3.3 > OK distro RPM > OK platform CentOS Linux release 7.1.1503 (Core) > OPTIONS features=ffffffff9efefbffffffffff1f; > OK /var/database > OK /var/database > OK /var/database > OK /etc > OK /etc > OK /etc > OK /usr/lib64/amanda/runtar executable > OK /bin/tar executable > OK /var/lib/amanda/gnutar-lists/. read/writable > OK /dev/null read/writable > OK /var/log/amanda has more than 64KB available. > OK /var/log/amanda has more than 64KB available. > OK /etc has more than 64KB available. > >>>>> > Thu Sep 24 18:14:59 2015: thd-0x7f0087613400: amandad: received ACK pkt: > <<<<< > >>>>> > Thu Sep 24 18:14:59 2015: thd-0x7f0087613400: amandad: > security_close(handle=0x7f008761ded0, driver=0x7f00865ac660 (SSH)) > Thu Sep 24 18:14:59 2015: thd-0x7f0087613400: amandad: > security_stream_close(0x7f00876263e0) > Thu Sep 24 18:14:59 2015: thd-0x7f0087613400: amandad: pid 27610 finish > time Thu Sep 24 18:14:59 2015 > > > -- > > Regards, > Seann >
