Hi,
This is my first post to the amanda-users list. I'm hoping the community
can help me resolve an issue that has rendered our backup set roughly 90%
ineffective. In short, I've got about 65 DLEs that are not being backed
up, 60 of which reside on a remote host and 1 of which is on the same
machine as amandabackup. Amdump runs nightly and completes backups of the
other DLEs properly, but amreport shows the failing DLEs as 1 kB to 10 kB in
size regardless of the backup level assigned to them on each run.
An example of an amreport that includes this failure:
DUMP SUMMARY:
                                                    DUMPER STATS   TAPER STATS
HOSTNAME   DISK              L ORIG-kB OUT-kB COMP% MMM:SS  KB/s  MMM:SS KB/s
---------- ----------------- - ------- ------ ----- ------ -----  ------ ----
10.6.1.209 /mnt/IT           0      10     10    --   0:00 339.2    0:02  0.0
10.6.1.209 /mnt/aeiland      0      10     10    --   0:00 307.3    0:02  0.0
10.6.1.209 /mnt/applications 0      10     10    --   0:00 544.7    0:02  0.0
10.6.1.209 /mnt/archive-epp  0      10     10    --   0:00 687.4    0:02  0.0
10.6.1.209 /mnt/aswoboda     0      10     10    --   0:00 642.8    0:02  0.0
10.6.1.209 /mnt/bmcfarlane   0      10     10    --   0:00 632.2    0:02  0.0
10.6.1.209 /mnt/bmenzie      0      10     10    --   0:00 669.3    0:02  0.0
10.6.1.209 /mnt/brodriguez   0      10     10    --   0:00 107.2    0:02  0.0
10.6.1.209 /mnt/cczinski     0      10     10    --   0:00 641.0    0:02  0.0
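
For anyone who wants to pick the affected DLEs out of a full report, here is a minimal Python sketch. It assumes the DUMP SUMMARY rows start with HOSTNAME, DISK, L and ORIG-kB as in the excerpt above; the function names and the 10 kB threshold are my own choices for illustration, not anything Amanda provides.

```python
# Sketch only: flag DLEs whose reported original size is suspiciously small.
# The row layout (HOSTNAME DISK L ORIG-kB ...) is taken from the excerpt
# above; the 10 kB threshold and the function names are assumptions.

SAMPLE_REPORT = """\
HOSTNAME DISK L ORIG-kB OUT-kB COMP% MMM:SS KB/s
10.6.1.209 /mnt/IT 0 10 10 -- 0:00 339.2
0:02 0.0
10.6.1.209 /mnt/aeiland 0 10 10 -- 0:00 307.3
0:02 0.0
"""


def parse_dump_summary(text):
    """Return (host, disk, level, orig_kb) tuples for each data row."""
    rows = []
    for line in text.splitlines():
        fields = line.split()
        # Data rows have a numeric level and a numeric ORIG-kB column.
        # Headers, separators and wrapped taper columns fail this check.
        if len(fields) >= 4 and fields[2].isdigit() and fields[3].isdigit():
            rows.append((fields[0], fields[1], int(fields[2]), int(fields[3])))
    return rows


def suspicious_dles(rows, max_kb=10):
    """Return (host, disk) pairs whose original size is at most max_kb."""
    return [(host, disk) for host, disk, _level, orig_kb in rows
            if orig_kb <= max_kb]


if __name__ == "__main__":
    for host, disk in suspicious_dles(parse_dump_summary(SAMPLE_REPORT)):
        print(f"{host}:{disk} reports a near-empty dump")
```

Because the rows are split on whitespace, the sketch should cope with the report whether or not the mail client has wrapped the taper columns onto a second line.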
Some history:
We've used this backup-set for ~2 years, adjusting DLEs as necessary for
organizational changes, and it has functioned as expected. In early
November the machine that hosts amanda-server experienced a RAID5 array
failure that required us to rebuild the array from scratch. The original
array used an EXT filesystem; the new array is formatted with ZFS. After
recovering from this hardware failure amdump seemed to operate normally for
a couple of weeks... and then things started to fall apart.
Initially I noticed a notification in the nightly amreport that our holding
disk was missing as we had failed to recreate the proper directory
structure after rebuilding the failed array. After creating the missing
directory that error was no longer present in the logs. Shortly thereafter
the bulk of our DLEs started being dumped incorrectly and I've been unable
to determine why. I'm hoping the list will be able to provide me with some
insight as to what triggered the error and assist me in restoring our
backup capabilities.
Here are our config files:
*AMANDA.CONF: *http://kickasspastes.com/4946/
*DISKLIST: *http://kickasspastes.com/4947/
All of the DLEs on host 10.6.1.209 (a FreeNAS box) are failing.
One of four DLEs on localhost is failing (/srv/backups).
Here is a view of file attributes for /srv on localhost:
ls -la /srv/
total 12
drwxr-xr-x 3 root root 4096 Nov 6 08:20 .
drwxr-xr-x 30 root root 4096 Dec 19 06:41 ..
lrwxrwxrwx 1 root root 13 Nov 6 08:14 amanda -> /tank/amanda/
lrwxrwxrwx 1 root root 14 Nov 6 08:14 backups -> /tank/backups/
drwxr-xr-x 3 root root 4096 Nov 6 08:20 mnt
lrwxrwxrwx 1 root root 14 Nov 6 08:17 reports -> /tank/reports/
lrwxrwxrwx 1 root root 13 Nov 6 08:19 server -> /tank/server/
And here is the underlying ZFS array that the /srv mountpoints are symlinked to:
ls -la /tank/
total 30
drwxr-xr-x 6 root root 6 Nov 6 08:18 .
drwxr-xr-x 30 root root 4096 Dec 19 06:41 ..
drwxr-xr-x 4 amandabackup amandabackup 4 Dec 16 11:49 amanda
drwxr-xr-x 8 merkin merkin 8 Nov 6 08:56 backups
drwxr-xr-x 6 amandabackup amandabackup 6 Nov 6 08:17 reports
drwxr-xr-x 6 amandabackup amandabackup 9 Jan 7 01:45 server
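
Since the /srv entries are now symlinks into the ZFS array, one thing that may be worth ruling out is what each DLE path actually resolves to, and whether its subdirectories sit on a different filesystem than the top-level directory (for example, separate ZFS datasets). This is a guess to check, not a confirmed cause: GNU tar's --one-file-system option, if it is in effect for the dumptype, does not descend into directories on other filesystems. The sketch below is a hypothetical helper of my own, not part of Amanda.

```python
# Sketch only: describe where a DLE path really points.
# describe_dle_path is a hypothetical helper name, not an Amanda tool.
import os
import sys


def describe_dle_path(path):
    """Report symlink status, resolved target, and child filesystem boundaries."""
    # A trailing slash would make the symlink check follow the link.
    bare = path.rstrip(os.sep) or os.sep
    resolved = os.path.realpath(bare)
    info = {
        "path": path,
        "is_symlink": os.path.islink(bare),
        "resolved": resolved,
        "children_on_other_fs": [],
    }
    top_dev = os.stat(resolved).st_dev
    for entry in sorted(os.listdir(resolved)):
        child = os.path.join(resolved, entry)
        # Only real subdirectories can be mount points of interest here.
        if os.path.isdir(child) and not os.path.islink(child):
            if os.stat(child).st_dev != top_dev:
                info["children_on_other_fs"].append(entry)
    return info


if __name__ == "__main__":
    # Paths are passed on the command line, e.g. /srv/backups
    for dle_path in sys.argv[1:]:
        print(describe_dle_path(dle_path))
```

It would need to be run as a user that can read the directories in question, passing the DLE paths as arguments. A non-empty children_on_other_fs list would only show that those subdirectories are on another filesystem; whether that explains the tiny dumps would still have to be confirmed against the sendsize and sendbackup debug logs.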
Here are some examples of logfiles from days when the failures occur:
*AMREPORT: *http://kickasspastes.com/4951/
*PLANNER.DEBUG:* http://kickasspastes.com/4948/
*AMANDA PLANNER CONSOLE OUTPUT PIPED TO A TEXTFILE: *
http://kickasspastes.com/4949/
*AMANDA PLANNER CONSOLE OUTPUT SAMPLE TWO (INCLUDES "NO TRY" LINES): *
http://kickasspastes.com/4950/
A couple of notes: you'll notice I've set "etimeout" to a very high value in
amanda.conf. I had to raise it because estimates were timing out for
localhost:/srv/amanda/state and amdump was failing.
I've read about amanda, tar, and ZFS filesystems and am under the
impression that they require special dumptype declarations, and that
timeouts can occur if the wrong version of tar is used. *(We are currently
using "encrypted-gnutar-local" for /srv mountpoints.)* What confounds me is
that the system ran fine for almost a month and then started timing out, so
I'm not sure where the blame lies. Is this a ZFS issue? And what is causing
planner to calculate incorrect estimates for mountpoints on the other host,
when nothing changed on that machine at all?
Any help resolving this issue would be greatly appreciated. I need to get
these backups moving again ASAP.
Thanks in advance,
JS