> On Jan 8, 2016, at 11:56 AM, Focus1 IT <[email protected]> wrote:
>
> Thanks for the replies, appreciate it.
>
> Re: tar and filesystem boundaries: our disklist worked fine up until the failure, so I don't think that is our issue, but I tried adding an include directive to our config and ran amdump; the estimate for that DLE was still wrong.
>
> Re: DLEs: I've posted a link to a paste of our disklist; we have DLEs for each specific mount.
>
> Re: tar version: I'm going to look into this in a bit, because the one issue that has bothered me more than the rest is the failure of all of the FreeNAS mounts. There were no changes to that host prior to the failure; we just suddenly lost the ability to back up 60+ directories overnight.
>
> JS
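[Editor's note: for anyone comparing notes, the include workaround Jon describes below is normally expressed in the disklist with an inline dumptype. A minimal sketch using Jon's /usr example — the hostname and the "comp-user-tar" base dumptype are placeholders, the include path is relative to the DLE's top directory, and includes are only honored by GNUTAR-based dumptypes, not program "DUMP":]

```
# Hypothetical DLE: client.example.com is a placeholder hostname.
# /usr is the DLE; the separately mounted /usr/local is pulled into
# the same dump via an include, which tar sees as "./local".
client.example.com /usr {
    comp-user-tar            # any GNUTAR-based dumptype from amanda.conf
    include "./local"
}
```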
The fact that it ALSO includes a DLE on your server (failing) says a lot, too. I'm just not sure *what*.

Deb

> On Jan 8, 2016 12:37 PM, "Debra S Baddorf" <[email protected]> wrote:
>
> > On Jan 8, 2016, at 12:26 AM, Jon LaBadie <[email protected]> wrote:
> >
> > On Thu, Jan 07, 2016 at 04:00:23PM -0500, Focus1 IT wrote:
> >> Hi,
> >>
> >> This is my first post to the amanda-users list; I'm hoping the community can help me resolve an issue that has rendered our backup set 90% ineffective. In short, I've got about 65 DLEs that are not being backed up, 60 of which reside on a remote host and 1 that is on the same machine as amandabackup. Amdump runs nightly and completes backups of other DLEs properly; amreport indicates the failing DLEs are 1 kB to 10 kB in size regardless of the backup level assigned to them each run.
> >>
> >> An example of an amreport that includes this failure:
> >
> > IIRC, as amanda uses tar, it will not cross a mount point. And I'm not sure about following symbolic links, but I don't think it will back up the target of a symlink either.
> >
> > Might either of these be the problem?
> >
> > I worked around the mount point problem in the past by specifically including the mount point. For example, if /usr were the filesystem being backed up in the DLE and /usr/local were a separate filesystem that I wanted in the same backup, that DLE had something like an "include ./local" directive.
> >
> > jl
>
> As I read his note, I think he's already got a separate DLE for each mounted volume. That ought to work. I have had some troubles using "dump" on ZFS disks (i.e. it doesn't work at all) and find that I have to back those up with tar instead. JS, did you say you ARE using tar for these DLEs?
>
> Although it might be interesting for him to check his backups (*IF* they're present) and see if a new version of tar was auto-installed at the point when things started to fail.
> Or check an even older backup, see what version of tar was present then, and just compare it to the current live version.
>
> Deb Baddorf
>
> >>
> >> DUMP SUMMARY:
> >>                                              DUMPER STATS          TAPER STATS
> >> HOSTNAME   DISK              L ORIG-kB OUT-kB COMP% MMM:SS   KB/s MMM:SS  KB/s
> >> ---------- ----------------- - ------- ------ ----- ------ ------ ------ -----
> >> 10.6.1.209 /mnt/IT           0      10     10    --   0:00  339.2   0:02   0.0
> >> 10.6.1.209 /mnt/aeiland      0      10     10    --   0:00  307.3   0:02   0.0
> >> 10.6.1.209 /mnt/applications 0      10     10    --   0:00  544.7   0:02   0.0
> >> 10.6.1.209 /mnt/archive-epp  0      10     10    --   0:00  687.4   0:02   0.0
> >> 10.6.1.209 /mnt/aswoboda     0      10     10    --   0:00  642.8   0:02   0.0
> >> 10.6.1.209 /mnt/bmcfarlane   0      10     10    --   0:00  632.2   0:02   0.0
> >> 10.6.1.209 /mnt/bmenzie      0      10     10    --   0:00  669.3   0:02   0.0
> >> 10.6.1.209 /mnt/brodriguez   0      10     10    --   0:00  107.2   0:02   0.0
> >> 10.6.1.209 /mnt/cczinski     0      10     10    --   0:00  641.0   0:02   0.0
> >>
> >> Some history:
> >>
> >> We've used this backup set for ~2 years, adjusting DLEs as necessary for organizational changes, and it has functioned as expected. In early November the machine that hosts amanda-server experienced a RAID5 array failure that necessitated rebuilding the array from scratch. The initial array used an EXT filesystem; the new array is formatted ZFS. After recovering from this hardware failure, amdump seemed to operate normally for a couple of weeks... and then things started to fall apart.
> >>
> >> Initially I noticed a notification in the nightly amreport that our holding disk was missing, as we had failed to recreate the proper directory structure after rebuilding the failed array. After creating the missing directory, that error was no longer present in the logs. Shortly thereafter the bulk of our DLEs started being dumped incorrectly, and I've been unable to determine why.
> >> I'm hoping the list will be able to provide me with some insight as to what triggered the error and assist me in restoring our backup capabilities.
> >>
> >> Here are our config files:
> >>
> >> *AMANDA.CONF:* http://kickasspastes.com/4946/
> >>
> >> *DISKLIST:* http://kickasspastes.com/4947/
> >>
> >> All of the DLEs on host 10.6.1.209 (a FreeNAS box) are failing. One of four DLEs on localhost is failing (/srv/backups).
> >>
> >> Here is a view of file attributes for /srv on localhost:
> >>
> >> ls -la /srv/
> >> total 12
> >> drwxr-xr-x  3 root root 4096 Nov  6 08:20 .
> >> drwxr-xr-x 30 root root 4096 Dec 19 06:41 ..
> >> lrwxrwxrwx  1 root root   13 Nov  6 08:14 amanda -> /tank/amanda/
> >> lrwxrwxrwx  1 root root   14 Nov  6 08:14 backups -> /tank/backups/
> >> drwxr-xr-x  3 root root 4096 Nov  6 08:20 mnt
> >> lrwxrwxrwx  1 root root   14 Nov  6 08:17 reports -> /tank/reports/
> >> lrwxrwxrwx  1 root root   13 Nov  6 08:19 server -> /tank/server/
> >>
> >> and the underlying ZFS array /srv mountpoints are symlinked to:
> >>
> >> ls -la /tank/
> >> total 30
> >> drwxr-xr-x  6 root         root            6 Nov  6 08:18 .
> >> drwxr-xr-x 30 root         root         4096 Dec 19 06:41 ..
> >> drwxr-xr-x  4 amandabackup amandabackup    4 Dec 16 11:49 amanda
> >> drwxr-xr-x  8 merkin       merkin          8 Nov  6 08:56 backups
> >> drwxr-xr-x  6 amandabackup amandabackup    6 Nov  6 08:17 reports
> >> drwxr-xr-x  6 amandabackup amandabackup    9 Jan  7 01:45 server
> >>
> >> Here are some examples of logfiles from days when the failures occur:
> >>
> >> *AMREPORT:* http://kickasspastes.com/4951/
> >>
> >> *PLANNER.DEBUG:* http://kickasspastes.com/4948/
> >>
> >> *AMANDA PLANNER CONSOLE OUTPUT PIPED TO A TEXT FILE:* http://kickasspastes.com/4949/
> >>
> >> *AMANDA PLANNER CONSOLE OUTPUT SAMPLE TWO (INCLUDES "NO TRY" LINES):* http://kickasspastes.com/4950/
> >>
> >> A couple of notes: you'll notice I've set "etimeout" to a very high value in amanda.conf.
> >> I had to raise the value for this setting because estimates were timing out for localhost:/srv/amanda/state and amdump was failing. I've read about amanda, tar, and ZFS filesystems, and I am under the impression that they require special dumptype declarations, and that if the wrong version of tar is used, timeouts can occur *(we are currently using "encrypted-gnutar-local" for /srv mountpoints)*. What confounds me is that the system ran fine for almost a month and then started timing out, so I'm not entirely sure where the blame lies. Is this a ZFS issue? What is causing planner to calculate incorrect estimates for mountpoints on the other host? Nothing changed on that machine at all.
> >>
> >> Any help resolving this issue would be greatly appreciated. I need to get these backups moving again ASAP.
> >>
> >> Thanks in advance,
> >>
> >> JS
> >>
> >> --
> >>
> >> CONFIDENTIALITY NOTICE: This email is covered by the Electronic Communications Privacy Act, 18 U.S.C. 2510-2521 and is legally privileged. This communication may also contain material protected and governed by the Health Insurance Portability and Accountability Act of 1996 (HIPAA). This e-mail is only for the personal and confidential use of the individuals to which it is addressed and contains confidential information. If you are not the intended recipient, you are notified that you have received this document in error, and that any reading, distributing, copying or disclosure is unauthorized.
> >>
> >> If you are not the intended recipient, please notify Hatteras Printing Inc. by calling (313) 624-3300 and destroy the message immediately. Additionally, please do not print this email unless it is absolutely necessary.
>
> >>>> End of included message <<<
>
> > --
> > Jon H. LaBadie                 [email protected]
> > 11226 South Shore Rd.
> > (703) 787-0688 (H)
> > Reston, VA 20190               (703) 935-6720 (C)
>
> --
> You received this message because you are subscribed to the Google Groups "Focus1 IT" group.
> To view this discussion on the web visit https://groups.google.com/a/focus1data.com/d/msgid/Focus1IT/126BCFEB-AE2F-4648-BFBD-90E3C90083BB%40fnal.gov.
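[Editor's note: as a concrete starting point for Deb's tar-version comparison, something along these lines run on the affected client would show the live tar version and whether tar was auto-updated around the time the failures began. The yum log path is an assumption for a Red Hat-style host; Debian-based systems would check /var/log/dpkg.log* instead.]

```shell
# Show the tar version currently in use on the client.
tar --version | head -n 1

# Look for tar installs/updates in the package manager history.
# Log path is an assumption for a yum-based system; adjust as needed.
grep -hi 'tar' /var/log/yum.log* 2>/dev/null || true
```

Comparing that version string against the tar binary restored from a known-good backup (per Deb's suggestion) would confirm or rule out an automatic upgrade as the trigger.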

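[Editor's note: on the etimeout question, rather than one very large global timeout, the usual approach for ZFS-backed DLEs is a GNUTAR dumptype with a cheaper estimate method. A sketch only — the option names are standard amanda.conf dumptype settings, "encrypted-gnutar-local" is the dumptype JS mentions, and the values are illustrative, not tuned for this site:]

```
define dumptype zfs-gnutar {
    encrypted-gnutar-local   # inherit the existing dumptype's settings
    program "GNUTAR"         # ZFS has no workable dump(8); use tar
    estimate calcsize        # cheaper estimates than a full dry-run tar
}

etimeout 1800                # per-DLE estimate timeout, in seconds
```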