On Mon, Jun 17, 2019 at 18:13:30 -0700, Jim Kusznir wrote:
> The backups are not completing because of the level 0 problem: the planner
> thinks it's running a backup that can complete, but ends up with 10x the
> data that a single run can handle, and it fills up the tape and the holding
> disk (which is a little more than a full tape), and then the run suspends
> until I kill it off with amcleanup -k.  So that explains the new files.
[...]
> I guess it's possible the minor number is changing, but that too seems
> unlikely.
>
> At this point, I also am not sure what is the best way to recover. I have
> a full holding disk (mostly of incomplete backups), and the .new gtar
> lists, etc. It seems like a waste of a tape (and 26 hours to fill said
> tape) to put a bunch of partial backups....
Okay, I'm thinking a way forward from here is to try running a dump run
with just a few of your DLEs, so that they actually fit on a tape and
those dumps run to completion. You can either do this by commenting out
all but a few DLEs in your disklist file, or by passing a few host/disk
expressions on the amdump command line.
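For example, something along these lines (the config name, host, and
disk names here are just placeholders; substitute whatever your
disklist actually uses):

$ amdump MyConfig client1 /export/home client1 /var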
I'd probably start with just one or two DLEs, and be sure to do an
"amadmin ... force" on those DLEs beforehand so they run as level 0
dumps.
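That would look something like this (again, the config/host/disk names
are just placeholders):

$ amadmin MyConfig force client1 /export/home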
I don't believe the .new files left out there in the snapshot directory
will cause a problem for you -- I'm pretty sure Amanda will just
overwrite them the next time it tries to use the level in question.
You probably do need to clear room in your holding disk so that future
dump runs have space to work.  Can you run amflush to flush the
complete ones to tape?  After that, it's probably safe to just delete
the incomplete ones that remain (on the theory that you will be getting
complete backups of the DLEs in question before too long...)
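For example, something like the following (names are placeholders
again, and the "amadmin ... holding" subcommands should be available in
any reasonably recent Amanda, but double-check "man amadmin" on your
version before deleting anything):

$ amflush MyConfig                # prompts for which dumps to flush
$ amadmin MyConfig holding list   # see what's still on the holding disk
$ amadmin MyConfig holding delete client1 /var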
Once you get a DLE or two to complete successfully, then you can try
running those same DLEs again, and see if those incrementals are the
expected size....
(Obviously with this approach the next two tapes you use will be mostly
empty, but the advantage is that the tests will run pretty quickly, and
hopefully they will go a long way toward narrowing down the source of
the problem....)
Meanwhile, just to start investigating the possibility that it is a
device number problem: does the "stat" command on the FreeNAS box show
you device-number info? On my Linux box the stat output looks like:
$ stat .profile
  File: .profile
  Size: 675            Blocks: 8          IO Block: 4096   regular file
Device: 901h/2305d     Inode: 83583       Links: 1
Access: (0640/-rw-r-----)  Uid: ( 1001/nathanst)   Gid: ( 1001/nathanst)
Access: 2019-06-17 10:22:00.630973405 -0400
Modify: 2016-05-20 17:58:08.467463415 -0400
Change: 2017-06-07 10:19:38.467463415 -0400
 Birth: -
If you run "stat" on some file that's included in one of your DLEs, and the
output has the "Device: 901h/2305d" part (or something similar), then we
should be able to use that to check to see if the device number is
causing the problem.
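FreeBSD's stat (which is presumably what FreeNAS gives you) formats its
output differently from the GNU stat shown above, but something like
this should print just the device and inode numbers (I haven't tried it
on FreeNAS itself, so treat the exact format string as a guess):

$ stat -f 'dev=%d ino=%i' /path/to/some/file/in/a/DLE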
Along those lines, does the FreeNAS shell environment have Perl
installed?  Or can you easily copy a snapshot file over to a machine
that has Perl?
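Once you do have Perl available somewhere, a one-liner along these
lines (just a sketch; point it at a real file in one of your DLEs)
would show the device and inode numbers that Perl's stat() reports, for
comparison with the stat output above:

$ perl -e 'my @s = stat($ARGV[0]); printf "dev=%d ino=%d\n", $s[0], $s[1];' /path/to/some/file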
Nathan
----------------------------------------------------------------------------
Nathan Stratton Treadway - [email protected] - Mid-Atlantic region
Ray Ontko & Co. - Software consulting services - http://www.ontko.com/
GPG Key: http://www.ontko.com/~nathanst/gpg_key.txt ID: 1023D/ECFB6239
Key fingerprint = 6AD8 485E 20B9 5C71 231C 0C32 15F3 ADCD ECFB 6239