On Mon, Jun 17, 2019 at 18:13:30 -0700, Jim Kusznir wrote:
> The backups are not completing because of the level 0 problem: the planner
> thinks it's running a backup that can complete, but ends up with 10x the
> data that a single run can handle, and it fills up the tape and the holding
> disk (which is a little more than a full tape), and then the run suspends
> until I kill it off with amcleanup -k.  So that explains the new files.

[...] 
> I guess it's possible the minor number is changing, but that too seems
> unlikely.
> 
> At this point, I also am not sure what is the best way to recover.  I have
> a full holding disk (mostly of incomplete backups), and the .new gtar
> lists, etc.  It seems like a waste of a tape (and 26 hours to fill said
> tape) to put a bunch of partial backups....

Okay, I think a way forward from here is to try a dump run with just a
few of your DLEs, so that they actually fit on a tape and those dumps run
to completion.  You can do this either by commenting out all but a few
DLEs in your disklist file, or by passing a few host/disk expressions on
the amdump command line.
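
For example (using "Daily" as a stand-in for your config name, and a
made-up host/disk -- substitute your real ones):

  $ amdump Daily client1.example.com /export/home

That should limit the run to just the DLE(s) matching those expressions.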

I'd probably start with just one or two DLEs, and be sure to do an
"amadmin ... force" on those DLEs beforehand so they run as level 0
dumps.
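
Again with placeholder names, that would look something like:

  $ amadmin Daily force client1.example.com /export/home

which should mark those DLEs so the planner schedules a full (level 0)
dump of them on the next run.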

I don't believe the .new files left out there in the snapshot directory
will cause a problem for you -- I'm pretty sure Amanda will just
overwrite them the next time it tries to use the level in question.

You probably do need to clear room in your holding disk so that future
dump runs have space to work.  Can you run amflush to flush the complete
ones to tape?  After that, it's probably safe to just delete the
incomplete ones that remain (on the theory that you will be getting
complete backups of the DLEs in question before too long...)
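
(That is, something along the lines of:

  $ amflush Daily

amflush should ask which of the holding-disk dumps you want flushed;
again, "Daily" is just a placeholder for your config name.)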


Once you get a DLE or two to complete successfully, then you can try
running those same DLEs again, and see if those incrementals are the
expected size....

(Obviously with this approach the next two tapes you use will be mostly
empty, but the advantage is that the tests will run pretty quickly, and
hopefully they will go a long way toward narrowing down the source of
the problem....)


Meanwhile, just to start investigating the possibility that it is a
device number problem: does the "stat" command on the FreeNAS box show
you device-number info?  On my Linux box the stat output looks like:
  $ stat .profile
   File: .profile
    Size: 675             Blocks: 8          IO Block: 4096   regular file
  Device: 901h/2305d      Inode: 83583       Links: 1
  Access: (0640/-rw-r-----)  Uid: ( 1001/nathanst)   Gid: ( 1001/nathanst)
  Access: 2019-06-17 10:22:00.630973405 -0400
  Modify: 2016-05-20 17:58:08.467463415 -0400
  Change: 2017-06-07 10:19:38.467463415 -0400
   Birth: -

If you run "stat" on some file that's included in one of your DLEs, and the
output has the "Device: 901h/2305d" part (or something similar), then we
should be able to use that to check to see if the device number is
causing the problem.

Along those lines, does the FreeNAS shell environment have Perl
installed?  Or can you easily copy a snapshot file over to a machine
that has Perl?
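
(If it does have Perl, then for example a one-liner like

  $ perl -e 'printf "dev=%d\n", (stat($ARGV[0]))[0]' /path/to/some/file

would print the raw device number for a file -- "/path/to/some/file" being
whatever file you want to check -- which would make it easy to compare
that number before and after a reboot or snapshot.)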

                                                        Nathan

----------------------------------------------------------------------------
Nathan Stratton Treadway  -  [email protected]  -  Mid-Atlantic region
Ray Ontko & Co.  -  Software consulting services  -   http://www.ontko.com/
 GPG Key: http://www.ontko.com/~nathanst/gpg_key.txt   ID: 1023D/ECFB6239
 Key fingerprint = 6AD8 485E 20B9 5C71 231C  0C32 15F3 ADCD ECFB 6239
