On Friday 16 November 2018 13:59:59 Debra S Baddorf wrote:
> > On Nov 16, 2018, at 12:11 PM, Austin S. Hemmelgarn
> > <[email protected]> wrote:
> >
> > On 2018-11-16 12:27, Chris Miller wrote:
> >> Hi Folks,
> >> I'm unclear on the timing of the flush from holding disk to vtape.
> >> Suppose I run two backup jobs, and each uses the holding disk.
> >> When will the second job start? Obviously, after the client has
> >> sent everything... Before the holding disk flush starts, or after
> >> the holding disk flush has completed?
> >
> > If by 'jobs' you mean 'Amanda configurations', the second one
> > starts when you start it. Note that `amdump` does not return until
> > everything is finished dumping (and taping, if anything would be
> > taped), so you can literally run each one sequentially in a shell
> > script and they won't run in parallel.
> >
> > If by 'jobs' you mean DLEs, they run as concurrently as you tell
> > Amanda to run them. If you've got things serialized (`inparallel`
> > is set to 1 in your config), then the next DLE will start dumping
> > once the previous one has finished dumping to the holding disk.
> > Otherwise, however many you've said can run in parallel will run
> > (within per-host limits), and each DLE starts when the previous
> > one in sequence for that dumper finishes. Taping can (by default)
> > run in parallel with dumping if you're using a holding disk, which
> > is generally a good thing, though you can also easily configure it
> > to wait for some amount of data to be buffered on the holding disk
> > before it starts taping.
> >
> >> Is there any way to defer the holding disk flush until all backup
> >> jobs for a given night have completed?
> >
> > Generically, set `autoflush no` in each configuration, and then
> > run `amflush` for each configuration once all the dumps are done.
> >
> > However, unless you've got an odd arrangement where every system
> > saturates the network link while actually dumping and you are
> > sharing a single link on the Amanda server for both dumping and
> > taping, this probably won't do anything for your performance. You
> > can easily configure Amanda to flush backups from each DLE as soon
> > as they are done, and it will wait to exit until everything is
> > actually flushed.
> >
> > Building from that, if you just want to ensure the `amdump`
> > instances don't run in parallel, just use a tool to fire them off
> > sequentially in the foreground. Stuff like Ansible is great for
> > this (especially because you can easily back up your index and
> > tapelist conditionally when the dump finishes). As long as the
> > next `amdump` command isn't started until the previous one
> > returns, you won't have to worry about them fighting each other
> > for bandwidth.
>
> Chris: you have some control over when DLEs go from the holding disk
> to the actual tape (or vtape). This paragraph is from the examples,
> and I keep it in my config files so I remember how to set up these
> params:
>
> # New amanda includes these explanatory paragraphs:
> #
> # flush-threshold-dumped, flush-threshold-scheduled, taperflush, and
> # autoflush are used to control tape utilization. See the
> # amanda.conf(5) manpage for details on how they work. Taping will
> # not start until all criteria are satisfied.
> Here are some examples:
> #
> # You want to fill tapes completely even in the case of failed
> # dumps, and don't care if some dumps are left on the holding disk
> # after a run:
> # flush-threshold-dumped 100    # (or more)
> # flush-threshold-scheduled 100 # (or more)
> # taperflush 100
> # autoflush yes
> #
> # You want to improve tape performance by waiting for a complete
> # tape of data before writing anything. However, all dumps will be
> # flushed; none will be left on the holding disk.
> # flush-threshold-dumped 100    # (or more)
> # flush-threshold-scheduled 100 # (or more)
> # taperflush 0
> #
> # You don't want to use a new tape for every run, but want to start
> # writing to tape as soon as possible:
> # flush-threshold-dumped 0      # (or more)
> # flush-threshold-scheduled 100 # (or more)
> # taperflush 100
> # autoflush yes
> # maxdumpsize 100k  # amount of data to dump each run; see above.
> #
> # You want to keep the most recent dumps on holding disk, for
> # faster recovery. Older dumps will be rotated to tape during each
> # run.
> # flush-threshold-dumped 300    # (or more)
> # flush-threshold-scheduled 300 # (or more)
> # taperflush 300
> # autoflush yes
> #
> # Defaults:
> # (no restrictions; flush to tape immediately; don't flush old
> # dumps.)
> # flush-threshold-dumped 0
> # flush-threshold-scheduled 0
> # taperflush 0
> # autoflush no
> #
> -----
> Here is part of my setup, with further comments beside each param.
> I may have written some of these comments, so I hope they are
> completely correct. I think they are.
> -----
> ## With LTO5 tapes, as of 2/27/2015, I still only USE one tape.
> ## Don't faff around; just write to the silly tape. But to avoid
> ## shoe-shining, let some amount accumulate. Else we'd be writing
> ## the first tiny file and then waiting .....
>
> ## See <othernode> if you need LTO5 settings.
> ## Enzo is using an LTO4, so revert to these:
>
> # You want to improve tape performance by waiting for a complete
> # tape of data before writing anything. However, all dumps will be
> # flushed; none will be left on the holding disk.
> # flush-threshold-dumped 100    # (or more)
> # flush-threshold-scheduled 100 # (or more)
> # taperflush 0
>
> flush-threshold-dumped 100     # Default: 0.
>     # Amanda will not begin writing data to a new tape volume until
>     # the amount of data on the holding disk is at least this
>     # percentage of the volume size. The idea is to accumulate a
>     # bunch of files, so the "greedy" fill algorithm has some
>     # choices to work with. The value of this parameter may not
>     # exceed that of the flush-threshold-scheduled parameter.
>
> flush-threshold-scheduled 100  # Default: 0.
>     # Amanda will not begin writing data to a new volume until the
>     # sum of the amount of data on the holding disk and the
>     # estimated amount of data remaining to be dumped during this
>     # run is at least this percentage of the volume size. The value
>     # of this parameter may not be less than that of the
>     # flush-threshold-dumped or taperflush parameters.
>
> taperflush 0                   # Default: 0.
>     # At the end of a run, Amanda will start a new tape to flush
>     # remaining data if there is more data on the holding disk at
>     # the end of a run than this setting allows; the amount is
>     # specified as a percentage of the capacity of a single volume.
>     #### dsbdsb  i.e. 0 == start a new tape if any data is still on
>     #### the holding disk. Good.
> ## taperflush             <= flush-threshold-scheduled
> ## flush-threshold-dumped <= flush-threshold-scheduled
>
> #autoflush yes  # only flushes those NAMED on the command line.
>                 # Use ALL.  6/28/13
> autoflush all   # flush leftovers from a crash, or a
>                 # ran-out-of-tape condition
>
> NOTE THE PART THAT SURPRISED ME, A FEW VERSIONS BACK:
> autoflush has values of no / yes / all
> “yes” and “all” behave slightly differently.
>
> Deb Baddorf
> Fermilab
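To make those percentages concrete, here is a rough worked sketch of
the same settings Deb shows above. The 400 GB volume size is invented
for illustration and is not taken from her config:

  # Hypothetical: suppose each (v)tape holds 400 GB.
  flush-threshold-dumped 100     # don't start writing a new volume
                                 # until the holding disk already has
                                 # >= 100% of a volume, i.e. >= 400 GB
  flush-threshold-scheduled 100  # ... and until data already dumped
                                 # plus data still expected this run
                                 # totals >= 400 GB
  taperflush 0                   # at end of run, flush anything still
                                 # on the holding disk, starting a new
                                 # tape if needed
  autoflush all                  # next run, also flush leftovers from
                                 # earlier runs ("yes" would only
                                 # flush dumps named on the command
                                 # line, per Deb's note)

These values also respect the two constraints quoted above: taperflush
(0) and flush-threshold-dumped (100) are both <=
flush-threshold-scheduled (100).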
Thank you a bunch, Deb; that's explained better than the manpages ever
manage.

Copyright 2018 by Maurice E. Heskett
--
Cheers, Gene Heskett
--
"There are four boxes to be used in defense of liberty:
 soap, ballot, jury, and ammo. Please use in that order."
-Ed Howdershelt (Author)
Genes Web page <http://geneslinuxbox.net:6309/gene>
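For the other half of the thread, Austin's advice to serialize the
runs and defer flushing could look roughly like the shell sketch
below. The configuration names daily1 and daily2 are placeholders for
your own, and it follows Austin's recipe of setting `autoflush no` in
each config and running `amflush` once all dumps are done:

  #!/bin/sh
  # Sketch only: run two Amanda configurations one after the other,
  # then flush their holding disks. Run this as the Amanda user,
  # e.g. from its crontab.

  # amdump does not return until its whole run is finished, so the
  # second run can never overlap the first.
  amdump daily1
  amdump daily2

  # Per Austin's recipe: with "autoflush no" in each config, flush
  # whatever is on each configuration's holding disk now that all
  # dumps are done.
  #   -b  batch mode: don't prompt for which dumps to flush
  #   -f  stay in the foreground instead of detaching
  amflush -b -f daily1
  amflush -b -f daily2

Because each command blocks until it completes, the two
configurations never compete with each other for network bandwidth or
the taper.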
