> On Nov 16, 2018, at 12:11 PM, Austin S. Hemmelgarn <[email protected]>
> wrote:
>
> On 2018-11-16 12:27, Chris Miller wrote:
>> Hi Folks,
>> I'm unclear on the timing of the flush from holding disk to vtape. Suppose I
> run two backup jobs, and each uses the holding disk. When will the second job
>> start? Obviously, after the client has sent everything... Before the holding
>> disk flush starts, or after the holding disk flush has completed?
> If by 'jobs' you mean 'amanda configurations', the second one starts when you
> start it. Note that `amdump` does not return until everything is finished
> dumping and optionally taping if anything would be taped, so you can
> literally just run each one sequentially in a shell script and they won't run
> in parallel.
>
> If by 'jobs' you mean DLE's, they run as concurrently as you tell Amanda to
> run them. If you've got things serialized (`inparallel` is set to 1 in your
> config), then the next DLE will start dumping once the previous one is
> finished dumping to the holding disk. Otherwise, however many you've said
> can run in parallel run (within per-host limits), and DLE's start when the
> previous one in sequence for that dumper finishes. Taping can (by default)
> run in parallel with dumping if you're using a holding disk, which is
> generally a good thing, though you can also easily configure it to wait for
> some amount of data to be buffered on the holding disk before it starts
> taping.
>> Is there any way to defer the holding disk flush until all backup jobs for a
>> given night have completed?
> Generically, set `autoflush no` in each configuration, and then run `amflush`
> for each configuration once all the dumps are done.
>
> However, unless you've got an odd arrangement where every system saturates
> the network link while actually dumping and you are sharing a single link on
> the Amanda server for both dumping and taping, this actually probably won't
> do anything for your performance. You can easily configure Amanda to flush
> backups from each DLE as soon as they are done, and it will wait to exit
> until everything is actually flushed.
>
> Building from that, if you just want to ensure the `amdump` instances don't
> run in parallel, just use a tool to fire them off sequentially in the
> foreground. Stuff like Ansible is great for this (especially because you can
> easily conditionally back up your index and tapelist when the dump finishes).
> As long as the next `amdump` command isn't started until the previous one
> returns, you won't have to worry about them fighting each other for bandwidth.
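The sequential pattern Austin describes can be sketched as a small shell loop. The config names "set1" and "set2" below are placeholders for illustration, and the commands are only echoed so the sketch is safe to run as-is; drop the echo to run them for real (check `amflush`'s options on your version before relying on `-b`):

```shell
#!/bin/sh
# Run two Amanda configurations strictly in sequence. amdump does not
# return until dumping (and any taping) for that config is finished,
# so a plain foreground loop already serializes them.
run_amanda_configs() {
    for config in set1 set2; do
        echo amdump "$config"        # dumps (and tapes, if configured)
    done
    # With "autoflush no" in each config, flush to tape only after
    # every configuration has finished dumping:
    for config in set1 set2; do
        echo amflush -b "$config"    # -b: batch mode, no prompting
    done
}
run_amanda_configs
```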
Chris: you have some control over when DLEs go from the holding disk to the
actual tape (or vtape).
This paragraph is from the examples, and I keep it in my config files so I
remember how to set up these params:
# New amanda includes these explanatory paragraphs:
# flush-threshold-dumped, flush-threshold-scheduled, taperflush, and autoflush
# are used to control tape utilization. See the amanda.conf (5) manpage for
# details on how they work. Taping will not start until all criteria are
# satisfied. Here are some examples:
#
# You want to fill tapes completely even in the case of failed dumps, and
# don't care if some dumps are left on the holding disk after a run:
# flush-threshold-dumped 100 # (or more)
# flush-threshold-scheduled 100 # (or more)
# taperflush 100
# autoflush yes
#
# You want to improve tape performance by waiting for a complete tape of data
# before writing anything. However, all dumps will be flushed; none will
# be left on the holding disk.
# flush-threshold-dumped 100 # (or more)
# flush-threshold-scheduled 100 # (or more)
# taperflush 0
#
# You don't want to use a new tape for every run, but want to start writing
# to tape as soon as possible:
# flush-threshold-dumped 0 # (or more)
# flush-threshold-scheduled 100 # (or more)
# taperflush 100
# autoflush yes
# maxdumpsize 100k # amount of data to dump each run; see above.
#
# You want to keep the most recent dumps on holding disk, for faster recovery.
# Older dumps will be rotated to tape during each run.
# flush-threshold-dumped 300 # (or more)
# flush-threshold-scheduled 300 # (or more)
# taperflush 300
# autoflush yes
#
# Defaults:
# (no restrictions; flush to tape immediately; don't flush old dumps.)
#flush-threshold-dumped 0
#flush-threshold-scheduled 0
#taperflush 0
#autoflush no
#
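The "taping will not start until all criteria are satisfied" rule above can be modeled as a tiny predicate. This is only a toy illustration of the two threshold checks as the manpage excerpts describe them, not Amanda's actual code; all percentages are relative to volume size:

```shell
#!/bin/sh
# Toy model of when taping may begin. Arguments:
#   $1 = data already dumped to the holding disk (%)
#   $2 = estimated data still scheduled to be dumped this run (%)
#   $3 = flush-threshold-dumped
#   $4 = flush-threshold-scheduled
taping_may_start() {
    [ "$1" -ge "$3" ] && [ $(($1 + $2)) -ge "$4" ]
}

# Defaults (0/0): taping starts as soon as anything is dumped.
taping_may_start 1 0 0 0 && echo "start" || echo "wait"        # start
# "Full tape first" example (100/100): 60% dumped, 20% scheduled,
# so neither criterion is satisfied yet.
taping_may_start 60 20 100 100 && echo "start" || echo "wait"  # wait
```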
—————
Here is part of my setup, with further comments beside each param. I may have
written some of these comments myself, so I hope they are completely correct;
I think they are.
———————
## with LTO5 tapes, as of 2/27/2015, I still only USE one tape.
## Don't faff around; just write to the silly tape. But to avoid
## shoe shining, let some amount accumulate. Else we'd be writing
## the first tiny file and then waiting .....
## see <othernode> if you need LTO5 settings.
## Enzo is using an LTO4, so revert to these:
# You want to improve tape performance by waiting for a complete tape of data
# before writing anything. However, all dumps will be flushed; none will
# be left on the holding disk.
# flush-threshold-dumped 100 # (or more)
# flush-threshold-scheduled 100 # (or more)
# taperflush 0
flush-threshold-dumped 100      # Default: 0.
        # Amanda will not begin writing data to a new tape volume
        # until the amount of data on the holding disk is at least
        # this percentage of the volume size. The idea is to
        # accumulate a bunch of files, so the "Greedy Algorithm"
        # fill algorithm has some choices to work with.
        # The value of this parameter may not exceed that of the
        # flush-threshold-scheduled parameter.
flush-threshold-scheduled 100   # Default: 0.
        # Amanda will not begin writing data to a new volume until
        # the sum of the amount of data on the holding disk and the
        # estimated amount of data remaining to be dumped during
        # this run is at least this percentage of the volume size.
        # The value of this parameter may not be less than that of
        # the flush-threshold-dumped or taperflush parameters.
taperflush 0                    # Default: 0.
        # At the end of a run, Amanda will start a new tape to flush
        # remaining data if there is more data on the holding disk at
        # the end of a run than this setting allows; the amount is
        # specified as a percentage of the capacity of a single volume.
        #### dsbdsb i.e. 0 == start a new tape if any data is still
        #### on the holding disk. Good.
## Constraints: taperflush <= flush-threshold-scheduled
##              flush-threshold-dumped <= flush-threshold-scheduled
#autoflush yes  # only flushes dumps NAMED on the command line. Use ALL. 6/28/13
autoflush all   # flush leftovers from a crash, or a ran-out-of-tape condition
NOTE THE PART THAT SURPRISED ME A FEW VERSIONS BACK:
autoflush takes the values no / yes / all,
and “yes” and “all” behave slightly differently.
Deb Baddorf
Fermilab