> On Nov 16, 2018, at 1:37 PM, Gene Heskett <[email protected]> wrote:
>
> On Friday 16 November 2018 13:59:59 Debra S Baddorf wrote:
>
>>> On Nov 16, 2018, at 12:11 PM, Austin S. Hemmelgarn
>>> <[email protected]> wrote:
>>>
>>> On 2018-11-16 12:27, Chris Miller wrote:
>>>> Hi Folks,
>>>> I'm unclear on the timing of the flush from holding disk to vtape.
>>>> Suppose I run two backup jobs, and each uses the holding disk. When
>>>> will the second job start? Obviously, after the client has sent
>>>> everything... Before the holding disk flush starts, or after the
>>>> holding disk flush has completed?
>>>
>>> If by 'jobs' you mean 'amanda configurations', the second one starts
>>> when you start it. Note that `amdump` does not return until
>>> everything is finished dumping and optionally taping if anything
>>> would be taped, so you can literally just run each one sequentially
>>> in a shell script and they won't run in parallel.
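>>>
>>> For instance, a minimal wrapper along these lines (the configuration
>>> names DailySet1 and DailySet2 are placeholders for whatever your
>>> configurations are called) runs them back to back:
>>>
>>>     #!/bin/sh
>>>     # Run each Amanda configuration in sequence.  amdump blocks until
>>>     # dumping (and any taping) for that run has finished, so the second
>>>     # run cannot overlap the first.
>>>     for config in DailySet1 DailySet2; do
>>>         amdump "$config" || echo "amdump $config exited non-zero" >&2
>>>     done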
>>>
>>> If by 'jobs' you mean DLE's, they run as concurrently as you tell
>>> Amanda to run them. If you've got things serialized (`inparallel`
>>> is set to 1 in your config), then the next DLE will start dumping
>>> once the previous one is finished dumping to the holding disk.
>>> Otherwise, as many DLE's run at once as you've allowed (within
>>> per-host limits), and each DLE starts when the previous one in sequence
>>> for that dumper finishes.  Taping can (by default) run in parallel
>>> with dumping if you're using a holding disk, which is generally a
>>> good thing, though you can also easily configure it to wait for some
>>> amount of data to be buffered on the holding disk before it starts
>>> taping.
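>>>
>>> As a sketch of the relevant server-side knobs (the numbers and the
>>> dumptype name are only illustrative; see amanda.conf(5) for the exact
>>> semantics):
>>>
>>>     # amanda.conf -- illustrative values only
>>>     inparallel 4                  # up to 4 dumpers run at once across all clients
>>>     flush-threshold-dumped 50     # optionally: wait until half a volume of finished
>>>     flush-threshold-scheduled 50  # dumps is buffered before taping starts
>>>
>>>     define dumptype global-settings {
>>>         maxdumps 1                # at most 1 simultaneous dump per client host
>>>     }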
>>>
>>>> Is there any way to defer the holding disk flush until all backup
>>>> jobs for a given night have completed?
>>>
>>> Generically, set `autoflush no` in each configuration, and then run
>>> `amflush` for each configuration once all the dumps are done.
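>>>
>>> A rough sketch of that (configuration names are placeholders; check
>>> amflush(8) for the flags on your version):
>>>
>>>     #!/bin/sh
>>>     # With "autoflush no" in each configuration, the dumps stay on the
>>>     # holding disk; once every amdump run has finished, flush each config.
>>>     for config in DailySet1 DailySet2; do
>>>         amflush -b -f "$config"   # -b: no prompting, -f: wait for the flush to finish
>>>     done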
>>>
>>> However, unless you've got an odd arrangement where every system
>>> saturates the network link while actually dumping and you are
>>> sharing a single link on the Amanda server for both dumping and
>>> taping, this actually probably won't do anything for your
>>> performance. You can easily configure amanda to flush backups from
>>> each DLE as soon as they are done, and it will wait to exit until
>>> everything is actually flushed.
>>>
>>> Building from that, if you just want to ensure the `amdump`
>>> instances don't run in parallel, just use a tool to fire them off
>>> sequentially in the foreground. Stuff like Ansible is great for
>>> this (especially because you can easily conditionally back up your
>>> index and tapelist when the dump finishes). As long as the next
>>> `amdump` command isn't started until the previous one returns, you
>>> won't have to worry about them fighting each other for bandwidth.
>>
>> Chris: you have some control over when DLEs go from the holding disk
>> to the actual tape (or vtape).  This paragraph is from the examples,
>> and I keep it in my config files so I remember how to set up these
>> params:
>>
>> # New amanda includes these explanatory paragraphs:
>>
>> # flush-threshold-dumped, flush-threshold-scheduled, taperflush, and autoflush
>> # are used to control tape utilization.  See the amanda.conf(5) manpage for
>> # details on how they work.  Taping will not start until all criteria are
>> # satisfied.  Here are some examples:
>> #
>> # You want to fill tapes completely even in the case of failed dumps, and
>> # don't care if some dumps are left on the holding disk after a run:
>> #   flush-threshold-dumped 100    # (or more)
>> #   flush-threshold-scheduled 100 # (or more)
>> #   taperflush 100
>> #   autoflush yes
>> #
>> # You want to improve tape performance by waiting for a complete tape of data
>> # before writing anything.  However, all dumps will be flushed; none will
>> # be left on the holding disk.
>> #   flush-threshold-dumped 100    # (or more)
>> #   flush-threshold-scheduled 100 # (or more)
>> #   taperflush 0
>> #
>> # You don't want to use a new tape for every run, but want to start writing
>> # to tape as soon as possible:
>> #   flush-threshold-dumped 0      # (or more)
>> #   flush-threshold-scheduled 100 # (or more)
>> #   taperflush 100
>> #   autoflush yes
>> #   maxdumpsize 100k              # amount of data to dump each run; see above.
>> #
>> # You want to keep the most recent dumps on holding disk, for faster
>> # recovery.  Older dumps will be rotated to tape during each run.
>> #   flush-threshold-dumped 300    # (or more)
>> #   flush-threshold-scheduled 300 # (or more)
>> #   taperflush 300
>> #   autoflush yes
>> #
>> # Defaults:
>> # (no restrictions; flush to tape immediately; don't flush old dumps.)
>> #flush-threshold-dumped 0
>> #flush-threshold-scheduled 0
>> #taperflush 0
>> #autoflush no
>> #
>> —————
>> Here is part of my setup, with further comments beside each param. I
>> may have written some of these comments, so I hope they are completely
>> correct. I think they are.
>> ———————
>> ## with LTO5 tapes, as of 2/27/2015, I still only USE one tape.
>> ## Don't faff around; just write to the silly tape. But to avoid
>> ## shoe shining, let some amount accumulate. Else we'd be writing
>> ## the first tiny file and then waiting .....
>>
>> ## see <othernode> if you need LTO5 settings.
>> ## Enzo is using an LTO4, so revert to these:
>>
>> # You want to improve tape performance by waiting for a complete tape of data
>> # before writing anything.  However, all dumps will be flushed; none will
>> # be left on the holding disk.
>> #   flush-threshold-dumped 100    # (or more)
>> #   flush-threshold-scheduled 100 # (or more)
>> #   taperflush 0
>>
>> flush-threshold-dumped 100        # Default: 0.
>>         # Amanda will not begin writing data to a new tape volume
>>         # until the amount of data on the holding disk is at least this
>>         # percentage of the volume size.  The idea is to accumulate a bunch
>>         # of files, so the fill algorithm ("Greedy Algorithm") has some
>>         # choices to work with.  The value of this parameter may not exceed
>>         # that of the flush-threshold-scheduled parameter.
>>
>> flush-threshold-scheduled 100     # Default: 0.
>>         # Amanda will not begin writing data to a new volume until the sum
>>         # of the amount of data on the holding disk and the estimated amount
>>         # of data remaining to be dumped during this run is at least this
>>         # percentage of the volume size.
>>         # The value of this parameter may not be less than that of the
>>         # flush-threshold-dumped or taperflush parameters.
>>
>>
>> taperflush 0                      # Default: 0.
>>         # At the end of a run, Amanda will start a new tape to flush
>>         # remaining data if there is more data on the holding disk at the
>>         # end of a run than this setting allows; the amount is specified as
>>         # a percentage of the capacity of a single volume.
>>         #### dsbdsb  i.e. 0 == start a new tape if any data is still on
>>         #### the holding disk.  Good.
>>         ## taperflush <= flush-threshold-scheduled
>>         ## flush-threshold-dumped <= flush-threshold-scheduled
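>>
>> To make those percentages concrete, here is a rough worked example (the
>> ~800 GB figure is native LTO-4 capacity and is only illustrative):
>>
>>     # Assuming a volume of roughly 800 GB native:
>>     #   flush-threshold-dumped 100    -> taping waits until >= 800 GB of finished
>>     #                                    dumps sit on the holding disk
>>     #   flush-threshold-scheduled 100 -> and (dumped data + data still expected
>>     #                                    this run) is also >= 800 GB
>>     #   taperflush 0                  -> at the end of the run, start another tape
>>     #                                    if anything at all is left on the holding disk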
>>
>> #autoflush yes   # only flushes those NAMED on the command line.  Use ALL.  6/28/13
>> autoflush all    # flush leftovers from a crash, or a ran-out-of-tape condition
>>
>> NOTE THE PART THAT SURPRISED ME, A FEW VERSIONS BACK:
>> autoflush has values of no / yes / all
>> “yes” and “all” behave slightly differently.
>>
>> Deb Baddorf
>> Fermilab
>
> Thank you a bunch, Deb; that is better explained than the manpages ever
> manage.
>
>
> Copyright 2018 by Maurice E. Heskett
> --
> Cheers, Gene Heskett
:D Glad it wasn’t “too much info”.  I think my concepts have been gathered
from past dialogs on this mailing list.
Note that the 100 settings require you to have holding-disk space at least
equal to one tape/vtape space.
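
For instance, a holding-disk definition sized to hold at least one full volume
might look roughly like this (the path and sizes are placeholders; see the
holdingdisk section of amanda.conf(5) for the exact syntax on your version):

    holdingdisk hd1 {
        directory "/amanda/holding"   # placeholder path
        use 900 Gb                    # comfortably more than one ~800 GB volume
        chunksize 1 Gb                # split big dumps into chunks on the holding disk
    }
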
Deb