On 2018-07-27 12:23, Stefan G. Weichinger wrote:
On 27.07.2018 at 17:02, Jean-Francois Malouin wrote:

You should also consider playing with dumporder.
I have it set to 'TTTTTTTT' and that makes the longest (time-wise)
dumps go first so that the fast ones get pushed to the end.
In one config I have:

dumporder "TTTTTTTT"
flush-threshold-dumped 100
flush-threshold-scheduled 100
taperflush 100
autoflush yes

so that all the dumps will wait until the longest ones are done.
It also won't start writing to tape until it can fill one volume (100%). You can
obviously go further than that if you have enough holding disk.

Or at least it's my understanding...

(the ML was down for a while, which is the reason for my delayed response; it should work now)

I checked "dumporder" in that config, it was "BTBT...", I changed it to "TTT..." now for a test.

Although I am not 100% convinced that this will do the trick ;-)

We will see.

I never fully understood that parameter and its influence; to me it's a bit "unintuitive".

Perhaps I can help with that.

Part of what Amanda's scheduling does is figure out the size that each backup will be on each run (based on the estimate process), how much bandwidth it will need while dumping (based on the bandwidth settings for that particular dump type), and the amount of time it will take (predicted based on the size, prior timing data, and possibly the bandwidth). That information is then used together with the 'dumporder' setting to control how each dumper chooses what dump to do next when it finishes dumping. Each letter in the value corresponds to exactly one dumper, and controls only that dumper's selection.

The size-based selection is generally the easiest to explain: it just says to pick the largest (for 'S') or smallest (for 's') dump out of the set and run that next.

The bandwidth-based selection is only relevant if you have bandwidth settings configured. Without them, it treats all dumps as equal and picks the next dump based solely on the order in which Amanda has them sorted (which, IIRC, matches the order found in the disk list). With them, it uses a similar selection method to the size-based selection, just looking at bandwidth instead of size.

The time-based selection is where things get tricky, but they get tricky because of how complicated it is to predict how long a dump will take, not because the selection is complicated (it works just like size-based selection, just looking at estimated runtime instead of size). Pretty much, the timing data is extrapolated by looking at previous dumps of the DLE, correlating size and actual run-time. I'm not sure what fitting method it uses for the extrapolation (my first guess would be simple linear extrapolation, because that's easy and should work most of the time), and I'm also not sure what, if any, impact bandwidth has on the calculation.
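
To make the extrapolation idea concrete, here's a rough Python sketch of a simple linear fit over past (size, run-time) pairs. This is only my illustration of the guess above; it is not Amanda's actual code, and predict_runtime() is a made-up name:

# Least-squares fit of run-time against size over previous dumps of a DLE.
# Purely illustrative -- a guess at the kind of extrapolation involved,
# not what Amanda actually does internally.
def predict_runtime(history, estimated_size):
    """history: list of (size_in_bytes, runtime_in_seconds) from past runs."""
    n = len(history)
    if n == 0:
        return None                      # no history, nothing to go on
    if n == 1:
        size, runtime = history[0]
        return runtime * estimated_size / size if size else runtime
    # Ordinary least squares: runtime ~ a * size + b
    sum_x = sum(s for s, _ in history)
    sum_y = sum(t for _, t in history)
    sum_xx = sum(s * s for s, _ in history)
    sum_xy = sum(s * t for s, t in history)
    denom = n * sum_xx - sum_x * sum_x
    if denom == 0:
        return sum_y / n                 # all sizes equal, use the mean
    a = (n * sum_xy - sum_x * sum_y) / denom
    b = (sum_y - a * sum_x) / n
    return a * estimated_size + b

# e.g. predict_runtime([(10e9, 600), (12e9, 700), (8e9, 520)], 11e9)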

So, in short you have:

* 'S' and 's': Simple deterministic selection based on the predicted size of the dump.
* 'B' and 'b': Simple deterministic selection based on bandwidth settings if they are defined, otherwise trivial FIFO selection.
* 'T' and 't': Not-quite-deterministic selection based on the predicted execution time of the dump process.
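
To tie the three together, here's a purely illustrative Python sketch of how a single dumper could pick its next dump based on its dumporder letter. pick_next() and the field names are invented for the example; this is not Amanda's actual implementation:

# One dumporder letter per dumper; 'S'/'s' = largest/smallest size,
# 'B'/'b' = most/least bandwidth, 'T'/'t' = longest/shortest predicted time.
def pick_next(pending, letter):
    """pending: list of dicts with estimated 'size', 'bandwidth' and 'time'."""
    if not pending:
        return None
    key = {'S': 'size', 's': 'size',
           'B': 'bandwidth', 'b': 'bandwidth',
           'T': 'time', 't': 'time'}[letter]
    # Capital letter: biggest value first; lowercase: smallest value first.
    # If every value is equal (e.g. no bandwidth settings configured), both
    # degenerate into taking the first pending dump, i.e. FIFO order.
    if letter.isupper():
        return max(pending, key=lambda d: d[key])
    return min(pending, key=lambda d: d[key])

# With dumporder "SSss", dumpers 0 and 1 would call pick_next(pending, 'S')
# and dumpers 2 and 3 would call pick_next(pending, 's') whenever they go idle.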

So, for a couple of examples:

* The default setting 'BTBTBTBT': This will have half the dumpers select the dumps that will take the longest time, and the other half select the ones that need the most bandwidth. This works reasonably well if you have bandwidth settings configured and wide variance in dump size.

* What you're looking at testing, 'TTTTTTTT': This is a trivial case of all dumpers selecting the dumps that will take the longest time. If you're dumping almost all similar hosts, this will be essentially equivalent to just selecting the largest. If you're dumping a wide variety of different hosts, it will be equivalent to selecting the largest on the first run, but after that it will select based on which system takes the longest.

* What I use on my own systems, 'SSss' (I only run four dumpers, not eight): This is a reasonably simple option that gives a good balance between getting dumps done as quickly as possible and not wasting time waiting on the big ones. Two of the dumpers select whatever dump is the largest, so that some of the big ones get started right away, while the other two select the smallest dumps, so that those get backed up immediately. I've done some really simple testing that indicates this actually gets all the dumps done faster on average than the default, for the case where all your systems can dump data at the same rate.

* What we use where I work, 'TTSSSSss': This is one where things get a bit complicated. There are three different ways things get selected here. First, two of the eight dumpers will select dumps that are going to take the longest amount of time. Then, you have four that will pull the largest ones, and two that will pull the smallest. This gets really good behavior where I work because we have a handful of decade-old systems that we need to keep backed up which take _forever_ to back up, but most of our other systems are new and don't take too long. On the first run, this is equivalent to 'SSSSSSss', but after that the slow systems get priority to run while everything else is dumping, even though they are not the largest or smallest dumps, so the backup process doesn't stall out waiting on them to finish at the end.
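
Continuing the toy pick_next() sketch from above (again, just an illustration, not Amanda's code), 'TTSSSSss' simply binds each of the eight dumpers to one selection letter:

dumporder = "TTSSSSss"
for dumper_index, letter in enumerate(dumporder):
    print("dumper %d selects by %r" % (dumper_index, letter))
# Dumpers 0-1 grab the longest-running dumps (the old, slow machines),
# dumpers 2-5 grab the largest remaining dumps, and dumpers 6-7 sweep up
# the smallest ones, so the slow systems never get left for the very end.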
