On 2018-03-22 19:03, Ryan, Lyle (US) wrote:
I've got an Amanda 3.4.5 server running on CentOS 7 now, and I'm able to do rudimentary backups of a remote client.

But in spite of reading man pages, HowTos, etc., I need help choosing config params.  I don't mind continuing to read and experiment, but if someone could get me at least in the ballpark, I'd really appreciate it.

The server has an 11TB filesystem to store the backups in.  I should probably be fancier and split this up more, but not now.   So I've got my holding, state, and vtapes directories all in there.

The main client has 4TB I want to back up.  It's almost all in one filesystem, but the HowTo for splitting DLEs with exclude lists is clear, so it should be easy to split this into (say) 10 smaller individual dumps.  The bulk of the data is pretty static, maybe 10%/month changes.  It's hard to imagine 20%/month changing.
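
Just to illustrate that split: a disklist sketch along those lines might look like this (host name, paths, dumptype, and exclude-file names are all made up; the exclude files live on the client, and each part excludes the subtrees the other parts cover):

    # disklist -- two of the ten pieces; same filesystem, different excludes
    client.example.com /data-part1 /data {
        user-tar
        exclude list "/etc/amanda/exclude.part1"
    }
    client.example.com /data-part2 /data {
        user-tar
        exclude list "/etc/amanda/exclude.part2"
    }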

For a start, I'd like to get a full done every 2 weeks, and incrementals/differentials on the intervening days.  If I have room to keep 2 fulls (2 complete dumpcycles), that would be great.
Given what you've said, you should have enough room to do so, but only if you use compression. Assuming the rate of change you quote above is approximately constant and doesn't result in bumping to a level higher than 1, then without compression you will need roughly 4.2TB per cycle (4TB for the full backup, plus ~200GB of incrementals at roughly 0.38% change per day, i.e. ~15.4GB/day, for 13 days), plus 4TB of space for the holding disk (because you have to have room for a full backup _there_ prior to taping anything). With compression, and assuming you get a compression ratio of about 50%, you should actually be able to fit four complete cycles (roughly 2.1TB per cycle), though if you decide you want that, I would bump the tapecycle to 60 and the number of slots to 60.

So I'm thinking:

- dumpcycle = 14

- runspercycle = 0 (default)

- tapecycle = 30

- runtapes = 1 (default)

I'd break the filesystem into 10 pieces, so ~400GB each, and make the vtapes 400GB each (via the tapetype length), relying on server-side compression to make it fit.
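
In amanda.conf terms, that plan would look roughly like this (the tapetype name is just a placeholder, and the units are one example of the syntax):

    dumpcycle 14 days       # full backup at least every 2 weeks
    runspercycle 0          # every run counts (default)
    tapecycle 30 tapes      # retain ~2 complete cycles of vtapes
    runtapes 1              # one vtape per run (default)

    define tapetype VTAPE-400 {
        length 400 gbytes
    }
    tapetype "VTAPE-400"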

The HowTo "Use pigz to speed compression" looks clear, and the DL380 G7 isn't doing anything else, so server-side compression sounds good.

Any advice on this or better ideas?  Maybe I'm off in left-field.

And one bonus question:  I'm assuming Amanda will just make vtapes as necessary, but is there any guidance as to how many vtape slots I should create ahead of time?  If my dumpcycle=14, maybe create 14 slots just to make tapes easier to find?

Debra covered the requirements for vtapes, slots, and everything very well in her reply, so I won't repeat any of that here. I do however have some other more generic advice I can give based on my own experience:

* Make your vtapes as large as possible. They won't take up any space beyond what's stored on them (in storage terminology, they're thinly provisioned), so their total 'virtual' size can be far more than your actual storage capacity. If you can make it so that a full backup always fits on a single vtape, it will make figuring out how many vtapes you need easier, and additionally give a slight boost to taping performance (because the taper never has to stop to switch to a new vtape). In your case, I'd say setting 5TB for your vtape size is reasonable; that would give you some extra room if you suddenly have more data, without being insanely over-sized.
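
For example, with the chg-disk changer you can let Amanda create slots on demand (config name "daily" and the vtape path below are placeholders):

    # amanda.conf: chg-disk changer that auto-creates slots as needed
    define changer vtape_changer {
        tpchanger "chg-disk:/backup/vtapes"
        property "num-slot" "30"
        property "auto-create-slot" "yes"
    }
    tpchanger "vtape_changer"

Or pre-create and label the slots by hand:

    for i in $(seq 1 30); do
        mkdir -p /backup/vtapes/slot$i
        amlabel daily daily-$(printf '%02d' $i) slot $i
    done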

* Make sure to set a reasonable part_size for your vtapes. While you wouldn't have to worry about splitting dumps if you take my advice above about vtape size, using parts has some other performance-related advantages. I normally use 1G, but all of my dumps are under 100G in size. In your case, with ten 400G dumps, I'd probably go with 4G for the part size.
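
Concretely, combining this with the vtape-size advice above, the tapetype might become:

    define tapetype VTAPE-5T {
        length 5 tbytes      # thin-provisioned; only used space counts
        part_size 4 gbytes
    }
    tapetype "VTAPE-5T"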

* Match your holding disk chunk size to your vtape's part_size. I have no hard number to back this up, but it appears to provide a slight performance improvement while dumping data.
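
That is, something along these lines (the directory and reserve amount are placeholders):

    define holdingdisk hd1 {
        directory "/backup/holding"
        use -100 gbytes      # use all but 100G of the volume
        chunksize 4 gbytes   # match the vtape part_size
    }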

* Don't worry right now about parallelizing the taping process. It's somewhat complicated to get it working right, significantly changes how you have to calculate vtape slots and sizes, and will probably not provide much benefit unless you're taping to a really fast RAID array that does a very good job of handling parallel writes.

* There's essentially zero performance benefit to having your holding disk on a separate partition from your final storage unless you have it on a completely separate disk. There are some benefits in terms of reliability, but realizing them requires some significant planning (you have to figure out exactly what amount of space your holding disk will need).

* If you're indexing the backups, store the working index directory (the one Amanda actually reads and writes) on a separate drive from both the holding disk and the final backup storage, and make sure it doesn't get included in the backup if you're backing up your local system as part of this configuration. This is the single biggest performance booster I've found so far when dealing with Amanda. You can still copy the index over to the final backup storage location (and I would actually encourage you to do so); just make sure Amanda isn't reading or writing the index at that location while backups are being taped.
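
As a sketch (paths invented): point indexdir at the separate drive in amanda.conf, then sync a copy over after each run, e.g. from cron once amdump has finished:

    # amanda.conf
    indexdir "/var/lib/amanda/index"    # on its own drive

    # cron, after the nightly amdump:
    rsync -a --delete /var/lib/amanda/index/ /backup/index-copy/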

* Given that you're going to need compression, I would suggest doing some actual testing to see how much processing power you can throw at it. In particular, try test dumps a couple of times with different compression types to see how fast each type runs and how much space it saves you. Keep in mind that you can pass extra options to any compression program you want by using the custom compression support and a wrapper script like this:

    #!/bin/bash
    # pass Amanda's arguments straight through (it adds -d when restoring)
    exec /path/to/program --options "$@"

If you can get it on your distribution, I'd suggest looking into Zstandard [1] for compression. Its default settings compress both better _and_ faster than the default gzip settings.
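
A zstd wrapper is the same one-liner pattern (path assumed; zstd accepts the -d that Amanda passes when restoring):

    #!/bin/bash
    # filter stdin to stdout; -c forces stdout, -d is passed through on restore
    exec /usr/bin/zstd -c "$@"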

* Given that you're only backing up to a local disk, try tweaking the device_output_buffer_size and see how that impacts your performance. 1M seems to be a good starting point for local disks, but higher values may get you much better performance.
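
That's a single line in amanda.conf; start at 1M and benchmark from there:

    device_output_buffer_size 1m    # try larger values too, e.g. 4m or 16m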
