On Mon, Nov 23, 2020 at 11:28:37PM +0100, Stefan G. Weichinger wrote:
> Am 16.11.20 um 14:25 schrieb Dave Sherohman:
> I am a bit surprised by the fact you haven't yet received any reply on
> the list so far (maybe per direct/private reply).
I received one accidentally-off-list reply, as already mentioned. But,
aside from that, I interpreted it as just the list acting up - if you
check the headers on the message you replied to, I sent it on Monday the
16th, but it didn't go out to the list until Friday the 20th. So
getting on-list replies on the 24th is right in keeping with that
schedule...
> Your "project" and the related questions could start a new thread
> without problems ;-)
True. But here's a new subject line, at least. :)
> * how dynamic is your data: are the incremental changes big or small ...
We're currently doing backup via Tivoli Storage Manager. The daily TSM
output shows a total of about 700GB per day in "Total number of bytes
transferred". Most hosts are only sending some MB or maybe a dozen GB.
The substantial majority comes from two database servers (400GB and
150GB/day).
I only have access to the output emitted by the TSM client as it runs,
so I don't know what space is used on the server, but this 700GB/day
is the raw data size. ("Objects compressed by: 0%")
> * what $dumpcycle is targetted?
Seven days is a nice default, but, given the scale of data here and the
request for maintaining 6 months of backups, I'm thinking 30 days might
be more sane.
Back when I was using amanda 20 years ago, I recall a lot of people
would run a 7-day tapecycle, then monthly and annual full archival
backups. I assume something like that would be possible with vtapes as
well, so that could be an option for maintaining a seven-day dumpcycle
without needing an exabyte of storage.
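Roughly what I have in mind for the daily config, as an untested
amanda.conf sketch (the vtape path, slot count, and tape counts are just
placeholders):

    # daily config: 7-day dumpcycle, ~5 weeks of vtapes before reuse
    dumpcycle 7 days      # full of every DLE at least once a week
    runspercycle 7        # one amdump run per day
    tapecycle 35 tapes    # vtapes retained before being overwritten
    runtapes 2            # vtapes a single amdump run may use

    define changer vtapes {
        tpchanger "chg-disk:/amanda/vtapes"   # directory-backed vtapes
        property "num-slot" "35"
        property "auto-create-slot" "yes"
    }
    tpchanger "vtapes"

The monthly/annual archival fulls would presumably then just be a second
Amanda configuration (its own amanda.conf, disklist, and vtape
directory) with dumpcycle 0 to force full dumps, kicked off from cron
once a month or once a year.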
And, personally, I think the 6-month retention is massive overkill in
any case.  I've been in this job for just over a decade; I could
probably count the restores in that time on my fingers, and none of them
needed data more than a week old.
> * parallelity: will your new amanda server have multiple NICs etc / plan
> for a big holding disk (array)
We tend to default to 4 NICs on new server purchases and have gone
higher. But we've only done active/passive bonding so far, which is
basically just single-NIC throughput. We tried a higher-capacity mode
once, but the campus data center and I weren't able to get all the
pieces to coordinate properly to make it work. (It was some years ago,
so I don't recall the details of the problems.)
Holding disk size is one of the things I'm looking for advice on.  The
largest DLE is currently a 19T NAS, but the admin responsible for that
system agrees that it should be split into multiple filesystems, even
aside from backup-related reasons.  Assuming it doesn't get split, would
a 20T holding disk be sufficient, or does it need to be 2x the largest
DLE?
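For reference, I know the holding disk itself is just a block in
amanda.conf; something like this (the directory and sizes are
placeholders, and the 100GB reserve is arbitrary):

    holdingdisk hd1 {
        directory "/amanda/holding"   # ideally its own fast filesystem
        use -100 gb                   # use everything, leave 100GB free
        chunksize 1 gb                # split dumps into 1GB chunks
    }

The open question is just how big that filesystem needs to be relative
to the largest DLE.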
> * fast network is nice, but this results in a bottleneck called
> *storage* -> fast RAID arrays, maybe SSDs.
My boss isn't particularly price-sensitive, but I doubt that he could
swallow the cost of putting all the vtapes on SSD, so hopefully it won't
come to that. SSD for the holding disk should be doable.
> I'd start with asking: how do your current backups look like?
>
> What is the current rate of new/changed data generated?
Covered that above, but, to quickly reiterate, we're using Tivoli
Storage Manager, which runs daily incrementals totaling approx. 700GB
(uncompressed), the bulk of which is 400GB from one database server and
150GB from a second.  Both are running mysql/mariadb, if that matters.
> * how long does it take to copy all the 40TB into my amanda box (*if* I
> did a FULL backup every time)?
The 400GB/day server takes about 8 hours to do its daily run. If we
assume that data rate and *no* parallelization, it comes out to a bit
over a week for 40T.
However, I assume that's being throttled by the TSM server, because I
get approximately double that rate when copying disk images on my kvm
servers, and those are using remote glusterfs disk mounts, so the data
is crossing the network multiple times.
> * what grade of parallelity is possible?
As much as the network capacity will support, really. Our current
backups kick off simultaneously for almost all servers (the one
exception is that 400G/day db server, which starts earlier). About half
finish within a minute or so (only backing up a couple hundred MB or
less) and most are complete within half an hour. It's pretty much just
db servers (the ones I've mentioned already, plus some postgresql
machines with between 10 and 50G/day) that take longer than an hour to
complete.
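If it helps frame the question, the main amanda.conf knobs I'd expect to
be tuning for this are something like (values are just placeholders):

    inparallel 10    # dumpers amdump may run at once, across all clients
    maxdumps 2       # simultaneous dumps allowed on a single client

plus per-interface bandwidth limits (netusage / define interface) if the
network turns out to be the constraint.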
--
Dave Sherohman