2017-07-18 18:21 GMT+02:00 Jason Bailey <[email protected]>:
> I've got a mail server full of millions of tiny little files (messages)
> that I back up to my Bareos server at another location. It works, but it
> takes several days to complete,
> [...]
> What do you suggest I try to speed things up?
>
We were in a *very similar* environment:
- one maildir backend (production mail-server) hosting ~2K mailboxes;
- slightly more than 5 TB of data, with more than 20M tiny files;
- full backup to an SD connected to a tape-library, taking ~5 days!
What we did:
- created an "online backup" on a different server, kept updated via
rsync (rsync is EXTREMELY fast at syncing: a couple of hours for all the
mailboxes!), and moved the FD from the production server to this "online
backup";
- made heavy use of LVM on the "online backup": we keep 7 daily
snapshots (so we have 7 backups "on disk", ready for quick restore);
- as for Bareos:
  - split the whole set of mailboxes into subsets (a-d; e-h; i-m; n-t;
    u-z), so instead of a single huge Full we have several big Fulls;
  - carefully planned the schedule of all the jobs (Full and
    Incremental for every subset) to minimize the problems caused by
    failed jobs;
  - put some effort into telling Bareos to "strictly follow" our plan
    (and not, for example, upgrade an Incremental to a Full just
    because a previous job failed!)
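For reference, the rsync + LVM snapshot side boils down to two commands.
This is only a sketch: the hostname, paths and VG/LV names below are
invented for illustration, so adjust them to your own layout:

```
# keep the "online backup" mirror in sync with the production maildirs
# (-a preserves attributes, -H keeps hardlinks, --delete mirrors removals)
rsync -aH --delete mailserver:/var/vmail/ /srv/online-backup/vmail/

# then take a read-only LVM snapshot of the backup volume; with one of
# these per weekday (and the oldest one removed first), you get the
# 7 ready-to-restore on-disk copies mentioned above
lvcreate --snapshot --size 50G --name vmail-$(date +%u) /dev/vg0/vmail
```

Note that the snapshot only has to hold one day's worth of changed
blocks, not a full copy, which is why keeping 7 of them is cheap.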
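The "strictly follow our plan" part maps to a couple of directives in
bareos-dir.conf. A sketch only — the job/FileSet names are invented, and
you should double-check the exact directives against your Bareos
version:

```
Job {
  Name = "backup-mail-a-d"        # invented: one Job per subset
  Type = Backup
  FileSet = "mail-a-d"
  Schedule = "mail-cycle"
  # don't re-run at a higher level just because an earlier job failed:
  Rerun Failed Levels = no
}

FileSet {
  Name = "mail-a-d"
  # don't force an upgrade to Full when the FileSet definition is edited:
  Ignore FileSet Changes = yes
  Include {
    Options { signature = MD5 }
    File = /srv/online-backup/vmail/a   # invented path, one dir per letter
  }
}
```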
Results:
- we successfully kept a consistent backup for the last three years.
But...
- due to the growth of the mailboxes, things are getting critical
again. So...
- over the last three months we've been planning our "Bareos 3.0"
backup platform (described below).
The problems, as we see them, are:
- Catalog: we chose MySQL initially, without thinking much about the
implications, so MySQL is what we have now. As you can guess, the
"inserting attributes" phase of the backup job (on the DIR) is
_IMPORTANT_: inserting millions of records into the catalog implies
some problems. We needed to review this point;
- Storage performance (FD access to the underlying storage): even
though we were using a SAN, read performance on the disks looked
sub-optimal. We needed to increase the read speed of the FD;
- Storage performance (write speed of our LTO5 tapes): job statistics
were showing a really low "Rate:" (less than 10-15 MB/s). We needed to
increase this.
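On the catalog side, the usual InnoDB knobs are where we started
looking. A sketch of a my.cnf fragment — the values are purely
illustrative, so size them for your own box:

```
[mysqld]
# enough cache to keep the catalog's File/Path indexes hot
innodb_buffer_pool_size = 4G
# larger redo logs smooth out the huge burst of INSERTs at end of job
innodb_log_file_size = 512M
# trade ~1s of durability for much faster commits (acceptable for a
# catalog that can be rebuilt with bscan in the worst case)
innodb_flush_log_at_trx_commit = 2
```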
What we're experimenting with, right now, to solve the above issues:
- We've built a new bareos infrastructure, with:
- an ad-hoc DIR (+MySQL)
- a new "on-line backup" box to read from (an HP Gen8 Microserver,
with 4x8TB SATA disks, in RAID10)
- a new "disk-based SD" to write to (another HP Gen8, with other
4x8TB SATA disks, stand-alone)
This new platform is significantly faster than the previous one, and is
touching the 1 Gbps limit of the LAN (we're still fine-tuning the
schedule of the jobs, as well as the "synchronization" of the backups
wrt. the "tape-based" platform). Here's the current network utilization
of the FD (the RAID10 box):
[image:
https://monitor.unich.it/cgi-bin/munin-cgi-graph/drbd-store-03-ch/drbd-store-03-ch/if_eno1-week.png]
We also guessed that, once the backup was inside the disk-based SD, it
would be faster to move it (with a migration job) to tape, but... *we
were wrong*! We're still investigating... but it looks like a migration
job goes through a looooong cycle of DELETEs (of a single record from
the old job) and INSERTs (of the same record for the new job), so in
terms of performance it's roughly the same as a "normal" backup
(again: we're still investigating, so we may be wrong! Apologies for
any mistakes!)
A final point: as the tape backups are our second line of backup
(remember the 7 snapshots), we configured the to-the-tape backup
(reading from the fast FD) to avoid inserting records into the "File"
table. The result:
JobId: 12937
Job: archive-POSTA.2017-07-12_09.01.00_59
Backup Level: Full
[...]
Scheduled time: 12-Jul-2017 09:00:43
Start time: 12-Jul-2017 09:01:02
End time: 13-Jul-2017 17:20:12
Elapsed time: 1 day 8 hours 19 mins 10 secs
Priority: 10
FD Files Written: 21,022,371
SD Files Written: 21,022,371
FD Bytes Written: 5,146,074,894,823 (5.146 TB)
SD Bytes Written: 5,150,469,858,108 (5.150 TB)
Rate: 44229.3 KB/s
Software Compression: None
[...]
Volume name(s): 10D17VL5|08D17VL5|09D17VL5
[...]
Termination: Backup OK
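For the curious: skipping the File-table insertion is, in our case, just
a Pool setting. A sketch (the pool name is invented, and we believe
this is the Bacula-inherited directive — double-check it against your
Bareos release):

```
Pool {
  Name = "tape-archive"           # invented name
  Pool Type = Backup
  # don't store per-file records in the catalog for jobs using this
  # pool: restoring then means pulling the whole job (or running
  # bscan), but the huge end-of-job INSERT phase disappears
  Catalog Files = no
}
```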
So we moved from more than 5 days... to slightly more than 1 day :-)
But we're still "thinking" how to improve :-)
Bye,
DV
P.S.: this is a really complex topic, and describing all the details
takes LOTS of time. Feel free to ask, but please be kind and patient
while waiting for a reply :-)
--
You received this message because you are subscribed to the Google Groups
"bareos-users" group.