Thank you for the very thorough response. I've actually considered
offloading the backup duties in a similar way (a nearby rsync host
that actually runs the FD). You've given me a lot to think about -- I'll
post back to the list when I have more feedback.
Our director is using PostgreSQL for the backend/catalog. It's solid
as far as stability is concerned, but there are times when I/O really
slows it down. I'm using LVM snapshots for consistent data capture,
which has been reliable. I've been toying more with bpipe to see if
there's a creative way I can minimize TCP overhead during transfer to
our SD, but it doesn't end up completing correctly -- the tar command I
was playing with complains about EOF and returns a non-zero status.
Even if it had worked, I need granularity in my restores -- it can't be
a system-wide, all-or-nothing approach.
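For what it's worth, an EOF complaint plus a non-zero status from tar under bpipe is often a sign the reader command isn't streaming the archive to stdout. A minimal sketch of the idea (the Plugin line follows the bpipe `file:reader:writer` syntax, but the virtual path and the tar invocations here are only illustrative -- and note that bpipe produces a single virtual file, so it wouldn't give per-mailbox restore granularity anyway):

```shell
# bpipe wraps a reader and a writer around Bareos' transport, roughly:
#   reader_cmd | <bareos stream> | writer_cmd
# In a FileSet this looks something like (path/commands hypothetical):
#   Plugin = "bpipe:file=/MAIL/vmail.tar:reader=tar -C /var/vmail -cf - .:writer=tar -C /var/vmail -xf -"
# The reader MUST write the archive to stdout ("-cf -"); a tar invoked
# without "-f -" may try a default device and die with EOF / non-zero.
# Local stand-in for that pipeline:
set -e
src=$(mktemp -d); dst=$(mktemp -d)
printf 'message body\n' > "$src/msg1"
tar -C "$src" -cf - . | tar -C "$dst" -xf -
```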
Jason
On 07/19/2017 04:04 PM, Damiano Verzulli wrote:
2017-07-18 18:21 GMT+02:00 Jason Bailey <[email protected]>:
I've got a mail server full of millions of tiny little files
(messages) that I back up to my Bareos server at another location.
It works, but it takes several days to complete,
[...]
What do you suggest I try to speed things up?
We were in a *very similar* environment:
* one maildir backend (production mail-server) hosting ~2K mailboxes;
* slightly more than 5TB of data, with more than 20M tiny files;
* full backup to an SD connected to a tape-library, taking ~5 days!
What we did:
* we created an "online-backup" copy on a different server, kept
updated via rsync (rsync is EXTREMELY fast at syncing: a couple of
hours for all the mailboxes!), and moved the FD from the production
server to this "online-backup" host;
* we use LVM heavily on the "online-backup" box: we keep 7 daily
snapshots (so we have 7 backups "on disk", ready for quick restore);
* on the Bareos side:
o we split the whole set of mailboxes into subsets (a-d; e-h;
i-m; n-t; u-z), so instead of a single huge Full we have
several big Fulls;
o we carefully planned the scheduling of all the jobs (Full and
Incremental of each subset) to minimize problems related to
failed jobs;
o we put some effort into telling Bareos to "strictly follow"
our plan (and not, for example, turn an Incremental into a
Full because a previous job failed!)
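For illustration, one of the subset jobs could be sketched roughly like this. All names and paths are invented, and it assumes one top-level maildir directory per leading letter; "Rerun Failed Levels" is the directive that, when enabled, upgrades a job's level after a failure, so keeping it off is one way to make Bareos "strictly follow" the plan:

```
FileSet {
  Name = "maildir-a-d"            # hypothetical name
  Include {
    Options { signature = MD5 }
    # assumes one top-level directory per leading letter
    File = /var/vmail/a
    File = /var/vmail/b
    File = /var/vmail/c
    File = /var/vmail/d
  }
}

Job {
  Name = "mail-a-d"               # hypothetical name
  Type = Backup
  FileSet = "maildir-a-d"
  # do NOT upgrade an Incremental to a Full because a previous job failed
  Rerun Failed Levels = no
  ...
}
```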
Results:
* we have successfully had consistent backups for the last three
years. But....
* due to the growth of the mailboxes, things are getting critical
again. So...
* for the last three months we've been planning our "BareOS 3.0"
backup platform (described below).
The problems, as we see them, are:
* Catalog: we chose MySQL initially, without thinking much about the
implications (so MySQL is what we have now). As you can guess, the
"Inserting..." phase of the backup job is _IMPORTANT_: inserting
millions of records into the catalog implies some problems. We needed
to review this point;
* Storage performance (FD access to the underlying storage): even
though we were using a SAN, it looked like read performance on the
disks was sub-optimal. We needed to increase the read speed of the FD;
* Storage performance (writing speed of our LTO5 tape): job
statistics were showing really low numbers for "Rate:" (lower than
10-15 MB/s). We needed to increase this.
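One standard Bareos answer to a slow tape "Rate:" (not mentioned in the thread, so take it only as a suggestion) is data spooling: the job is staged on fast disk and then despooled to tape in large sequential writes, so the LTO drive keeps streaming instead of shoe-shining. A sketch, with invented names, paths and sizes:

```
Device {                          # on the SD
  Name = LTO5-drive               # hypothetical name
  ...
  Spool Directory = /var/lib/bareos/spool
  Maximum Spool Size = 200G
}

Job {                             # on the DIR
  ...
  Spool Data = yes        # stage to disk, then despool sequentially to tape
  Spool Attributes = yes  # batch the catalog attribute inserts as well
}
```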
What we're experimenting with, right now, to solve the above issues:
* We've built a new Bareos infrastructure, with:
o an ad-hoc DIR (+MySQL)
o a new "online-backup" box to read from (an HP Gen8
Microserver, with 4x8TB SATA disks, in RAID10)
o a new "disk-based SD" to write to (another HP Gen8, with another
4x8TB SATA disks, stand-alone)
This new platform is significantly faster than the previous one, and
is touching the 1 Gbps limit of the LAN (we're still fine-tuning the
schedule of the jobs, as well as the "synchronization" of the backups
wrt. the "tape-based" platform). Here's the "current" network
utilization of the FD (the RAID10 box):
https://monitor.unich.it/cgi-bin/munin-cgi-graph/drbd-store-03-ch/drbd-store-03-ch/if_eno1-week.png
We also guessed that, once a backup was inside the disk-based SD, it
would be faster to move it (a "move job") to tape, but.... *we were
wrong*! We're still investigating.... but it looks like a "move job"
goes through a looooong cycle of DELETEs (a single record from the old
job) and INSERTs (the same record for the new job), so in terms of
performance it's mostly the same as a "common" backup (please, again:
we're still investigating, so we may be wrong! Apologies for any
mistakes!)
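For reference, a "move job" in Bareos terms is a Migration job (a Copy job behaves similarly but leaves the original in place); both still have to write catalog records for the new job, which would be consistent with the DELETE/INSERT churn described above. A hedged sketch with invented pool and job names:

```
Job {
  Name = "migrate-to-tape"     # hypothetical name
  Type = Migrate
  Pool = DiskPool              # jobs are selected from this pool
  Selection Type = PoolTime    # migrate jobs older than the pool's Migration Time
  ...
}

Pool {
  Name = DiskPool
  Next Pool = TapePool         # destination of the migrated data
  Migration Time = 7 days
  ...
}
```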
A final point: as the tape backups are our second line of backup
(remember the 7 snapshots), we've configured the to-the-tape backup,
reading from the fast FD, to avoid the insertion of records into the
"File" table. The result:
JobId: 12937
Job: archive-POSTA.2017-07-12_09.01.00_59
Backup Level: Full
[...]
Scheduled time: 12-lug-2017 09:00:43
Start time: 12-lug-2017 09:01:02
End time: 13-lug-2017 17:20:12
Elapsed time: 1 day 8 hours 19 mins 10 secs
Priority: 10
FD Files Written: 21,022,371
SD Files Written: 21,022,371
FD Bytes Written: 5,146,074,894,823 (5.146 TB)
SD Bytes Written: 5,150,469,858,108 (5.150 TB)
Rate: 44229.3 KB/s
Software Compression: None
[...]
Volume name(s): 10D17VL5|08D17VL5|09D17VL5
[...]
Termination: Backup OK
So we moved from more than 5 days... to slightly more than 1 day :-)
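If the mechanism behind skipping the File-table inserts is the Pool-level switch (this is our guess at the configuration, not something stated above), it would look roughly like this; the trade-off is that file-level browsing/restore from these volumes is no longer possible via the catalog, which is acceptable for a second-line archive:

```
Pool {
  Name = TapeArchive       # hypothetical name
  Pool Type = Backup
  # don't INSERT one File-table record per backed-up file;
  # restores from this pool are whole-job only (no catalog browsing)
  Catalog Files = no
  ...
}
```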
But we're still "thinking" how to improve :-)
Bye,
DV
P.S.: this is a really complex topic, requiring LOTS of time to
describe in detail. Feel free to ask, but please be kind and patient
while waiting for the reply :-)
--
You received this message because you are subscribed to the Google Groups
"bareos-users" group.