Thank you for the very thorough response. I've actually considered offloading the backup duties in a similar way (a nearby rsync host that actually runs the FD). You've given me a lot to think about -- I'll post back to the list when I have more feedback.

Our director is using PostgreSQL for the backend / catalog. It's solid as far as stability is concerned, but there are times when I/O really slows it down. I'm using LVM snapshots for consistent data capture, which has been reliable. I've been toying with bpipe to see if there's a creative way to minimize TCP overhead during transfer to our SD, but it never completes correctly: the tar command I was playing with complains about an unexpected EOF and returns a non-zero status. Even if it had worked, I need granularity in my restores; it can't be a system-wide, all-or-nothing approach.
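(For context, the bpipe FileSet shape I've been experimenting with roughly follows the pattern below; the paths and commands here are illustrative, not my actual config. Both the reader and writer must stream on stdout/stdin and exit 0, or Bareos flags the job in error:)

```
FileSet {
  Name = "pgdump-bpipe"
  Include {
    Options {
      signature = MD5
    }
    # Virtual file name, then the reader (backup) command and the
    # writer (restore) command, separated by colons.
    Plugin = "bpipe:file=/_bpipe/catalog.sql:reader=pg_dump bareos:writer=psql bareos"
  }
}
```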

Jason



On 07/19/2017 04:04 PM, Damiano Verzulli wrote:


2017-07-18 18:21 GMT+02:00 Jason Bailey <[email protected] <mailto:[email protected]>>:

    I've got a mail server full of millions of tiny little files
    (messages) that I back up to my Bareos server at another location.
    It works, but it takes several days to complete,
    [...]
    What do you suggest I try to speed things up?

We were in a *very similar* environment:

  * one maildir backend (production mail-server) hosting ~2K mailboxes;
  * slightly more than 5TB of data, with more than 20M tiny files;
  * full backup to an SD connected to a tape-library, taking ~5 days!

What we did:

  * created an "online backup" on a different server, kept updated
    via rsync (rsync is EXTREMELY fast at syncing: a couple of hours
    for the whole set of mailboxes!), and moved the FD from the
    production server to this online backup;
  * made heavy use of LVM on the online backup: we keep 7 daily
    snapshots (so we have 7 backups "on disk", ready for quick
    restores);
  * on the Bareos side:
      o split the whole set of mailboxes into smaller subsets (a-d;
        e-h; i-m; n-t; u-z). So instead of a single huge Full, we
        have several big Fulls;
      o carefully planned the scheduling of all the jobs (Full and
        Incremental for each subset) so as to minimize problems
        caused by failed jobs;
      o put some extra effort into telling Bareos to "strictly
        follow" our plans (and not, for example, turn an Incremental
        into a Full just because a previous job failed!)
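A sketch of one such per-subset job (all names here are hypothetical; the other subsets would differ only in FileSet and Schedule). The `Rerun Failed Levels` directive is the one that keeps Bareos on the planned level instead of re-running at the level of an earlier failed job:

```
# One job per subset; the a-d subset is shown, names are illustrative.
Job {
  Name = "mail-a-d"
  Type = Backup
  Client = online-backup-fd
  FileSet = "mail-a-d"
  Schedule = "mail-a-d-cycle"
  Storage = tape-sd
  Pool = Full-Pool
  Messages = Standard
  # Stick to the planned levels: do not upgrade the next job to the
  # level of a previously failed Full/Differential.
  Rerun Failed Levels = no
}
```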

Results:

  * we have successfully kept a consistent backup for the last three
    years. But...
  * due to the growth of the mailboxes, things are getting critical
    again. So...
  * over the last three months we've been planning our "BareOS 3.0"
    backup platform (described below).

Problems, as we see them, are:

  * Catalog: we chose MySQL initially, without caring much about the
    choice, so MySQL is what we have now. As you can guess, the
    attribute-insertion phase of the backup job is _IMPORTANT_:
    inserting millions of records into the catalog brings its own
    problems. We needed to review this point;
  * Storage performance (FD access to the underlying storage): even
    though we were using a SAN, read performance on the disks looked
    sub-optimal. We needed to increase the read speed of the FD;
  * Storage performance (write speed of our LTO-5 tape drive): job
    statistics were showing really low numbers for "Rate:" (lower
    than 10-15 MB per second). We needed to increase this.
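On the catalog point: the attribute-insertion phase is largely bound by fsyncs and InnoDB redo-log pressure, so server-side tuning can help. A hypothetical my.cnf fragment (the values are illustrative; size them to the RAM available on the DIR host):

```
[mysqld]
# Give InnoDB room to absorb millions of File-table inserts:
innodb_buffer_pool_size = 4G
innodb_log_file_size    = 1G
# Flush the redo log once per second instead of at every commit:
# up to ~1s of transactions can be lost on a crash, but the
# attribute-insertion phase does far fewer fsyncs.
innodb_flush_log_at_trx_commit = 2
```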


What we're experimenting with right now to solve the above issues:

  * We've built a new Bareos infrastructure, with:
      o a dedicated DIR (+ MySQL);
      o a new "online backup" box to read from (an HP Gen8
        MicroServer, with 4x8TB SATA disks in RAID 10);
      o a new disk-based SD to write to (another HP Gen8, with
        another 4x8TB SATA disks, standalone).

This new platform is significantly faster than the previous one, and is touching the 1 Gbps limit of the LAN (we're still fine-tuning the job schedule, as well as the "synchronization" of these backups with respect to the tape-based platform). Here's the current network utilization of the FD (the RAID 10 box):

https://monitor.unich.it/cgi-bin/munin-cgi-graph/drbd-store-03-ch/drbd-store-03-ch/if_eno1-week.png


We also guessed that, once a backup was inside the disk-based SD, it would be faster to move it (a "move job") to tape, but... *we were wrong*! We're still investigating, but it looks like a move job goes through a looooong cycle of DELETEs (one record at a time from the old job) and INSERTs (the same record for the new job), so in terms of performance it's mostly the same as a regular backup (again: we're still investigating, so we may be wrong! Apologies for any mistake!)
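For reference, this kind of "move job" is a Bareos migration job; a hypothetical sketch follows (pool, storage, and job names are all illustrative, and some versions also require dummy Client/FileSet entries that are ignored for selection). As noted above, the migration still rewrites every catalog record of the job, so it is not cheaper than the original insert:

```
# Source pool on the disk-based SD; "Next Pool" is where migrated
# jobs end up.
Pool {
  Name = Disk-Pool
  Pool Type = Backup
  Storage = disk-sd
  Next Pool = Tape-Pool
}

# Migrate jobs selected by name from Disk-Pool to its Next Pool.
Job {
  Name = "migrate-disk-to-tape"
  Type = Migrate
  Pool = Disk-Pool
  Selection Type = Job
  Selection Pattern = "archive-POSTA.*"
  Messages = Standard
}
```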

A final point: as tape backups are our second line of backup (remember the 7 snapshots), we've configured the to-the-tape backup to read from the fast FD and to skip inserting records into the "File" table. The result:

JobId:                  12937
  Job: archive-POSTA.2017-07-12_09.01.00_59
  Backup Level:           Full
[...]
  Scheduled time:         12-lug-2017 09:00:43
  Start time:             12-lug-2017 09:01:02
  End time:               13-lug-2017 17:20:12
  Elapsed time:           1 day 8 hours 19 mins 10 secs
  Priority:               10
  FD Files Written:       21,022,371
  SD Files Written:       21,022,371
  FD Bytes Written:       5,146,074,894,823 (5.146 TB)
  SD Bytes Written:       5,150,469,858,108 (5.150 TB)
  Rate:                   44229.3 KB/s
  Software Compression:   None
[...]
  Volume name(s):         10D17VL5|08D17VL5|09D17VL5
[...]
  Termination:            Backup OK

So we moved from more than 5 days... to slightly more than 1 day :-)
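The reported figures are self-consistent, assuming Bareos computes "Rate:" as FD bytes written over elapsed wall-clock time, with 1 KB = 1000 bytes:

```python
# Sanity-check the "Rate:" line from the job report above.
fd_bytes = 5_146_074_894_823                      # FD Bytes Written
elapsed_s = 1 * 86400 + 8 * 3600 + 19 * 60 + 10   # 1 day 8 hours 19 mins 10 secs
rate_kb_s = fd_bytes / elapsed_s / 1000           # KB/s, assuming 1 KB = 1000 bytes
print(round(rate_kb_s, 1))                        # 44229.3, matching the report
```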

But we're still "thinking" how to improve :-)

Bye,

DV


P.S.: this is a really complex topic, requiring LOTS of time to describe in detail. Feel free to ask, but please be kind and patient while waiting for the reply :-)


