2017-07-18 18:21 GMT+02:00 Jason Bailey <[email protected]>:

> I've got a mail server full of millions of tiny little files (messages)
> that I back up to my Bareos server at another location. It works, but it
> takes several days to complete,
> [...]
> What do you suggest I try to speed things up?
>

We were in a *very similar* environment:

   - one maildir backend (production mail-server) hosting ~2K mailboxes;
   - slightly more than 5TB of data, with more than 20M tiny files;
   - full backup to an SD connected to a tape-library, taking ~5 days!

What we did:

   - created an "on-line backup" on a different server, kept up to date via
   rsync (rsync is EXTREMELY fast at syncing: a couple of hours for the whole
   set of mailboxes!), and moved the FD from the production server to this
   "on-line backup";
   - made heavy use of LVM on the "on-line backup": we keep 7 daily snapshots
   (so we have 7 backups "on disk", ready for quick restore);
   - as for Bareos:
      - split the whole set of mailboxes into 4 subsets (a-d; e-h; i-m; n-t;
      u-z). So instead of a single huge FULL, we have four BIG FULLs;
      - carefully planned the schedule of all the jobs (FULL and
      Incremental for all 4 subsets) so as to minimize problems related to
      failed jobs;
      - put particular effort into telling Bareos to "strictly follow" our
      plans (and not, for example, upgrade an Incremental to a Full because
      a previous job failed!)
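
A minimal sketch of the alphabetical split described above, assuming each mailbox is identified by a name whose first letter decides the subset. The letter ranges are copied as listed in the post; the function names, the "other" fallback bucket and the sample mailbox names are hypothetical and only serve the illustration.

```python
# Letter ranges as listed in the post; everything else is illustrative.
RANGES = ["a-d", "e-h", "i-m", "n-t", "u-z"]


def build_lookup(ranges):
    """Map each lowercase letter to the label of the range that contains it."""
    lookup = {}
    for label in ranges:
        first, last = label.split("-")
        for code in range(ord(first), ord(last) + 1):
            lookup[chr(code)] = label
    return lookup


def partition_mailboxes(names, ranges=RANGES, fallback="other"):
    """Group mailbox names by the range their first letter falls into.

    Names that do not start with a letter a-z (digits, empty names, ...)
    go to the hypothetical `fallback` bucket.
    """
    lookup = build_lookup(ranges)
    subsets = {label: [] for label in ranges}
    for name in names:
        key = lookup.get(name[:1].lower(), fallback)
        subsets.setdefault(key, []).append(name)
    return subsets


if __name__ == "__main__":
    sample = ["alice", "Bob", "emma", "ivan", "nora", "zoe", "42-team"]
    for label, boxes in partition_mailboxes(sample).items():
        print(label, boxes)
```

Each resulting group could then feed its own FileSet and job, so that a failure only forces a re-run of one subset instead of the whole FULL.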

Results:

   - we have successfully kept a consistent backup for the last three
   years. But....
   - due to the growth in mailbox size, things are getting critical
   again. So...
   - over the last three months we've been planning our BareOS 3.0
   backup platform (described below).


The problems, as we see them, are:

   - Catalog: we chose MySQL (initially, without caring much about the
   implications. So now we have MySQL). As you can guess, the
   "Inserting DIR..." phase of the backup job is _IMPORTANT_: inserting
   millions of records into the catalog... causes some problems. We needed
   to review this point;
   - Storage performance (FD access to the underlying storage): even though
   we were using a SAN, read performance on the disks looked sub-optimal.
   We needed to increase the read speed of the FD;
   - Storage performance (write speed of our LTO5 tape): job statistics
   were showing really low numbers for "Rate:" (lower than 10/15Mbps).
   We needed to increase this.


What we're experimenting with right now to solve the above issues:

   - We've built a new bareos infrastructure, with:
      - an ad-hoc DIR (+MySQL)
      - a new "on-line backup" box to read from (an HP Gen8 Microserver,
      with 4x8TB SATA disks, in RAID10)
      - a new "disk-based SD" to write to (another HP Gen8, with other
      4x8TB SATA disks, stand-alone)

This new platform is significantly faster than the previous one, and is
touching the 1Gbps limit of the LAN (we're still fine-tuning the schedule
of the jobs, as well as the "synchronization" of the backups wrt. the
"tape-based" platform). Here's the "current" network utilization of the FD
(the RAID10 box):

[image:
https://monitor.unich.it/cgi-bin/munin-cgi-graph/drbd-store-03-ch/drbd-store-03-ch/if_eno1-week.png]


We also guessed that once the backup was on the disk-based SD, it would
be faster to move it (MOVE JOB) to tape but.... *we were wrong*!
We're still investigating.... but it looks like a "move job" goes through a
looooong cycle of DELETE (a single record from the old job) and INSERT
(the same record for the new job), so it's mostly the same (in terms of
performance) as a "common" backup (please, again: we're still
investigating. So we may be wrong! Apologies for any mistake!)
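
To make the suspected cost pattern concrete, here is a toy illustration of moving records from one job to another row by row (one DELETE plus one INSERT per record) versus a single set-based UPDATE. It uses an in-memory SQLite table with an invented two-column schema; it is NOT the Bareos catalog schema and says nothing about what Bareos actually does internally. All table, column and function names are assumptions made for the example.

```python
import sqlite3


def make_catalog(n_files):
    """Create an in-memory toy table holding n_files records for job 1."""
    db = sqlite3.connect(":memory:")
    db.execute("CREATE TABLE file (jobid INTEGER, name TEXT)")
    db.executemany(
        "INSERT INTO file VALUES (1, ?)",
        ((f"msg-{i}",) for i in range(n_files)),
    )
    db.commit()
    return db


def move_row_by_row(db, old_job, new_job):
    """Move records one at a time; returns the number of statements issued."""
    names = [row[0] for row in db.execute(
        "SELECT name FROM file WHERE jobid = ?", (old_job,))]
    statements = 0
    for name in names:
        db.execute("DELETE FROM file WHERE jobid = ? AND name = ?",
                   (old_job, name))
        db.execute("INSERT INTO file VALUES (?, ?)", (new_job, name))
        statements += 2
    db.commit()
    return statements


def move_set_based(db, old_job, new_job):
    """Move all records of a job with a single UPDATE statement."""
    db.execute("UPDATE file SET jobid = ? WHERE jobid = ?",
               (new_job, old_job))
    db.commit()
    return 1


def count_rows(db, jobid):
    """Number of records currently attached to the given job."""
    return db.execute(
        "SELECT COUNT(*) FROM file WHERE jobid = ?", (jobid,)).fetchone()[0]
```

Both functions leave the table in the same final state, but the row-by-row variant issues 2 x N statements. With roughly 21 million records per job, that kind of per-record loop would be in the same order of work as cataloguing a fresh backup, which is consistent with the behaviour guessed at above.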

A final point: as tape backups are our second line of backup (remember the 7
snapshots), we've configured the to-tape backup, reading from the fast FD,
so as to avoid inserting records into the "File" table. The result:

  JobId:                  12937
  Job:                    archive-POSTA.2017-07-12_09.01.00_59
  Backup Level:           Full
[...]
  Scheduled time:         12-lug-2017 09:00:43
  Start time:             12-lug-2017 09:01:02
  End time:               13-lug-2017 17:20:12
  Elapsed time:           1 day 8 hours 19 mins 10 secs
  Priority:               10
  FD Files Written:       21,022,371
  SD Files Written:       21,022,371
  FD Bytes Written:       5,146,074,894,823 (5.146 TB)
  SD Bytes Written:       5,150,469,858,108 (5.150 TB)
  Rate:                   44229.3 KB/s
  Software Compression:   None
[...]
  Volume name(s):         10D17VL5|08D17VL5|09D17VL5
[...]
  Termination:            Backup OK

So we moved from more than 5 days... to slightly more than 1 day :-)
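
As a quick sanity check on the job report, the arithmetic below recomputes the elapsed time and the average rate from the figures printed above ("lug" is July). The only assumption is that 1 KB is taken as 1000 bytes; the function names are made up for this sketch.

```python
from datetime import datetime

# Figures copied from the job report above.
START = datetime(2017, 7, 12, 9, 1, 2)
END = datetime(2017, 7, 13, 17, 20, 12)
FD_BYTES = 5_146_074_894_823
FD_FILES = 21_022_371


def elapsed_seconds(start, end):
    """Wall-clock duration of the job in seconds."""
    return (end - start).total_seconds()


def rate_kb_per_s(n_bytes, seconds):
    """Average throughput in KB/s, assuming 1 KB = 1000 bytes."""
    return n_bytes / seconds / 1000


def rate_mbit_per_s(n_bytes, seconds):
    """Average throughput in Mbit/s, for comparison with the LAN speed."""
    return n_bytes * 8 / seconds / 1_000_000


if __name__ == "__main__":
    secs = elapsed_seconds(START, END)
    print(f"elapsed:       {secs:.0f} s")
    print(f"rate:          {rate_kb_per_s(FD_BYTES, secs):.1f} KB/s")
    print(f"rate:          {rate_mbit_per_s(FD_BYTES, secs):.1f} Mbit/s")
    print(f"avg file size: {FD_BYTES / FD_FILES / 1000:.0f} KB")
```

With these inputs the elapsed time is 116350 s and the rate comes out at 44229.3 KB/s, matching the "Rate:" line when FD bytes are divided by elapsed time. That is roughly 354 Mbit/s averaged over the whole job, so about a third of a 1Gbps link.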

But we're still "thinking" about how to improve :-)

Bye,

DV


P.S.: this is a really complex topic, requiring LOTS of time to describe
details. Feel free to ask, but please, be kind and patient... waiting for
the reply :-)
