I've been using Bacula for many years, but now that our data volume
has grown and we've gotten a new tape library, I'm about to implement
a new strategy for our backup jobs, and I'd like your feedback.

        Environment:    scientific research center
        Data volume:    ~400TB
        Growth rate:    ~20TB/month     (new data)
        Churn rate:     ~10TB/month     (existing files whose content changes
                                        without a significant change in size)
        Backup device:  tape library, 3x LTO8 drives, 80x LTO tapes
        Backup window:  undefined
        Restore window: undefined
        Bacula version: 9.6.7
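
As a rough sanity check on these figures, here is a back-of-envelope
sketch of how many tapes and how much drive time one full backup would
take. The 12TB capacity and 360MB/s rate are assumptions (LTO-8 native
figures, no compression, all three drives streaming the whole time), so
treat the output as a best case rather than a measurement.

```python
import math

DATA_TB = 400              # current data volume, from the table above
GROWTH_TB_PER_MONTH = 20   # new data per month, from the table above
DRIVES = 3                 # LTO8 drives in the library
TAPE_TB = 12               # ASSUMPTION: LTO-8 native capacity per tape
DRIVE_MB_S = 360           # ASSUMPTION: LTO-8 native streaming rate per drive


def tapes_for_full(data_tb, tape_tb=TAPE_TB):
    """Tapes needed for one full backup, ignoring compression and waste."""
    return math.ceil(data_tb / tape_tb)


def hours_for_full(data_tb, drives=DRIVES, mb_s=DRIVE_MB_S):
    """Best-case wall-clock hours with every drive streaming at full rate."""
    seconds = (data_tb * 1e12) / (drives * mb_s * 1e6)
    return seconds / 3600


# Project the size of a full backup now, and after 1 and 2 years of growth.
for months in (0, 12, 24):
    size_tb = DATA_TB + GROWTH_TB_PER_MONTH * months
    print(f"+{months:2d} months: {size_tb} TB, "
          f"{tapes_for_full(size_tb)} tapes, "
          f"{hours_for_full(size_tb):.0f} h minimum")
```

Multiplying the per-backup tape count by the number of full sets retained
at the same time gives the media needed for a given retention plan.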

We're using the GPFS filesystem, and doing filesystem snapshots every
15 minutes, with a limited set retained for at least 2 months. The
snapshots allow for almost instant restores of recent data and comparison
between different versions of files, without system administrator
intervention.

Because of snapshots, I'm planning to eliminate all nightly incremental
& differential backups to tape. Tape backups would be only for
archival/disaster-recovery purposes and for compliance with grant and
data management requirements.

The new strategy would be to do a full backup every 2 months, each kept
for 5 months. One backup per year (January, in the example below) would be
kept for at least 2 years; the media for the others would be rotated
(reused). For example:

        January 2021            keep until January 2023
        March 2021              keep until August 2021
        May 2021                keep until October 2021
        July 2021               keep until December 2021
        September 2021          re-use March 2021 media, keep until February 2022
        November 2021           re-use May 2021 media, keep until April 2022
        January 2022            keep until January 2024
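
The rotation above can be generated mechanically, which makes it easier
to try other intervals or retention periods. This is a minimal sketch;
the function names are made up for illustration, and treating the
January backup as the long-retention one is an assumption read off the
example table.

```python
from datetime import date


def add_months(d, n):
    """Return the first day of the month n months after d."""
    total = d.year * 12 + (d.month - 1) + n
    return date(total // 12, total % 12 + 1, 1)


def schedule(start, count, interval=2, keep=5, long_keep=24, long_month=1):
    """List (run_month, keep_until) pairs for `count` full backups.

    ASSUMPTION: the backup taken in `long_month` (January) is the one
    kept for `long_keep` months; all others are kept for `keep` months.
    """
    rows = []
    for i in range(count):
        run = add_months(start, i * interval)
        months = long_keep if run.month == long_month else keep
        rows.append((run, add_months(run, months)))
    return rows


for run, until in schedule(date(2021, 1, 1), 7):
    print(f"{run:%B %Y:<16} keep until {until:%B %Y}")
```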
        

All tape backups would be done from a snapshot, so that no files within
the source of the backup change during the process. A "run before job"
script would dump coherent copies of databases, then create a filesystem
snapshot dedicated to the backup. That snapshot would be removed when
the backup is complete.
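
A minimal sketch of that before/after sequence follows. Everything
site-specific here is hypothetical: the device name "gpfs0", the
"bacula-<label>" snapshot naming, and the database dump script path. It
assumes the GPFS mmcrsnapshot and mmdelsnapshot commands take the device
followed by the snapshot name; check that against the installed release.
By default it only prints the commands instead of running them.

```python
import subprocess

FS_DEVICE = "gpfs0"                         # ASSUMPTION: hypothetical device name
DB_DUMP_SCRIPT = "/usr/local/sbin/dump-db"  # ASSUMPTION: hypothetical site script


def snapshot_name(label):
    """Name of the snapshot dedicated to one backup cycle, e.g. 2021-01."""
    return f"bacula-{label}"


def before_job_commands(label):
    """Dump the databases first, so the dumps are inside the snapshot."""
    return [
        [DB_DUMP_SCRIPT],
        ["mmcrsnapshot", FS_DEVICE, snapshot_name(label)],
    ]


def after_job_commands(label):
    """Remove the dedicated snapshot once the backup is complete."""
    return [["mmdelsnapshot", FS_DEVICE, snapshot_name(label)]]


def run_all(commands, dry_run=True):
    """Run each command in order; stop at the first failure."""
    for cmd in commands:
        if dry_run:
            print("would run:", " ".join(cmd))
        else:
            subprocess.run(cmd, check=True)


run_all(before_job_commands("2021-01"))
run_all(after_job_commands("2021-01"))
```

The order matters: creating the snapshot before the dumps finish would
leave the dumps out of the backup.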

We've got about 700 top-level directories for user accounts and research
projects. We'll probably run one backup job per alphabetical group of
directories (A*, B*, etc.), so that the 400TB will be spread (unevenly)
across about 45 Bacula jobs.
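
The grouping itself is simple to sketch. The function below buckets
directory names by their first character; folding upper and lower case
into one group is an assumption, and the sample names are invented.

```python
from collections import defaultdict


def group_by_initial(dir_names):
    """Map each leading character to the sorted directory names under it.

    ASSUMPTION: grouping is case-insensitive ("adam" and "Alice" both
    land in group "A"); digits and punctuation form their own groups.
    """
    groups = defaultdict(list)
    for name in sorted(dir_names):
        groups[name[0].upper()].append(name)
    return dict(groups)


# Invented sample names, standing in for the ~700 real directories.
sample = ["alice", "Adam", "bob", "3dprint", "brainscans"]
for initial, names in group_by_initial(sample).items():
    print(f"job {initial}*: {len(names)} directories: {names}")
```

Summing directory sizes per group in the same loop would show how
uneven the jobs are before any of them is run.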


Thoughts? Suggestions?

Thanks,

Mark

-- 
Mark Bergman                                           voice: 215-746-4061
mark.berg...@pennmedicine.upenn.edu                      fax: 215-614-0266
http://www.med.upenn.edu/cbica/
IT Technical Director, Center for Biomedical Image Computing and Analytics
Department of Radiology                         University of Pennsylvania


_______________________________________________
Bacula-users mailing list
Bacula-users@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/bacula-users
