As I wrote earlier, this looks more like an archiving plan, not a backup
one (or a combination of backup and archiving). But more to the point:
for backups you need a verification plan and periodic restore tests. For
archiving you need a verification plan as well (e.g. every half a year
you read each archive unit, check whether it's readable and whether its
checksum is correct; for more critical data you might even want some
kind of functional test, like loading the data into the appropriate
software and checking that it's still parseable; if any copy fails, you
re-create it from the other existing copies).
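For disk copies that check could be as simple as a script like this (a
sketch; it assumes you wrote sha256 manifests when each archive unit was
sealed, and all paths are examples - for tape you'd do the equivalent
with your backup software's verify/read tools):

#!/bin/sh
# verify-archives.sh - hypothetical example, adapt paths to your setup.
# Re-reads every archive unit and checks it against the manifest that
# was written when the unit was sealed; a failure means that copy
# should be re-created from one of the other copies.
ARCHIVE_ROOT=/srv/archive          # assumption: one directory per unit
FAILED=0
for unit in "$ARCHIVE_ROOT"/img*; do
    manifest="$unit/SHA256SUMS"
    if [ ! -r "$manifest" ]; then
        echo "MISSING MANIFEST: $unit"
        FAILED=1
        continue
    fi
    # -c re-reads every file, so this also proves the media is readable
    if ! (cd "$unit" && sha256sum -c --quiet SHA256SUMS); then
        echo "CHECKSUM FAILURE: $unit"
        FAILED=1
    fi
done
exit $FAILED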
On 07.08.2020 09:49, [email protected] wrote:
Thanks a lot, Brock, for your comprehensive post, and also to the
others. I haven't fully worked through your example cases yet, but they
will certainly help me get my head around it all. Maybe it helps if I
provide a few more details about how the data/images are organized:
I run a Linux-based virtualization cluster on RAID6 hosts with Windows
VMs.
The images are organized in Windows folders of 2 TB each, like
"E:\img\img01\" to currently "E:\img\img17\".
Once a folder is full, its contents will never change again. They're
like archives that will be read from but never written to again.
So I thought I'd proceed like this:
1. Backup "img01" to "img17" to tape, store the tapes offsite.
2. Do this a second time and store the tapes offsite, separate from
the first generation.
3. Do this a third time to disk, for quick access if needed.
4. Make sure the catalog of at least 1. and 2. is in a very safe place.
5. Develop a daily backup strategy - starting with "img18".
As for (1.) - (3.) I have created separate Full jobs for each
imgXX folder. (1.) has already completed successfully, (2.) is
currently in progress.
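The jobs look roughly like this, one FileSet and one job per folder
(simplified; resource names are my own, shown for img01):

FileSet {
  Name = "fs-img01"
  Include {
    Options {
      Signature = SHA1             # store checksums for later verification
    }
    File = "E:/img/img01"
  }
}

Job {
  Name = "full-img01"
  Type = Backup
  Level = Full
  Client = imgserver-fd            # example client name
  FileSet = "fs-img01"
  Pool = Offsite-Gen1              # one pool per tape generation
  Storage = Neos-LTO7
  Messages = Standard
}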
I thought that once (1.) and (2.) have completed successfully I'm safe
as far as "img01-17" is concerned and never have to consider these
folders for backup again. Right, or am I missing something?
What I'd like to discuss here is (5.), taking a few parameters into
account:
- the daily increment of image data is roughly 50 GB. BTW: The images
(DICOM, JPEG2000) don't compress at all :).
- for legal reasons we have to store the images on WORM-media. So I
need a daily job that writes to tape.
- the doctors want the best possible protection against fire,
supernova, Borg attack etc. They want a daily tape change routine with
the latest WORM tape taken offsite.
For the daily tape change I could buy another LTO drive. I can also
expand my backup server to hold (3.) above plus the daily increment.
So, here's what I thought I need to develop:
- Backup the daily increment to disk.
- Backup the same daily increment to a WORM tape (in a new 1-slot
drive) that is part of a "daily change" pool of tapes (MON-SAT or so...)
- Append the same daily increment to another WORM tape in the 8-slot
loader. Once the tape is full, take it offsite and add a fresh tape in
the loader.
If that strategy doesn't sound too weird, I need to transfer it into a
working bareos config.
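Maybe something along these lines as a starting point (just a guess on
my part, all names made up and surely not right yet):

# daily increment to disk first
Job {
  Name = "daily-img"
  Type = Backup
  Level = Incremental
  Client = imgserver-fd
  FileSet = "fs-img-current"       # the growing img18 folder
  Pool = DailyDisk
  Storage = File
  Schedule = "NightlyMonSat"
  Messages = Standard
}

# then copy the day's jobs to the WORM tape in the new 1-slot drive
Job {
  Name = "copy-daily-worm"
  Type = Copy
  Client = imgserver-fd
  FileSet = "fs-img-current"
  Selection Type = PoolUncopiedJobs
  Pool = DailyDisk
  Next Pool = DailyWORM            # MON-SAT change pool
  Storage = File
  Messages = Standard
}

# a third, similar job would append to the Offsite tape in the loader,
# though I'm not sure PoolUncopiedJobs can be used twice from one pool?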
Sorry if it all sounds confusing, but for me it's still really, really
complex.
Thanks
Andreas
[email protected] wrote on Wednesday, August 5, 2020 at
20:21:10 UTC+2:
You will have some complexity with the size of your data relative
to the size of your loader, unless your data compresses really
well. Does it have more than one tape drive? Your total loader
capacity is 48 TBytes raw (8 slots x 6 TB native LTO7), and you
need 2x your full size (2 x 35 TB = 70 TB) to do Consolidations or
new Fulls, or you have gaps in your protection.
If I'm reading this right you want an off-site copy.
If that's correct I would go about this in one of two ways:
* Get a much bigger loader with 2 drives
or
* Expand the backup server's RAID6 to Full + (daily growth x growth
window) capacity
I would then use Migrate + Archive jobs to make my off-site copies
on tape.
In the first case you can avoid the extra migrate and just do an
archive to a pool of tapes you eject.
Workflow case 1: bigger tape loader, 2 or more tape drives.
* Spool to Disk
* AI-Consolidated Pool Tape
* AI-Incremental Pool Disk
* Offsite Pool Tape
Full and Consolidation backups go to the AI-Consolidated tape pool;
your dailies go to disk until they are consolidated onto tape.
To create your off-sites you can use a copy job of whatever full
you want:
https://docs.bareos.org/TasksAndConcepts/AlwaysIncrementalBackupScheme.html#copy-jobs
For off-site I personally use an Archive job instead, to avoid
issues with Always Incremental jobs:
https://docs.bareos.org/TasksAndConcepts/AlwaysIncrementalBackupScheme.html#virtual-full-jobs
This avoids the offsite tapes being upgraded to the primary copy
when the Consolidate job prunes the older jobs.
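In bconsole that is roughly (job name and jobid are examples; see
the linked docs, which do the same via a Run Script on the archive
job):

# run a VirtualFull of the AI job into the offsite pool ...
* run job=ai-backup-img level=VirtualFull pool=Offsite yes
# ... then mark the result as an archive so Consolidate ignores it
* update jobid=1234 jobtype=A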
To do a VirtualFull archive job like this, though, you need enough
tape for 2x the data, otherwise you're swapping tapes every 5.5 hours.
I would then use a script called at the end of the archive job to
eject the volumes and set volstatus=Used on that Offsite pool.
Then load new tapes, label them, and put them back in Offsite for
the next archive.
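The script could be as simple as this sketch (volume, storage and
device names are examples):

#!/bin/sh
# offsite-eject.sh - hypothetical RunAfterJob script.
# Mark the offsite volumes Used so nothing else is appended to them.
for vol in Offsite-0001 Offsite-0002; do
    echo "update volume=$vol volstatus=Used" | bconsole
done
# release the drive and return the tape to its slot for removal
echo "release storage=Neos-LTO7" | bconsole
mtx -f /dev/sg3 unload 1 0         # slot 1, drive 0 (example)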
To do AI jobs this way plus the VirtualFull archive job you need 2
tape drives, to read from one pool/tape and write to the other. It
will be fast, as you get LTO speeds. No second Full backup from the
source.
Workflow case 2: bigger disk in the backup server.
* Spool if you want to avoid failed jobs cluttering disk volumes,
maybe skip.
* AI-Consolidated Pool Disk
* AI-Incremental Pool Disk
* Offsite Pool Tape
This would work similarly, but your Fulls and Consolidations all
happen on disk. That means your disk needs to hold 2x Full +
Incrementals (with 35 TB of fulls that's 70 TB plus incrementals,
well beyond the current 20 TB RAID6), because your Full or
VirtualFull consolidation will read all the data and write a second
copy before purging the old copy from the catalog. So you won't
need the extra disk space for long, but you will need it.
As for off-site I would do the archive job again: make a
VirtualFull, mark it as archive, into the Offsite pool in the tape
loader. Eject all 7-8 tapes and load new ones.
Assuming you don't want to do a new full off-site every time, you
can do copy jobs from your pools, but I have found Bareos to behave
in an unintuitive way when it comes to Consolidations and
offline-media copies. You end up having to run the consolidate job
twice (once for the primary copy; the Copy job then becomes
primary, gets picked up in the next consolidation, and the
consolidate wants to read from that media), when you really just
want the Copy jobs to be purged along with the primary copy on a
Consolidate in most cases. So I hope you want just a full off-site,
say once a month.
Another option, if you want, is to set up a second SD with disk
off-site and send just your incrementals there as part of a Copy
job. Sneakernet the tapes with Fulls less often.
In general, Always Incremental with copies for off-site needs some
work, unless you are OK with having a full snapshot every so often
via an Archive job. I really hope Bareos improves on this in the future.
Lastly, if you do use the archive job: it's its own standalone job.
As you're using WORM and cannot reuse your tapes, you probably want
to crank up its retention so you can easily restore files from any
batch of tapes. For all of these I would also run an extra backup
of your catalog and append it to the off-site copies, so you can
pull catalog data in the event everything is lost. If you are
encrypting (you are, right?), be sure to keep a second copy of all
your keys; I keep mine in 1Password. I also upload a dump of the
catalog to a cloud storage provider (secure it as you will). Doing
a bscan of 35 TBytes will take a while, so please please please
keep extra copies of your catalog and keys.
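The catalog job could look like the stock BackupCatalog job pointed
at the off-site pool (a sketch based on the default Bareos config;
adapt names and paths):

Job {
  Name = "BackupCatalog-Offsite"
  Type = Backup
  Level = Full
  Client = backupserver-fd         # example
  FileSet = "Catalog"              # the standard catalog-dump FileSet
  Pool = Offsite
  Schedule = WeeklyCycleAfterBackup
  # dump the database right before the backup runs, clean up after
  RunBeforeJob = "/usr/lib/bareos/scripts/make_catalog_backup.pl MyCatalog"
  RunAfterJob = "/usr/lib/bareos/scripts/delete_catalog_backup"
  Priority = 11                    # after all other jobs
  Messages = Standard
}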
Also do a lot of testing with the archive jobs: how to turn them
back into backup jobs to make a new VirtualFull for the real job,
etc. If your Fulls ever screw up you will want to do this (it's
what I do for road warriors, where Fulls are not really possible)
to avoid the long time needed for a new full from the source.
Putting an Archive job back into a backup job and running a manual
VirtualFull will suck the files defined in the archive job in (it
won't while it's still an archive, hence the change of the job type
to B), rebuild a full, then do an incremental.
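Roughly, in bconsole (jobid and job name are examples):

# flip the archived job back to a normal backup job ...
* update jobid=1234 jobtype=B
# ... so a manual VirtualFull will pick its files up again
* run job=ai-backup-img level=VirtualFull yes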
I personally do a version of case 2, but I use Migrate jobs to
move the AI-Consolidated data to a tape pool; my fulls are much
smaller than yours, though. I also only do an off-site archive job
once a month. Otherwise: Incrementals and AI-Consolidated on disk,
migrate to the tape pool, purge the AI-Consolidated jobs quickly.
So when I consolidate it's Tape + Disk -> Disk -> Migrate Tape ->
Archive Tape.
# migrate job moving consolidated data to long-term tape
Job {
  Name = "Migrate-To-Offsite-AI-Consolidated"
  Client = myth-fd
  Type = Migrate
  Purge Migration Job = yes
  Pool = AI-Consolidated
  Level = Full
  Next Pool = Offsite
  Schedule = WeeklyCycleAfterBackup
  Allow Duplicate Jobs = no
  Priority = 4                   # before the catalog dump
  Messages = Standard
  Selection Type = PoolTime      # 7 days
  Spool Data = No
  Selection Pattern = "."
  RunAfterJob = "/usr/local/bin/prune.sh"
}
The script forces the volumes to truncate/prune after I move the
jobs off.
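Something along these lines (a sketch; I believe recent Bareos has
the truncate command, check yours):

#!/bin/sh
# prune.sh - hypothetical sketch of the cleanup step.
# Prune whatever has expired, then truncate the purged disk volumes
# so the space is actually given back to the filesystem.
bconsole <<END
prune expired volume yes
truncate volstatus=Purged pool=AI-Consolidated yes
END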
Brock Palen
1 (989) 277-6075
[email protected]
www.mlds-networks.com
Websites, Linux, Hosting, Joomla, Consulting
> On Aug 5, 2020, at 5:40 AM, [email protected] wrote:
>
>
> Hello everybody,
>
> I'm in the process of developing a regular backup strategy and
> found that I need some assistance. Here are the parameters in short:
> - 35TB of medical imaging data
> - daily increment of 50-60GB
> - one site, 10Gb/s Backbone
> - Overland NEOs LTO7 StorageLoader, 8-bay, attached to
> - dedicated backup server with 20 TB RAID6, will be expanded as
> needed
>
> I have already backed up all data to LTO7 tape (WORM, for legal
> reasons) as of Dec '19.
> A 2nd generation, also LTO7 WORM, is currently in progress
> (VERY slow, ~12 MB/s, different story). Tapes are/will be stored
> offsite.
> After that I'm planning to do a 3rd generation on disk inhouse
> and append to the 1st and 2nd gen on tape, so that I end up with
> three generations identical up to a certain cutoff date.
>
> Then, what to do next? What could a daily routine look like?
> The radiologists are very concerned about their data and would
> like to see a daily tape change, with the ejected tape being taken
> offsite in the evening. A 1-bay LTO8 drive could be purchased for
> that; the daily tape change would be done by an apprentice or so...
>
> So I thought about an Always-Incremental-B2D2T strategy starting
> with the above cutoff day. But I still have too little experience,
> so I'm struggling to develop a clear structure in my head - WHAT
> is copied WHEN and WHERE - let alone transform that into a working
> bareos configuration.
>
> Do my thoughts appear reasonable up to that point?
> BTW: can a daily tape change be realized at all, where you just
> push the button to eject the TUE tape, insert the WED tape, and so
> on, without having to stop the storage daemon in order to unlock
> the tape?
>
> Thanks for helping me with the first steps to bring this under way.
>
> Andreas