On 02.11.22 at 16:00, Burkhard Linke wrote:
Hi,

we are running bareos with disk based storage (maybe adding tapes later for long term archives). It currently hosts ~220 million files with roughly 900 TB of data in about 150 jobs. Job sizes range from a few files/bytes to several hundred TB. We are using a standard textbook always incremental scheme, with an extra long retention time for the initial full backup.

Example:

Job {
...

   Always Incremental = true
   Always Incremental Job Retention = 6 months
   Always Incremental Keep Number = 60
   Always Incremental Max Full Age = 5 years
...
}

This works fine, but the virtual full jobs triggered by the consolidation job need to be optimized.

When I reverse-engineer your settings, they mean:
- I want to keep every Incremental that was made in the past 6 months
- I want to keep at least 60 of these Incrementals
- I want to keep a full backup around that isn't older than 5 years

Assuming that you're doing daily backups, you'll end up with:
- 6 * 30 = 180 Incrementals for the last 180 days
- 1 Full that is on average 2.75 years old (its age cycles between 0.5 and 5 years, so (0.5 + 5) / 2 = 2.75 years on average)

As long as your Full isn't older than 5 years, consolidation will take the oldest Incremental, merge it with the second oldest Incremental into a new Incremental (which is now considered the oldest one).

When your Full is 5 years old, consolidation will then take the Full, the oldest Incremental and the second oldest Incremental and merge them into a new Full.

So in your setup, the oldest Incremental will grow for 4.5 years until it gets merged into your Full. During that period it will get bigger and bigger, which will make the consolidation take longer and longer.

Long story short: you can probably save a lot of time moving data around if you decrease AI Max Full Age to maybe 9 months or so, effectively producing a new Full every 3 months and keeping the daily consolidation a lot smaller.
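
Just as an illustration, a minimal sketch of what that could look like, reusing your retention and keep-number settings from above (the 9 months are only an example value, not a recommendation for your specific archive requirements):

Job {
...

   Always Incremental = true
   Always Incremental Job Retention = 6 months
   Always Incremental Keep Number = 60
   Always Incremental Max Full Age = 9 months
...
}

With daily backups you still keep roughly 180 Incrementals, but the oldest Incremental only accumulates about 3 months of consolidations before it is folded into a new Full, instead of 4.5 years, so the data the daily consolidation has to move stays much smaller.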

In the current implementation (correct me if I'm wrong), the virtual full job reads all data from the jobs to be consolidated and stores it in the full pool. With the configuration above, this is fine for the first run of a virtual full job: it will process two incremental runs and store their content (minus overwritten/deleted files) in the 'AI-Consolidated' pool. On the next run, it will read the data from the previous virtual full run (in pool 'AI-Consolidated') plus the data from the next incremental run (in pool 'AI-Incremental') and store it in the 'AI-Consolidated' pool. So data that is already stored in the correct pool is read and written again. This is fine for tape based backups, but in the case of disk based backups data is copied unnecessarily. For large jobs (think 100-200 TB) it might even become unfeasible, since a virtual full run will take days or weeks.
I agree that AI (and Virtual Full in general) is pretty I/O-heavy for the SD. However, that's simply how it works right now. If you need to consolidate large jobs (as you said, 100-200 TB), you'll need unreasonably fast storage (i.e. a lot more than 1 GB/s) to finish within a day. The only workaround is to cut these jobs into pieces and configure Max Full Consolidations to spread the consolidation into a new full backup across several days.
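
For reference, a rough sketch of how that looks on the consolidate job (the job name and the value of 1 are just placeholders for illustration):

Job {
   Name = "Consolidate"
   Type = Consolidate
   Max Full Consolidations = 1
...
}

With Max Full Consolidations = 1, at most one consolidation into a new Full is started per run of the consolidate job, so the expensive full consolidations are spread across consecutive days instead of piling up in a single run.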

I don't know the details of the volume header format, but it should be possible to implement the following method:

for each file:
1. if the source and target pool are different, use the standard copy method
2. if the source and target pool are not disk based, use the standard copy method
3. update the header in the existing volume(s) to reflect the changes (e.g. a different job id)
4. update the database to reflect the changes
5. in case of pruned files, truncate the corresponding chunk in the volume(s)

It might be tricky to ensure atomicity of steps 3 and 4 to avoid inconsistencies. Most filesystems should be able to handle sparse files correctly, so an extra "defragmentation" step seems to be unnecessary.

Any comments on this? Are there any obvious showstoppers?
To make that work we would need to:
1. Add a new job to the catalog that references the ranges of the jobs to be consolidated
2. Change all the block headers in existing volumes so that they belong to the consolidated job
3. Change all the record headers so they have file ids that are strictly increasing (from the new job's point of view)
4. Mark records that are no longer needed in some way
5. Rewrite the first SOS record, remove all other SOS records and all but the last EOS record, and overwrite the last EOS record
6. Remove all blocks that consist only of records that are no longer needed (and make sure the SD and all tools can read volumes with nulled blocks in them)
7. Remove the original jobs from the catalog
8. Provide a 100% failsafe way to resume operations 2 to 6, otherwise a crash during that operation would leave all data in the job unreadable.

Sounds like quite an agenda. With the current on-disk format, I wouldn't dare trying it. There's just too much that can go wrong in the process.

We're planning to introduce another file-based storage backend with a different on-disk format next year. That would theoretically allow us to do virtual full backups with zero-copy for the payload (i.e. it would still read and write the block and record headers, but wouldn't copy the payload anymore). However, that backend is still vaporware today, and zero-copy has not even made it onto the agenda yet.

Best Regards,
Andreas

--
Andreas Rogge                             andreas.ro...@bareos.com
  Bareos GmbH & Co. KG                      Phone: +49 221-630693-86
  http://www.bareos.com

  Sitz der Gesellschaft: Köln | Amtsgericht Köln: HRA 29646
  Komplementär: Bareos Verwaltungs-GmbH
  Geschäftsführer: S. Dühr, M. Außendorf, J. Steffens, Philipp Storz
