On 02.11.22 at 16:00, Burkhard Linke wrote:
Hi,

we are running bareos with disk based storage (maybe adding tapes later for long term archives). It currently hosts ~220 million files with roughly 900 TB of data in about 150 jobs. Job sizes range from a few files/bytes to several hundred TB. We are using a standard textbook always incremental scheme, with an extra long retention time for the initial full backup.

Example:

Job {
...

   Always Incremental = true
   Always Incremental Job Retention = 6 months
   Always Incremental Keep Number = 60
   Always Incremental Max Full Age = 5 years
...
}

This works fine, but the virtual full jobs triggered by the consolidation job need to be optimized.

When I reverse-engineer your settings, they mean:
- I want to keep every Incremental that was made in the past 6 months
- I want to keep at least 60 of these Incrementals
- I want to keep a full backup around that isn't older than 5 years

Assuming that you're doing daily backups, you'll end up with:
- 6 * 30 = 180 Incrementals for the last 180 days
- 1 Full that is on average 2.75 years old (its age cycles between 0.5 and 5 years, so (0.5 + 5) / 2 = 2.75 years on average)

As long as your Full isn't older than 5 years, consolidation will take the oldest Incremental, merge it with the second oldest Incremental into a new Incremental (which is now considered the oldest one).

When your Full is 5 years old, consolidation will then take the Full, the oldest Incremental and the second oldest Incremental and merge them into a new Full.

So in your setup, the oldest Incremental will grow for 4.5 years until it gets merged into your Full. During that period it will get bigger and bigger, which will make the consolidation take longer and longer.

Long story short: you can probably save a lot of time moving data around if you decrease AI Max Full Age to maybe 9 months or so, effectively producing a new Full every 3 months and keeping the daily consolidation a lot smaller.
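
Just as an illustration, a minimal sketch of what that could look like, reusing your retention and keep-number settings from above (the 9 months are only an example value, not a recommendation for your specific archive requirements):

Job {
...

   Always Incremental = true
   Always Incremental Job Retention = 6 months
   Always Incremental Keep Number = 60
   Always Incremental Max Full Age = 9 months
...
}

With daily backups you still keep roughly 180 Incrementals, but the oldest Incremental only accumulates about 3 months of consolidations before it is folded into a new Full, instead of 4.5 years, so the data the daily consolidation has to move stays much smaller.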

In the current implementation (correct me if I'm wrong), the virtual full job reads all data from the jobs to be consolidated and stores it in the full pool. With the configuration above, this is fine for the first run of a virtual full job: it will process two incremental runs and store their content (minus overwritten/deleted files) in the 'AI-Consolidated' pool. On the next run, it will read the data from the previous virtual full run (in pool 'AI-Consolidated') plus the data from the next incremental run (in pool 'AI-Incremental') and store it in the 'AI-Consolidated' pool. So data that is already stored in the correct pool is read and written again. This is fine for tape based backups, but in the case of disk based backups data is copied unnecessarily. For large jobs (think 100-200 TB) it might even become unfeasible, since a virtual full run will take days or weeks.
I agree that AI (and Virtual Full in general) is pretty I/O-heavy for the SD. However, that's simply how it works right now. If you need to consolidate large jobs (as you said, 100-200 TB), you'll need unreasonably fast storage (i.e. a lot more than 1 GB/s) to finish within a day. The only workaround is to cut these jobs into pieces and configure Max Full Consolidations to spread the consolidation into a new full backup across several days.
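
For reference, a rough sketch of how that looks on the consolidate job (the job name and the value of 1 are just placeholders for illustration):

Job {
   Name = "Consolidate"
   Type = Consolidate
   Max Full Consolidations = 1
...
}

With Max Full Consolidations = 1, at most one consolidation into a new Full is started per run of the consolidate job, so the expensive full consolidations are spread across consecutive days instead of piling up in a single run.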

I don't know the details of the volume header format, but it should be possible to implement the following method:

for each file:
1. if the source and target pool are different, use the standard copy method
2. if the source and target pool are not disk based, use the standard copy method
3. update the header in the existing volume(s) to reflect the changes (e.g. a different job id)
4. update the database to reflect the changes
5. in case of pruned files, truncate the corresponding chunk in the volume(s)

It might be tricky to ensure atomicity of steps 3 and 4 to avoid inconsistencies. Most filesystems should be able to handle sparse files correctly, so an extra "defragmentation" step seems to be unnecessary.

Any comments on this? Are there any obvious showstoppers?
To make that work we would need to:
1. Add a new job to the catalog that references the ranges of the jobs to be consolidated
2. Change all the block headers in existing volumes so that they belong to the consolidated job
3. Change all the record headers so they have file ids that are strictly increasing (from the new job's point of view)
4. Mark records that are no longer needed in some way
5. Rewrite the first SOS record, remove all other SOS records and all but the last EOS record, and overwrite the last EOS record
6. Remove all blocks that consist only of records that are no longer needed (and make sure the SD and all tools can read volumes with nulled blocks in them)
7. Remove the original jobs from the catalog
8. Provide a 100% failsafe way to resume operations 2 to 6, otherwise a crash during that operation would leave all data in the job unreadable.

Sounds like quite an agenda. With the current on-disk format, I wouldn't dare trying it. There's just too much that can go wrong in the process.

We're planning to introduce another file-based storage backend with a different on-disk format next year. That would theoretically allow us to do virtual full backups with zero-copy for the payload (i.e. it would still read and write the block and record headers, but wouldn't copy the payload anymore). However, that backend is still vaporware today, and zero-copy has not even made it onto the agenda yet.

Best Regards,
Andreas

--
Andreas Rogge                             andreas.ro...@bareos.com
  Bareos GmbH & Co. KG                      Phone: +49 221-630693-86
  http://www.bareos.com

  Sitz der Gesellschaft: Köln | Amtsgericht Köln: HRA 29646
  Komplementär: Bareos Verwaltungs-GmbH
  Geschäftsführer: S. Dühr, M. Außendorf, J. Steffens, Philipp Storz
