On 02.11.22 at 16:00, Burkhard Linke wrote:
Hi,
we are running bareos with disk based storage (maybe adding tapes later
for long term archives). It currently hosts ~220 million files with
roughly 900TB of data in about 150 jobs. Job sizes range from a few
files/bytes to several hundred TB. We are using a standard textbook
always incremental scheme, with an extra long retention time for the
initial full backup.
Example:
Job {
  ...
  Always Incremental = true
  Always Incremental Job Retention = 6 months
  Always Incremental Keep Number = 60
  Always Incremental Max Full Age = 5 years
  ...
}
This works fine, but the virtual full jobs triggered by the
consolidation job need to be optimized.
When I reverse engineer your settings, it means:
- I want to keep every Incremental that was made in the past 6 months
- I want to keep at least 60 of these Incrementals
- I want to keep a full backup around that isn't older than 5 years
Assuming that you're doing daily backups, you'll end up with:
- 6 * 30 = 180 Incrementals for the last 180 days
- 1 Full that is on average 2.75 years old
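Those two numbers can be sanity-checked with a little arithmetic (assuming one incremental per day and 30-day months):

```python
# Sanity check of the numbers above, assuming one incremental per day.
retention_days = 6 * 30            # Always Incremental Job Retention
incrementals = retention_days      # one job per day within retention -> 180
# Right after a refresh, the new Full carries the timestamp of the newest
# job consolidated into it, so it is already ~6 months (0.5 years) old;
# it is refreshed again when it reaches Max Full Age (5 years).
avg_full_age_years = (0.5 + 5) / 2
print(incrementals, avg_full_age_years)   # 180 2.75
```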
As long as your Full isn't older than 5 years, consolidation will take
the oldest Incremental, merge it with the second oldest Incremental into
a new Incremental (which is now considered the oldest one).
When your Full is 5 years old, consolidation will then take the Full,
the oldest Incremental and the second oldest Incremental and merge them
into a new Full.
So in your setup, the oldest Incremental will grow for 4.5 years until
it gets merged into your Full. During that period it will get bigger and
bigger, which will make the consolidation take longer and longer.
Long story short: you can probably save a lot of time moving data around
if you decrease AI Max Full Age to maybe 9 months or so, effectively
producing a new Full every 3 months and keeping the daily consolidation
a lot smaller.
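The 3-month figure follows from the fact that a freshly consolidated Full already starts out roughly Job-Retention old, because it gets the timestamp of the newest job that was merged into it. A quick sketch of that arithmetic:

```python
# Why Max Full Age = 9 months yields a new Full roughly every 3 months:
# a fresh Full starts out ~Job-Retention old (it carries the timestamp of
# the newest job consolidated into it), then ages until Max Full Age.
job_retention = 6                          # months
max_full_age = 9                           # months (suggested value)
print(max_full_age - job_retention)        # 3 -> a new Full every ~3 months

# Same arithmetic for the original settings (5 years = 60 months):
print(5 * 12 - job_retention)              # 54 months = 4.5 years of growth
```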
In the current implementation (correct me if I'm wrong), the virtual
full job reads all data from the jobs to be consolidated and stores
it in the full pool. With the configuration above, this is fine for
the first run of a virtual full job. It will process two incremental
runs and store their content (minus overwritten files / deleted files)
in the 'AI-Consolidated' pool. On the next run, it will read the data
from the previous virtual full run (in pool 'AI-Consolidated') plus the
data from the next incremental run (in pool 'AI-Incremental') and store
it in the 'AI-Consolidated' pool. So data already stored in the correct
pool is read and written again. This is fine for tape based backups, but
in case of disk based backups data is copied unnecessarily. For large
jobs (think 100-200 TB) it might even become unfeasible since the
virtual full run will take days or weeks.
I agree that AI (and Virtual Full in general) is pretty I/O-heavy for
the SD. However, that's simply how it works right now.
If you need to consolidate large jobs (as you said 100-200 TB), you'll
need unreasonably fast storage (i.e. a lot more than 1 GB/s) to finish
within a day.
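A rough estimate shows why: with the current implementation every byte of the job is read once and written once, so consolidating a 200 TB job moves about 400 TB through the SD.

```python
# Rough I/O time estimate for consolidating a 200 TB job with the
# current read-everything-write-everything implementation.
job_tb = 200
io_bytes = 2 * job_tb * 10**12      # each byte is read once and written once
throughput = 1 * 10**9              # 1 GB/s sustained storage throughput
days = io_bytes / throughput / 86400
print(round(days, 1))               # ~4.6 days at 1 GB/s
```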
The only workaround is to cut these jobs into pieces and configure Max
Full Consolidations to spread the consolidation into a new full backup
across several days.
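In the consolidate job definition, that workaround might look like this (a sketch; the job name is a placeholder, Max Full Consolidations is the relevant directive):

```
Job {
  Name = "Consolidate"
  Type = Consolidate
  # Merge at most one expired Full per consolidation run, spreading the
  # expensive Incremental-into-Full merges across several days
  Max Full Consolidations = 1
  ...
}
```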
I don't know the details of the volume header format, but it should be
possible to implement the following method:
for each file:
1. if source and target pool are different, use standard copy method
2. if source and target pool are not disk based, use standard copy method
3. update header in existing volume(s) to reflect changes (e.g.
different job id)
4. update database to reflect changes
5. in case of pruned files, truncate corresponding chunk in the volume(s)
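The per-file decision above could be sketched like this. To be clear, this is purely hypothetical: none of these helper names or types exist in Bareos, they only illustrate the dispatch between the standard copy path and the proposed in-place path.

```python
from dataclasses import dataclass

# Hypothetical sketch only: Pool, consolidate_file and the returned action
# names do not exist in Bareos; they illustrate steps 1-5 above.

@dataclass
class Pool:
    name: str
    disk_based: bool

def consolidate_file(file: dict, src: Pool, dst: Pool) -> str:
    # Steps 1 + 2: different pools, or a non-disk pool -> ordinary copy.
    if src.name != dst.name or not (src.disk_based and dst.disk_based):
        return "standard-copy"
    # Step 5: a pruned file frees its chunk (sparse hole in the volume).
    if file.get("pruned"):
        return "truncate-chunk"
    # Steps 3 + 4 (must happen atomically together): rewrite the volume
    # header (new job id) and update the catalog accordingly.
    return "in-place-relabel"

ai_cons = Pool("AI-Consolidated", disk_based=True)
ai_incr = Pool("AI-Incremental", disk_based=True)
print(consolidate_file({"pruned": False}, ai_incr, ai_cons))  # standard-copy
print(consolidate_file({"pruned": False}, ai_cons, ai_cons))  # in-place-relabel
print(consolidate_file({"pruned": True}, ai_cons, ai_cons))   # truncate-chunk
```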
It might be tricky to ensure atomicity of steps 3 + 4 to avoid
inconsistencies. Most filesystems should be able to handle sparse files
correctly, so an extra "defragmentation" step seems to be unnecessary.
Any comments on this? Are there any obvious showstoppers?
To make that work we would need to
1. Add a new job to the catalog that references the ranges of the jobs
to be consolidated
2. Change all the block headers in existing volumes so that they belong
to the consolidated job
3. Change all the record headers so they have file ids that are strictly
increasing (from the new job's point of view)
4. Mark records that are no longer needed in some way
5. Rewrite the first SOS record, remove all other SOS records and all
but the last EOS record, and overwrite the last EOS record.
6. Remove all blocks that consist only of records that are no longer
needed (and make sure the SD and all tools can read volumes with nulled
blocks in them)
7. Remove the original jobs from the catalog
8. Provide a 100% failsafe way to resume operations 2 to 6; otherwise a
crash during that operation would leave all data in the job unreadable.
Sounds like quite an agenda. With the current on-disk format, I wouldn't
dare trying it. There's just too much that can go wrong in the process.
We're planning to introduce another file-based storage backend with a
different on-disk format next year. That would theoretically allow us
to do virtual full backups with zero-copy for the payload (i.e. it would still
read and write the block and record headers, but wouldn't copy the
payload anymore).
However, that backend is still vaporware today and zero-copy has not
even made it to the agenda yet.
Best Regards,
Andreas
--
Andreas Rogge andreas.ro...@bareos.com
Bareos GmbH & Co. KG Phone: +49 221-630693-86
http://www.bareos.com
Registered office: Köln | Amtsgericht Köln: HRA 29646
General partner: Bareos Verwaltungs-GmbH
Managing directors: S. Dühr, M. Außendorf, J. Steffens, Philipp Storz