Radosław, thank you for your reply! Useful information.

Regarding virtual full backups: because I am backing up via my single SAS
LTO8 drive, and because a virtual full (like a copy job) requires both an
input device and an output device, the information you shared leads me to
expect that I cannot do virtual full backups with my Bacula Community
edition unless I either upgrade to Bacula Enterprise Edition or purchase
another LTO8 drive. My tape changer does have room for such a drive, but the
expense would have to be justified to my client.
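
For my own notes: if I ever do add that second drive, my understanding is
that the pieces needed for a virtual full would look roughly like the
following. The pool, storage, and job names here are just placeholders, not
my actual configuration.

  # Two pools on two different drives; the virtual full is written to Next Pool
  Pool {
    Name = Media-Source
    Pool Type = Backup
    Storage = LTO8-Drive0          # existing drive holding the full + incrementals
    Next Pool = Media-Consolidated
  }

  Pool {
    Name = Media-Consolidated
    Pool Type = Backup
    Storage = LTO8-Drive1          # the second drive I would have to buy
  }

The consolidation itself would then be kicked off from bconsole with
something like "run job=MediaShare level=VirtualFull".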

I agree with you that manually purging volumes is dangerous and really not
advisable at all. The problem I'm facing is that I don't see a better way to
meet my client's needs.

Basically, I have two classes of data. The first (and easiest to handle)
consists of project files and minor media files that the video editors use
in their day-to-day work. These files are created, changed, and deleted all
the time, and the total dataset is around 2TB, so it is very manageable. I
have a nightly backup of these files running on an automated schedule. I
will discuss it with the client and probably set a retention period of
around one year.
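
For this dataset, what I have in mind is roughly the following. Names are
placeholders, and the exact retention still needs to be agreed with the
client.

  # Nightly incrementals with a monthly full, kept for about a year
  Schedule {
    Name = Nightly
    Run = Level=Full 1st sun at 23:05
    Run = Level=Incremental 2nd-5th sun at 23:05
    Run = Level=Incremental mon-sat at 23:05
  }

  Pool {
    Name = ProjectFiles
    Pool Type = Backup
    Volume Retention = 1 year      # pending discussion with the client
    AutoPrune = yes
    Recycle = yes
  }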

The second dataset is the root of my issues here. It consists of the video
files captured by film crews on location. This footage is shot in 8K, and
the files are very large. The dataset is around 130TB and is only expected
to grow. The raw video files should be expected to NEVER change; edits are
never applied to the original raws. Effectively, if bit rot or any edit has
changed these files, that is to be regarded as corruption. There are
smaller, downscaled versions of these files (called "proxies") that do
change periodically. The proxies aren't as valuable or important as the
original raw files, as they can always be regenerated from the raws.

My goal for the media share is to back it up manually only, as media is
added. Media will only be added when film crews have gone on site and filmed
new footage, or when an editor has generated or regenerated proxies for that
footage. Because system performance is so important to the video editors,
they don't want any backup activity to impact their access speeds while
working on the raws. Additionally, none of us want a partially written file
to be saved to LTO. As such, backups of this media share will be run
manually, as and when media is added or changes are made. Because the
proxies are stored right next to the raws, in the same share and folder
tree, I don't yet have a good way of detecting comparatively small changes
and backing those up automatically. This leaves me with manual backups for
this large dataset.
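
As a sketch of the manual workflow I have in mind (the job name and paths
are placeholders, and it assumes the proxies can be recognized by a folder
name, which I still need to confirm with the editors):

  # In bconsole: preview what an incremental run would pick up, then run it
  *estimate job=MediaShare level=Incremental listing
  *run job=MediaShare level=Incremental yes

  # If the proxies really do live in predictably named folders, a raws-only
  # FileSet could exclude them so proxy regeneration never touches the raws:
  FileSet {
    Name = MediaRaws
    Include {
      Options {
        Exclude = yes
        WildDir = "*/Proxies"      # hypothetical proxy folder pattern
      }
      Options {
        Signature = SHA1
      }
      File = /mnt/media            # hypothetical mount point
    }
  }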

My largest concern is that I don't want to lose the original backup of these
essential raw files. I presume that the original backup was correct, and any
subsequent re-backup of the same file could include damage from bit rot or
an unwanted change. My incremental backups will capture any wanted changes,
but I don't want the original backup to be eliminated as I reuse tape, so I
plan to reuse tape very infrequently. I have been thinking about this
problem, and I think I might be able to confirm that data on the media share
is valid and not corrupt by combining two checks: querying the Bacula
catalog for any file in the media share that has been backed up twice
(modification date/size changed, so it was a legitimate filesystem
operation), and then running a disk-to-catalog verify job to hash the files
presently on disk and determine whether any of them have changed due to bit
rot. The only problem is that some raw files may have been deleted by the
video editors for valid reasons, and it would be very difficult to track
those deletions well enough to tell whether a deleted file was really
unwanted or is actually missing. Reusing my tapes would then eliminate the
backups of the missing file.
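
The verify piece would probably look something like the sketch below. This
is only my reading of the Verify job documentation, with placeholder names;
the backup FileSet would also need "verify" options that include the
signature (e.g. verify = pins1) so the on-disk hashes actually get compared.

  # Hash files currently on disk and compare them against the catalog
  # entries recorded by the last backup of the media share
  Job {
    Name = VerifyMediaShare
    Type = Verify
    Level = DiskToCatalog
    Client = media-fd              # placeholder client name
    FileSet = MediaRaws            # same FileSet the backup job uses
    Verify Job = MediaShare        # the backup job whose catalog to check against
    Messages = Standard
  }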

My client's stated goal for data retention of this media is 7 years.

I think I could set the volume retention period to around 7 years. I don't
know whether going that long between full backups is workable. I imagine I'm
going to have to adjust my expectations as I encounter more situations and
learn. Meeting my customer's goals with Bacula might involve buying a lot of
tape, or combining analysis of the data on disk with analysis of the catalog
to verify whether discarding the old full backup would be safe.
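
Concretely, I expect the pool for the media share to end up looking roughly
like this (placeholder name, and the retention value is the part I expect to
revisit as I learn):

  Pool {
    Name = MediaPool
    Pool Type = Backup
    Volume Retention = 7 years     # keep catalog records at least this long
    AutoPrune = no                 # don't prune automatically; I'd rather do it deliberately
    Recycle = no                   # never hand a volume back for rewriting on its own
  }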

For reference, a full backup of this dataset takes 2-3 weeks at an average
tape write rate of about 170MB/s. For further reference, acquiring the raw
files is expensive, requiring that professionals be paid to travel to
location and capture footage. That alone places a value on the data, not
counting the cost if we disappoint my client's customer by losing the
footage from the shoots they arranged.

Regards,
Robert Gerber
402-237-8692
r...@craeon.net


On Sat, Jan 27, 2024 at 7:45 AM Radosław Korzeniewski <
rados...@korzeniewski.net> wrote:

> Hello,
>
> Tue, 7 Nov 2023 at 21:57 Rob Gerber <r...@craeon.net> wrote:
>
>> To update this thread, I ultimately was able to avoid a bscan of all my
>> backed up media by restoring a backup of my catalog database. I have set
>> file, job, and volume retention to 1000 years.
>>
>
> Volume retention of 1000Y means you will never recycle your backups, so
> they will occupy all your available storage in the long term and you will be
> forced to manually delete the oldest unused backups.
> IMVHO, manually recycling backups at will is the least clever idea anyone
> can come up with. It is always better to set a real and usable volume
> retention and let Bacula (enterprise-class backup software) do this job for you.
>
>
>> Does a virtual full job strategy eliminate information about changed
>> files? I.e., if the full backup captured fileA in one state, and if a later
>> incremental backup captured fileA in a different state, would the virtual
>> full consolidation process eliminate reference to the first backup of fileA?
>>
>
> Bacula Community does not maintain backups as data references; it always
> does a data copy on virtual full. You have to use the Bacula Enterprise GED
> feature to get data references on virtual full.
> So, for your question, the virtual full consolidation process will use
> fileA's state from the incremental backup and will copy it to the new
> virtual full backup job.
>
>
>> Let's assume that once a tape is full, neither it nor its associated files
>> or jobs will be recycled, at least not for a 7-year period or so.
>>
>
> It doesn't matter. The virtual full will be a backup copy fully independent
> of all its ingredients. It will occupy new backup space.
>
>
>> Incrementals forever could scale very badly in a larger enterprise,
>>
>
> Did you check the BackupsToKeep functionality in Bacula? I have a different
> opinion on this matter.
>
>
>> but my objective is to protect a single set of files on a single system.
>> My largest concern is tapes going missing in an incremental chain, and for
>> that reason I'm probably going to need to do differential backups
>> periodically.
>>
>
> If you want to maintain a chain of a few hundred thousand incrementals
> because you made a single full 1000Y ago and did incrementals only, then
> that policy won't work by design.
> Incrementals forever means your backup client won't need to execute a full
> backup (apart from the first one) any more. It doesn't mean your backup
> chain only grows.
> It means the backup software automatically consolidates the
> full+incremental chain for the oldest backups, or creates a full level from
> data already saved.
> Bacula uses the virtual full backup level for this. In this case you can
> set up time-based consolidation, i.e. once a week, or number-based
> consolidation, i.e. a number of remaining incrementals.
>
> or I totally misunderstood your concerns here.
>
> R.
> --
> Radosław Korzeniewski
> rados...@korzeniewski.net
>
_______________________________________________
Bacula-users mailing list
Bacula-users@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/bacula-users
