[DNG] Long-term archiving versus medium fallibility

Hendrik Boom Mon, 01 Nov 2021 05:12:09 -0700

These days an increasing amount of my personal information, books,
mementos, family photos, and work data is being kept on digital media.
And those are vulnerable.


It's well-known that to archive files long-term (say, ten years or more) 
it is necessary to keep multiple copies, preferably on different media.

So this is what I'd like to do with my critical files.

Yet these files are also working files, are kept online, and
legitimately need to be modified from time to time.

So I keep backups.  Currently I use rdiff-backup, which does have the
ability to keep older as well as newer versions of files on the same
backup drive.  And I keep multiple backups.

(This might even help somewhat against ransomware attacks)

--

Now storage media deteriorate over time.
It is necessary to read and transfer data from old media to new from 
time to time.  Yes, I know that.  My present method is to keep 
everything on my server, and make regular backups.

(OK, my backups aren't all *that* regular, but I try)

Now the master copy is the working file system of my server.
And even in the absence of ransomware, there are occasional disk 
failures.

Yes, I use a RAID so any detected disk failures don't cause immediate
data loss.  (It also has the side effect of letting me continue running
apparently unaffected from the time a disk has failed until I manage to
replace it.)

And I also use the ext4 file system against unexpected shutdowns.  Yes, 
I journal everything, not just metadata.  So after a crash, or 
unexpected power outage, the file system is easy to restore to a 
consistent state.

Now for further protection against data failure, I'd like to introduce 
checksumming.  This is available with btrfs and zfs (or is it xfs?  I 
forget which is which).

But ... all of this relies on valid RAM.

Copying files to or from backup, updating files, all of it is done by 
copying into RAM and then copying it from RAM.  In the presence of 
faulty RAM, even a backup copy could be seriously damaged.

And this is worse with the newer b-tree file systems, which are
constantly copying data, even data which hasn't changed.  A single
update will read a large block of data from disk, make the changes, and
write it back.  The entire block is thus written back, complete with
changes, bit-failures from RAM problems, and a new check-sum to validate
the bad bits.

I'm told the maintainers of these file-systems laugh at you if you're not
using ECC memory.

---

Now I'm wondering how to introduce check-summing to protect against this
kind of data loss despite occasionally (but rarely) failing memory.

* I could run memory checks frequently to catch failing memory.  But the 
circumstances in ordinary operation differ from the circumstances of the 
memory check program, and faulty memory might fail to be detected.

* I could install ECC memory.  But that has become difficult to get, and
some processors on the mass market won't even handle it properly.

* I could hope the ext4 developers will add checksums to the ext4 file 
system, possibly renaming it to ext5.

* Or I could try doing my own checksumming.  Perhaps checksumming
everything in the file system and catching files whose checksums have
changed without a new modification date.  This could be done at backup
time, flagging such discrepancies for manual attention.  (Note: need to
check the checksum on the backup, too).
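
To make that last idea concrete, here is a rough sketch in Python of what such a do-it-yourself check might look like.  It is only an illustration under assumptions of mine: the function names, the use of SHA-256, and the idea of keeping a JSON manifest between runs are all made up for this example, not anything rdiff-backup provides.  Note also that the hashing itself still goes through the same RAM, so a flagged file means "look at this by hand", not "this is certainly corrupt".

```python
import hashlib
import json
import os


def sha256_of(path, chunk_size=1 << 20):
    """Return the SHA-256 hex digest of a file, read in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def build_manifest(root):
    """Map each regular file (relative path) to its mtime and checksum."""
    manifest = {}
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            full = os.path.join(dirpath, name)
            # Skip symlinks, sockets, and anything else that is not a plain file.
            if os.path.islink(full) or not os.path.isfile(full):
                continue
            manifest[os.path.relpath(full, root)] = {
                "mtime_ns": os.stat(full).st_mtime_ns,
                "sha256": sha256_of(full),
            }
    return manifest


def silent_changes(old, new):
    """Files whose checksum changed although the modification time did not."""
    return sorted(
        rel for rel, entry in new.items()
        if rel in old
        and old[rel]["mtime_ns"] == entry["mtime_ns"]
        and old[rel]["sha256"] != entry["sha256"]
    )


def mismatched_copies(source, backup):
    """Files present in both manifests whose checksums disagree."""
    return sorted(
        rel for rel, entry in source.items()
        if rel in backup and backup[rel]["sha256"] != entry["sha256"]
    )


def save_manifest(manifest, path):
    """Store a manifest as JSON so the next run can compare against it."""
    with open(path, "w", encoding="utf-8") as handle:
        json.dump(manifest, handle, indent=1, sort_keys=True)


def load_manifest(path):
    """Load a manifest previously written by save_manifest."""
    with open(path, encoding="utf-8") as handle:
        return json.load(handle)
```

At backup time one would load the previous manifest, build a fresh one, and hand anything returned by silent_changes() to a human before the backup overwrites a good copy.  Building a second manifest over the backup's mirror tree and passing both to mismatched_copies() covers the "check the backup, too" part; the backup tool's own bookkeeping directory would show up only on the backup side and is therefore ignored by the comparison.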

---

Anyone have other ideas?

-- hendrik 




