On Fri, Jul 11, 2025 at 09:55:23AM -0700, Kevin Williams wrote:
> I certainly want to detect a failing drive for my source data and replace it
> before corrupted data is backed up from it.

No matter what techniques you use, this is more difficult than it might seem.

Copying data from one place to another means relying on the firmware on the
source device, the interconnect to the CPU (e.g. SATA cable, SCSI cable,
whatever), the OS device driver for the source device, the filesystem and
buffer cache management code, the stability of the system RAM whilst the data
sits in the buffer cache, and most of the same things again on the way to the
target device.

Copying around a few MB of data, you might not notice problems even on a flaky
system.  Once you get into the TB range or a fair fraction of a petabyte,
unexpected errors are more likely to show up even on seemingly perfect
hardware.

Different techniques save you from different failure scenarios.

A complicated filesystem that does it all for you behind the scenes might seem
great at first, but if and when it does eventually come crashing down, your
options to do any recovery are going to be restricted by the complexity of the
thing in the first place.

Sure, you should have backups.  But there are times when you have 99% of your
data backed up, but still want to recover that one file that was critically
modified just after the last time it hit the tape.

This is the counter-argument to those who insist that media longevity doesn't
matter because finding a drive to read it back in 25 years' time will be
impossible (which in many cases is a dubious claim anyway).

Migrating your data from one place to another repeatedly could easily
introduce bit errors that go unnoticed.  Don't assume that the drive's own
CRC checking will catch it.

> Does anyone have examples of such shell scripts, such as with cksum(1) or 
> md5(1)?

I posted one to -misc a couple of years ago.  Here is a slightly updated
version.

if [ "$1" == "i" ] ; then touch checksums ; fi
for i in `find . -name checksums` ;
do (
if [ "$1" == "a" ] ; then echo -n "Not v" ; else echo -n "V" ; fi
echo "erifying checksums in directory ${i%/checksums}";
cd ${i%/checksums};
if [ "$1" != "a" ] ; then sha512 -cq checksums; fi
let flag=0;
for j in !(checksums|checksums.bak) ;
do
if [ ! -d "$j" ] ; then grep -F "($j)" checksums > /dev/null || { if [ -z "$1" 
] ; then echo "$j is not in the checksums file!" ; let flag=1 ; else echo 
"Adding $j to checksums file" ; sha512 "$j" >> checksums ; fi ; } fi ;
done ;
if [ $flag -eq 1 ] ; then echo "Run $0 with any command line arguments to add 
missing entries to the checksums file."; else echo "All files have entries in 
the checksum file."; fi ;
 );
done
if [ "$1" == "i" -a ! -s checksums ] ; then rm -f checksums ; fi

This is an _example_ to get you started writing your own script.

The whole idea is to make something that suits your needs.  Don't just copy
and paste this one without tweaking it for yourself.
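
For example, assuming you have saved it as ~/bin/checkfiles and keep photos
under /data/photos (both names made up), a typical round trip looks like:

  $ cd /data/photos
  $ ksh ~/bin/checkfiles i     # create a checksums file here and fill it in
  $ ksh ~/bin/checkfiles       # later: verify, and report any new files
  $ ksh ~/bin/checkfiles a     # add entries for new files without re-verifying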

> Do you trust them with giant files of several gigabytes or even 
> terabyte-sized VM
> or database files?

Yes.  SHA-512 hashes have kept many TB of data intact for me, and detected
various random bit errors from time to time.

> If cryptographic hashes would be better, would features of LibreSSL, OpenSSH, 
> or
> another OpenBSD base system tool be suitable for a fileserver/NAS?

SHA-256 is easily enough to detect random data corruption.
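
sha256(1) and sha512(1) are both part of the base system's cksum(1) family,
so creating and later verifying a checklist needs nothing outside base.  A
quick sketch, with made-up paths:

  $ find /data -type f -exec sha256 {} + > /var/backups/data.sha256
  $ sha256 -cq /var/backups/data.sha256    # -q keeps the output to the failures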

> Looking at the manpages, I don't think softraid(4) or bioctl(8) can contribute
> to repairing or replacing an individual corrupted file with a known good 
> copy, or
> that ffs has a copies-equals-two option, except for the superblock described 
> in
> fs(5) (search for the word 'copies').

Again, scripts.  On my main workstation, I back up $HOME to another hard disk
in the same machine several times a day.  Whenever I finish a large edit, new
version of a patch, or just go to get coffee, I invoke the backup script.

Fifteen seconds later it's done and has automatically deleted the oldest
backup.  A separate script restores the most recent one to a freshly mounted
ramdisk.  Then I just cp over whichever file I accidentally screwed up.

If I invoke the backup script with an argument, it backs up to an external SSD
identified by its DUID.  I do that once a day before I go home.
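
A minimal sketch of that sort of rotation (the mount point, retention count
and archive format below are placeholders, not my actual layout):

#!/bin/ksh
# Sketch only: keep dated tarballs of $HOME on a second disk, newest 8 kept.
dest=/altdisk/backups
tar -czf "$dest/home-$(date +%Y%m%d-%H%M%S).tgz" -C "$HOME" . || exit 1
# Anything beyond the 8 most recent backups gets removed.
ls -1t "$dest"/home-*.tgz | sed '1,8d' | while read -r old ; do
	rm -f "$old"
done

The external SSD case just mounts by DUID first, e.g.
mount 0123456789abcdef.a /mnt/ssd (DUID made up), so it doesn't matter which
sd(4) device number the SSD attaches as.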

(Of course, we have a proper backup strategy in place as well; this is just
 my personal way to get back up and running faster.)

> If the periodic checksum script discovers an error, what are options to 
> correct
> it on the same system without restoring from backup?

It depends on the rest of the setup.

> But I want to consider OpenBSD for my NAS and see how others such as Brian 
> have
> succeeded at it.

There is a danger of over-thinking things here.

Just about any system will occasionally read a bad bit from disk, whatever
anybody tells you.  What matters is that the application detects that and
doesn't process bad data as good.

For a home data storage setup, just use a regular disk and regular backups.

If you need good uptime (e.g. a music server for a radio station), then use
a RAID-1 mirror.
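
With softraid(4), that is one bioctl(8) command once both disks have a RAID
partition in their disklabel; sd1a and sd2a here are placeholders:

  # bioctl -c 1 -l /dev/sd1a,/dev/sd2a softraid0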

If your data is valuable (a book you spent 5 years writing), then make
multiple backups, store them properly, and keep copies of the SHA-512 hashes
in multiple places so that when you read the backup back you know whether
it's good or not.
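
For instance (file names made up), hash the archive when you write it and
stash that small .sha512 file in several places:

  $ sha512 book-2025.tar.gz | tee book-2025.tar.gz.sha512
  $ sha512 -c book-2025.tar.gz.sha512   # run wherever you read the backup back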
