On Thursday, 30 January 2020 6:05:56 PM AEDT Craig Sanders via luv-main wrote:
> > It really depends on the type of data.
>
> No, it really doesn't.
>
> > Backing up VM images via rsync is slow because they always have
> > relatively small changes in the middle of large files.
>
> rsyncing **ANY** large set of data is slow, whether it's huge files like
> VM images or millions of small files (e.g. on a mail server).
Here's what I wrote previously:

# It really depends on the type of data. Backing up VM images via rsync is
# slow because they always have relatively small changes in the middle of
# large files. Backing up large mail spools can be slow as there's a
# significant number of accounts with no real changes as well as a good
# number of accounts with only small changes (like the power users who have
# 10,000+ old messages stored and only a few new messages at any time
# because they delete most mail soon after it arrives). But even for those
# corner cases rsync will work if your data volume isn't too big. For other
# cases it works pretty well.

I've used rsync to back up mail spools with up to about 20,000 accounts.
They were not big mail stores, and a backup was only done twice a week. The
regular backups (for users deleting the wrong messages) were ZFS snapshots.

> rsync has to check at least the file sizes and timestamps, and then the
> block checksums on every run. On large sets, this WILL take many hours, no
> matter how much or how little has actually changed.

It's all a matter of scale. I just did a test on a workstation with about
100G of storage in BTRFS. The usual backups are weekly on Sunday night. A
run now took 28 minutes (copying 5 days of data). A run immediately
afterwards (with rsync just checking file dates) took 65 seconds. I could
set that machine to have a backup every hour over the Internet if I wanted
to.

> (a minor benefit of this is that if a file or directory is moved to
> another directory in the same dataset, the only blocks that actually
> changed were the blocks containing the directory info, so they're the
> only blocks that need be sent. rsync, however, would send the entire
> directory contents because it's all "new" data.

Yes, that's good for that case, but it's not a common case I deal with.
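For anyone following along who hasn't used snapshot replication, the 'zfs send' workflow being discussed looks roughly like this (pool, dataset, and host names are made up for illustration; this is a sketch of the commands, not something to run as-is):

```shell
# Incremental replication between two ZFS snapshots.  Only the blocks
# that changed between @monday and @friday are sent, so a renamed or
# moved directory costs almost nothing to replicate.
zfs snapshot tank/home@monday
# ... a week of changes happens here ...
zfs snapshot tank/home@friday
zfs send -i tank/home@monday tank/home@friday | \
    ssh backup-server zfs receive backup/home
```

The equivalent on BTRFS is 'btrfs send -p' between two read-only snapshots.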
> Transparent compression also helps 'zfs send' - compressed data requires
> fewer blocks to store it. rsync, though, can't benefit from transparent
> compression as it has to compare the source file's *uncompressed* data
> with the target copy)

Rsync compares the checksums of the uncompressed data. It then sends
compressed data if you use the -z option, and if you have ssh configured to
use compression then that applies too.

> rsync is still useful as a tool for moving/copying data from one location
> to another (whether on the same machine or to a different machine), but
> it's no longer a good choice for backups. it just takes too long - by the
> time it has finished, the source data will have changed.

It's an improved "cp". Whether it's a good choice for backups depends on
what you are backing up. Rsync is a well known program; it doesn't require
any special setup or testing. The BTRFS and ZFS programs for sending
changes would require more testing.

> I prefer to use the filesystem that's best for all machines on the
> network.
>
> If ZFS is in use on the file-server or backup-server, then that means zfs
> on everything else. If it's btrfs on the server, then it should be btrfs
> on everything.

Except if you have some systems storing large data that needs RAID-Z and
some systems that need the flexibility that BTRFS offers.

> btrfs is not an option here because it just isn't as good as zfs...if i'm

Unless you want to have a RAID-1 array that can have disks added to it or
removed from it at any time, and of any size. This is a useful feature for
a home server and something ZFS doesn't support.

-- 
My Main Blog          http://etbe.coker.com.au/
My Documents Blog     http://doc.coker.com.au/

_______________________________________________
luv-main mailing list
[email protected]
https://lists.luv.asn.au/cgi-bin/mailman/listinfo/luv-main
