On Thursday, 30 January 2020 6:05:56 PM AEDT Craig Sanders via luv-main wrote:
> > It really depends on the type of data.
>
> No, it really doesn't.
>
> > Backing up VM images via rsync is slow because they always have
> > relatively small changes in the middle of large files.
>
> rsyncing **ANY** large set of data is slow, whether it's huge files like
> VM images or millions of small files (e.g. on a mail server).
Here's what I wrote previously:

# It really depends on the type of data. Backing up VM images via rsync is
# slow because they always have relatively small changes in the middle of
# large files. Backing up large mail spools can be slow as there's a
# significant number of accounts with no real changes as well as a good
# number of accounts with only small changes (like the power users who have
# 10,000+ old messages stored and only a few new messages at any time
# because they delete most mail soon after it arrives). But even for those
# corner cases rsync will work if your data volume isn't too big. For other
# cases it works pretty well.

I've used rsync to back up mail spools with up to about 20,000 accounts.
They were not big mail stores, and a backup was only done twice a week. The
regular backups (for users deleting the wrong messages) were ZFS snapshots.

> rsync has to check at least the file sizes and timestamps, and then the
> block checksums on every run. On large sets, this WILL take many hours, no
> matter how much or how little has actually changed.

It's all a matter of scale. I just did a test on a workstation with about
100G of storage in BTRFS. The usual backups are weekly on Sunday night. A
run now took 28 minutes (copying 5 days of data). A run immediately
afterwards (with rsync just checking file dates) took 65 seconds. I could
set that machine to have a backup every hour over the Internet if I wanted
to.

> (a minor benefit of this is that if a file or directory is moved to
> another directory in the same dataset, the only blocks that actually
> changed were the blocks containing the directory info, so they're the
> only blocks that need be sent. rsync, however, would send the entire
> directory contents because it's all "new" data.

Yes, that's good for that case, but it's not a common case I deal with.
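For anyone following along who hasn't used snapshot replication, the 'zfs send' workflow being discussed looks roughly like this (pool, dataset, and host names are made up for illustration; this is a sketch of the commands, not something to run as-is):

```shell
# Incremental replication between two ZFS snapshots.  Only the blocks
# that changed between @monday and @friday are sent, so a renamed or
# moved directory costs almost nothing to replicate.
zfs snapshot tank/home@monday
# ... a week of changes happens here ...
zfs snapshot tank/home@friday
zfs send -i tank/home@monday tank/home@friday | \
    ssh backup-server zfs receive backup/home
```

The equivalent on BTRFS is 'btrfs send -p' between two read-only snapshots.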
> Transparent compression also helps 'zfs send' - compressed data requires
> fewer blocks to store it. rsync, though, can't benefit from transparent
> compression as it has to compare the source file's *uncompressed* data
> with the target copy)

Rsync compares the checksums of the uncompressed data. It then sends
compressed data if you use the -z option, and if you have ssh configured to
use compression then that applies too.

> rsync is still useful as a tool for moving/copying data from one location
> to another (whether on the same machine or to a different machine), but
> it's no longer a good choice for backups. it just takes too long - by the
> time it has finished, the source data will have changed.

It's an improved "cp". Whether it's a good choice for backups depends on
what you are backing up. Rsync is a well known program; it doesn't require
any special setup or testing. The BTRFS and ZFS programs for sending
changes would require more testing.

> I prefer to use the filesystem that's best for all machines on the
> network.
>
> If ZFS is in use on the file-server or backup-server, then that means zfs
> on everything else. If it's btrfs on the server, then it should be btrfs
> on everything.

Except if you have some systems storing large data that needs RAID-Z and
some systems that need the flexibility that BTRFS offers.

> btrfs is not an option here because it just isn't as good as zfs...if i'm

Unless you want to have a RAID-1 array that can have disks added to it or
removed from it at any time, and of any size. This is a useful feature for
a home server and something ZFS doesn't support.

-- 
My Main Blog          http://etbe.coker.com.au/
My Documents Blog     http://doc.coker.com.au/

_______________________________________________
luv-main mailing list
[email protected]
https://lists.luv.asn.au/cgi-bin/mailman/listinfo/luv-main
