I am using rsync with --backup and --backup-dir to keep copies of changed files as part of an incremental backup system. However, if only a file's timestamp has changed, rsync still puts a copy of that file in the --backup-dir. When thousands of large files have their timestamps changed, this wastes a lot of disk space on data that hasn't really changed.

Interestingly, if you use --checksum, rsync will not create a file in --backup-dir unless the contents are truly different, but it will still fix up the timestamp on the remote end to match. This is the behavior I want, but I don't want to pay the performance penalty of running --checksum every time.

Here is an example that shows the problem:

mkdir ./SRC
echo hello > ./SRC/a
echo hello > ./SRC/b
rsync -av ./SRC/ ./DEST/
touch ./SRC/*
ls -al --full-time ./SRC/ ./DEST/
# Creates copies in BACKUP, even though contents are the same
rsync -av --backup --backup-dir="$(pwd)/BACKUP/" ./SRC/ ./DEST/

After this run, the BACKUP directory contains copies of both a and b, even though the contents of neither file changed. If you add --checksum, rsync avoids creating the copies but still syncs the timestamps correctly:

touch ./SRC/*
# Does not create any copies in BACKUP since nothing changed
rsync -av --checksum --backup --backup-dir="$(pwd)/BACKUP/" ./SRC/ ./DEST/

The problem with --checksum is that with hundreds of gigabytes of data, reading every file to checksum it is very slow, especially when most of the timestamps have not changed at all. But without it, rsync has already decided to make a backup copy by the time its delta algorithm discovers that nothing has changed.
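
The only workaround I can think of is to clean up after the fact. Below is a rough sketch of that idea, not an rsync feature: a hypothetical helper, prune_identical_backups (my own name), that deletes any file under the backup directory whose contents are byte-identical to the file at the same relative path in the destination. It assumes the backup directory mirrors the destination's layout and that cmp and find are available. It still has to read the backed-up files, but only those, not the whole tree.

```shell
#!/bin/sh
# Hypothetical helper (not part of rsync): remove backup copies whose
# contents are byte-identical to the current file in the destination.
# Usage: prune_identical_backups BACKUP_DIR DEST_DIR
prune_identical_backups() {
    # Strip one trailing slash so relative paths can be computed reliably.
    backup_dir=${1%/}
    dest_dir=${2%/}
    [ -d "$backup_dir" ] || return 0

    # Hand every regular file in the backup tree to a small inline script.
    find "$backup_dir" -type f -exec sh -c '
        backup_dir=$1
        dest_dir=$2
        shift 2
        for backup_file in "$@"; do
            # Path of the backup file relative to the backup directory.
            rel=${backup_file#"$backup_dir"/}
            # cmp -s exits 0 only if both files exist and match exactly.
            if cmp -s "$backup_file" "$dest_dir/$rel"; then
                rm -f -- "$backup_file"
            fi
        done
    ' sh "$backup_dir" "$dest_dir" {} +
}
```

After the first rsync run in the example above, this would be called as: prune_identical_backups "$(pwd)/BACKUP" ./DEST. Backups of files that really changed, or that no longer exist in DEST, are left alone. The copies are still created during the rsync run, so this only reclaims the space afterwards, and it leaves empty directories behind.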

Is there a flag I can add to rsync that will tell it to only create a backup file if something actually changed, saving lots of wasted backup space?

thanks,
Wayne
