[Replying to <backuppc-users> as this is clearly a usage question]
Hi,
Rob Terhaar wrote on 19.10.2007 at 14:42:17 [Re: [BackupPC-devel]
BackupPC_tarPCCopy hard link error]:
> Perhaps this has been discussed, but how does backuppc_tarPCCopy compare to
> using rsync (with hardlinks enabled)
>
> Lets say i'm backing up 50 computers, and i want to get them off site (~3tb
> of data)
>
> Which is the best way to do this? backupPC_tar_PCCopy or rsync?
it basically depends on the file count. Likely, you've got so many files that
rsync will simply not work without an insane amount of RAM (and I *mean*
RAM, not swap space!) that you probably don't have. That would make
BackupPC_tarPCCopy the best choice :-).
If you've only got 20 huge files (with 50 computers, that would mean a lot
of them have identical content :) and 100 backup trees for each computer,
then rsync is the better choice.
BackupPC_tarPCCopy needs to calculate BackupPC hashes from the file
contents, which, I believe, entails decompressing each full file just to get
its size.
That is a large amount of work. The point is: it works, regardless of pool
size. It can (theoretically) be split up into arbitrarily small parts like
one host, one backup, one share or even one file at a time. I would be wary
about spreading it out over time if your source or destination pool is in
use though, as a pool file with a certain hash value may expire from one
pool and be replaced with a new file with the same hash but different content
while remaining unchanged in the other pool. BackupPC_tarPCCopy (or rather
the extracting tar instance) has no way of knowing that the hardlink is
intended to refer to different content.
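
You can see that limitation in how tar records hardlinks: a link member
stores the *path* of an earlier archive entry, not its content. A small
Python sketch (the file names are made up for illustration):

```python
import os
import tarfile
import tempfile

# A tar archive stores a hardlink as "link to this path", not "link to
# this content" - so an extracting tar cannot tell that the file at that
# path may by now hold different content than when the link was recorded.
d = tempfile.mkdtemp()
a = os.path.join(d, "a")
with open(a, "w") as f:
    f.write("pool data")
b = os.path.join(d, "b")
os.link(a, b)

tar_path = os.path.join(d, "t.tar")
with tarfile.open(tar_path, "w") as tar:
    tar.add(a, arcname="a")
    tar.add(b, arcname="b")   # recorded as a hardlink to the path "a"

with tarfile.open(tar_path) as tar:
    member = tar.getmember("b")
    print(member.islnk(), member.linkname)  # → True a
```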
BackupPC_tarPCCopy by default caches inodes, meaning it will only calculate
the hash once for each set of files pointing to a common pool file in one
run. That means: if you do too many files in one run, you'll run out of
memory just like with rsync, and if you do too few, you'll be re-computing
hashes unnecessarily.
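
The caching idea can be sketched like this (the function name is
hypothetical, and md5-of-content merely stands in for BackupPC's real pool
hash, which is computed differently):

```python
import hashlib
import os
import tempfile

def hashes_with_inode_cache(paths):
    """Compute a content hash per file, but only once per inode: all
    directory entries hard-linked to the same pool file reuse the cached
    value. Illustrative sketch, not BackupPC's actual code."""
    cache = {}      # (st_dev, st_ino) -> hash of the content
    computed = 0    # number of hashes actually calculated
    for path in paths:
        st = os.stat(path)
        key = (st.st_dev, st.st_ino)
        if key not in cache:
            with open(path, "rb") as f:
                cache[key] = hashlib.md5(f.read()).hexdigest()
            computed += 1
    return cache, computed

# Two directory entries sharing one inode -> one hash computation.
d = tempfile.mkdtemp()
a = os.path.join(d, "a")
with open(a, "w") as f:
    f.write("pool file content")
b = os.path.join(d, "b")
os.link(a, b)
cache, computed = hashes_with_inode_cache([a, b])
print(computed)  # → 1
```

Split the run too finely and the cache is discarded between runs, so the
second occurrence of an inode costs a full recomputation again.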
rsync on the other hand just needs to compare inode numbers. Because you
can't say "create a hardlink to inode i", it needs to store the file names
(i.e. paths!) corresponding to the first occurrence of each inode (*). That is
a lot of memory for the many millions of inodes you're likely to have
(guessing that average file sizes are probably well under 1 MB), and it only
works if you consider all files in one run.
I had considered recommending an rsync of the pool and a subset of the pc
directories to speed things up and then using BackupPC_tarPCCopy for the
rest, but that is pointless. All file inodes are guaranteed to be present in
the (c)pool directory (well, at least they should be ;-), so I'd guess rsync
would need about as much memory as for the complete structure with all pc
directories.
To sum it up: if rsync works, it will be faster. If it doesn't, then
BackupPC_tarPCCopy is your only option (for now; if you're not in a hurry,
that *might* change soon, maybe, perhaps).
Regards,
Holger
(*) I didn't check if rsync really operates that way, but it seems
reasonable to do it like that.