[Replying to <backuppc-users> as this is clearly a usage question]
Hi,
Rob Terhaar wrote on 19.10.2007 at 14:42:17 [Re: [BackupPC-devel]
BackupPC_tarPCCopy hard link error]:
> Perhaps this has been discussed, but how does backuppc_tarPCCopy compare to
> using rsync (with hardlinks enabled)
>
> Lets say i'm backing up 50 computers, and i want to get them off site (~3tb
> of data)
>
> Which is the best way to do this? backupPC_tar_PCCopy or rsync?
it basically depends on the file count. Likely, you've got so many files that
rsync will simply not work without an insane amount of RAM (and I *mean*
RAM, not swap space!) that you probably don't have. That would make
BackupPC_tarPCCopy the best choice :-).
If you've only got 20 huge files (with 50 computers, that would mean a lot
of them have identical content :) and 100 backup trees for each computer,
then rsync is the better choice.
BackupPC_tarPCCopy needs to calculate BackupPC hashes from the file
contents, which, I believe, entails decompressing each full file just to get
its size.
That is a large amount of work. The point is: it works, regardless of pool
size. It can (theoretically) be split up into arbitrarily small parts like
one host, one backup, one share or even one file at a time. I would be wary
about spreading it out over time if your source or destination pool is in
use though, as a pool file with a certain hash value may expire from one
pool and be replaced with a new file with the same hash but different content
while remaining unchanged in the other pool. BackupPC_tarPCCopy (or rather
the extracting tar instance) has no way of knowing that the hardlink is
intended to refer to different content.
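
You can see that limitation in how tar records hardlinks: a link member
stores the *path* of an earlier archive entry, not its content. A small
Python sketch (the file names are made up for illustration):

```python
import os
import tarfile
import tempfile

# A tar archive stores a hardlink as "link to this path", not "link to
# this content" - so an extracting tar cannot tell that the file at that
# path may by now hold different content than when the link was recorded.
d = tempfile.mkdtemp()
a = os.path.join(d, "a")
with open(a, "w") as f:
    f.write("pool data")
b = os.path.join(d, "b")
os.link(a, b)

tar_path = os.path.join(d, "t.tar")
with tarfile.open(tar_path, "w") as tar:
    tar.add(a, arcname="a")
    tar.add(b, arcname="b")   # recorded as a hardlink to the path "a"

with tarfile.open(tar_path) as tar:
    member = tar.getmember("b")
    print(member.islnk(), member.linkname)  # → True a
```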
BackupPC_tarPCCopy by default caches inodes, meaning it will only calculate
the hash once for each set of files pointing to a common pool file in one
run. That means: if you do too many files in one run, you'll run out of
memory just like with rsync, and if you do too few, you'll be re-computing
hashes unnecessarily.
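
The caching idea can be sketched like this (the function name is
hypothetical, and md5-of-content merely stands in for BackupPC's real pool
hash, which is computed differently):

```python
import hashlib
import os
import tempfile

def hashes_with_inode_cache(paths):
    """Compute a content hash per file, but only once per inode: all
    directory entries hard-linked to the same pool file reuse the cached
    value. Illustrative sketch, not BackupPC's actual code."""
    cache = {}      # (st_dev, st_ino) -> hash of the content
    computed = 0    # number of hashes actually calculated
    for path in paths:
        st = os.stat(path)
        key = (st.st_dev, st.st_ino)
        if key not in cache:
            with open(path, "rb") as f:
                cache[key] = hashlib.md5(f.read()).hexdigest()
            computed += 1
    return cache, computed

# Two directory entries sharing one inode -> one hash computation.
d = tempfile.mkdtemp()
a = os.path.join(d, "a")
with open(a, "w") as f:
    f.write("pool file content")
b = os.path.join(d, "b")
os.link(a, b)
cache, computed = hashes_with_inode_cache([a, b])
print(computed)  # → 1
```

Split the run too finely and the cache is discarded between runs, so the
second occurrence of an inode costs a full recomputation again.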
rsync on the other hand just needs to compare inode numbers. Because you
can't say "create a hardlink to inode i", it needs to store the file names
(i.e. paths!) corresponding to the first occurrence of each inode (*). That is
a lot of memory for the many millions of inodes you're likely to have
(guessing that average file sizes are probably well under 1 MB), and it only
works if you consider all files in one run.
I had considered recommending an rsync of the pool and a subset of the pc
directories to speed things up and then using BackupPC_tarPCCopy for the
rest, but that is pointless. All file inodes are guaranteed to be present in
the (c)pool directory (well, at least they should be ;-), so I'd guess rsync
would need about as much memory as for the complete structure with all pc
directories.
To sum it up: if rsync works, it will be faster. If it doesn't, then
BackupPC_tarPCCopy is your only option (for now; if you're not in a hurry,
that *might* change soon, maybe, perhaps).
Regards,
Holger
(*) I didn't check if rsync really operates that way, but it seems
reasonable to do it like that.