Hi,
Scott wrote on 02.05.2007 at 11:08:01 [Re: [BackupPC-users] BackupPC and OS X]:
> On May 2, 2007, at 9:02 AM, Holger Parplies wrote:
> > Due to the implementation of pooling your second full backup may be
> > much faster than the first: [...]
>
> This sounds like it just skewed the results of my ssh -c blowfish
> full backup test.
> [...]
> My backup speeds over GigE this time were 5.42MB/s compared with my
> previous full backup, which was 3.10MB/s. While -c blowfish could have
> made a difference, then again so could the pool!
Well, you know that neither effect alone will make more difference than both
together :). Is the speed acceptable to you?
> "Pool hashing gives 45 repeated files with the longest chain of 10."
> I have no idea yet if these are considered normal. Figuring out what
> pool-chains are is on a post-it note here.
Pool chains are chains of pool files that share the same hash. As only part
of a file is used to calculate the hash, it is easy to imagine a file with
the same hash as another file (of more than 256KB length ;-) but different
content: simply change the part of the content that is not used for hash
calculation (but keep the length the same, as that goes into the hash too).
See the documentation ("The hashing function") for a detailed description of
the hashing function.
When BackupPC wants to match an incoming file to a pool file, it calculates
the hash. In your case, there are 45 hashes where more than a single file in
the pool might match, in the worst case there are 10 candidates. BackupPC
now needs to compare the incoming data with each of the candidates until the
contents either fully match or a difference is found (best case: one candidate
is identical and the others differ as soon as possible, worst case: all
candidates have a different last byte than the new file and are otherwise
identical; they'll need to be > 1MB for the hashes to be identical in this
case). This is further complicated by the need to decompress the pool files
on the fly.
The important thing is that you probably have several tens of thousands of
files in your pool, so a pool chain of more than one file is quite uncommon.
> Before doing any other tests I think I need to figure out how to remove
> backups and clean out the pools.
As you don't have *anything* in the pool you want to keep, you can probably
delete the complete cpool and pc/$host directories and re-create them with
the same permissions. The subdirectories of cpool will be created as needed,
and so, probably, will the empty pc/$host/backups file :).
Thinking about it, your first backup, run on an empty pool, won't put
anything in the pool and thus do no pooling at all. Every single file will
need to be compressed. After the backup, BackupPC_link will be run, in order
to fill the pool and turn duplicate files into links to a common pool file.
In that respect, your first backup is unique. You'll probably want to compare
the more common case of a second backup, where pooling takes place during
backup (except for new files, which are again added by BackupPC_link).
Regards,
Holger