John Pettitt wrote:
> Evren Yurtesen wrote:
>> BackupPC Manual mentions:
>>
>> ----------------------------------------------------
>> Each file is examined by generating block checksums (default 2K
>> blocks) on the receiving side (that's the BackupPC side), sending
>> those checksums to the client, where the remote rsync matches those
>> checksums with the corresponding file. The matching blocks and new
>> data are sent back, allowing the client file to be reassembled. A
>> checksum for the entire file is sent too, as an extra check that the
>> reconstructed file is correct.
>>
>> This results in significant disk IO and computation for BackupPC:
>> every file in a full backup, or any file with non-matching attributes
>> in an incremental backup, needs to be uncompressed, block checksums
>> computed and sent. Then the receiving side reassembles the file and
>> has to verify the whole-file checksum. Even if the file is identical,
>> prior to 2.1.0, BackupPC had to read and uncompress the file twice,
>> once to compute the block checksums and later to verify the whole-file
>> checksum.
>> ----------------------------------------------------
>>
>> Why is it actually necessary to do this checksum checking?
>>
> If you turn on checksum caching (see the manual) it doesn't read every
> file every time on the server (just a random sample to ensure that
> nothing nasty has happened to the pool). It also doesn't read every
> file client-side for incrementals, just for full backups.
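For illustration, here is a rough Python sketch of the exchange the manual
describes: the receiving side checksums each block of its existing copy, the
client compares those checksums against its current file and sends back block
references plus any new data, and a whole-file checksum guards the reassembly.
This is only a simplification under assumed hash choices; real rsync uses a
rolling weak checksum so matches can start at any byte offset, and BackupPC
drives the protocol through File::RsyncP against compressed pool files.

----------------------------------------------------
import hashlib
import zlib

BLOCK = 2048  # the "default 2K blocks" mentioned in the manual

def block_checksums(data):
    """Receiving side (BackupPC): checksum each block of its copy of the file."""
    sums = []
    for off in range(0, len(data), BLOCK):
        blk = data[off:off + BLOCK]
        sums.append((zlib.adler32(blk), hashlib.md5(blk).hexdigest()))
    return sums

def delta(client_data, server_sums):
    """Client side: emit ('match', block_index) or ('data', bytes) tokens,
    plus a whole-file checksum for the final verification."""
    index = {s: i for i, s in enumerate(server_sums)}
    tokens = []
    for off in range(0, len(client_data), BLOCK):
        blk = client_data[off:off + BLOCK]
        key = (zlib.adler32(blk), hashlib.md5(blk).hexdigest())
        if key in index:
            tokens.append(('match', index[key]))  # block unchanged: send a reference
        else:
            tokens.append(('data', blk))          # new data: send it literally
    return tokens, hashlib.md5(client_data).hexdigest()

def reassemble(server_data, tokens, whole_file_sum):
    """Receiving side: rebuild the file from matches plus literal data, then
    verify the whole-file checksum as the extra correctness check."""
    parts = []
    for kind, val in tokens:
        if kind == 'match':
            parts.append(server_data[val * BLOCK:(val + 1) * BLOCK])
        else:
            parts.append(val)
    rebuilt = b''.join(parts)
    assert hashlib.md5(rebuilt).hexdigest() == whole_file_sum
    return rebuilt

old = b'A' * 5000                                  # previous backup of the file
new = b'A' * 2048 + b'B' * 100 + b'A' * 2852       # client's current version
tokens, csum = delta(new, block_checksums(old))
assert reassemble(old, tokens, csum) == new        # only the changed block travels as literal data
----------------------------------------------------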
Yes, I read it but it is a little bit confusing, it says:

----------------------------------------------------
BackupPC had to read and uncompress the file twice, once to compute the
block checksums and later to verify the whole-file checksum. Starting
in 2.1.0, BackupPC supports optional checksum caching, which means the
block and file checksums only need to be computed once for each file.
This results in a significant performance improvement. This only works
for compressed pool files. It is enabled by adding
----------------------------------------------------

First it says that it had to uncompress the file twice, then it says
that with caching it has to do it once. Here I thought it meant 'once
at each backup session' instead of twice, not 'once and never again'.
So it only checks them rarely when checksum caching is enabled, which
is a good thing :)

>> Wouldn't it be enough to find files with non-matching attributes and
>> back them up?
>>
> That's what it does for incremental backups - it says so in the text
> you quoted.

>> I think that in most cases if at least the modification time is
>> different then the file should be backed up anyway, no? At least
>> there can be situations where the last modification time of a file
>> is more important than its contents (I don't see how, but it is a
>> possibility).
>>
> The checksum in rsync is more about reducing data on the wire than it
> is about deciding what gets copied. If the attributes have changed it
> will get backed up, but only the data that has changed will actually
> get sent across the wire.

Well, as long as it won't be calculated at each backup session, I have
nothing against it. :)

Perhaps it could be a feature if the checksum checks could be disabled
altogether for situations where bandwidth is cheap but CPU time is
expensive?

Thanks,
Evren
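To make the caching behaviour concrete, here is a minimal Python sketch of
the idea discussed above: compute a file's checksum the first time it is
seen, cache it, and on later runs only re-read a small random sample of
files to confirm that nothing nasty has happened to the pool. The cache
layout, function names and the 1% sample rate are illustrative assumptions,
not BackupPC internals (as far as I recall, BackupPC stores the cached
checksums by appending them to the compressed pool files themselves).

----------------------------------------------------
import hashlib
import random

VERIFY_PROB = 0.01        # re-check roughly 1% of cached files per run
_checksum_cache = {}      # pool file path -> whole-file digest

def file_digest(path):
    with open(path, 'rb') as f:
        return hashlib.md5(f.read()).hexdigest()

def cached_digest(path):
    """Return the file's digest, computing it only on first sight or when
    this file happens to be picked for random re-verification."""
    if path not in _checksum_cache:
        _checksum_cache[path] = file_digest(path)   # first full backup: compute once
    elif random.random() < VERIFY_PROB:
        fresh = file_digest(path)                   # random pool sanity check
        if fresh != _checksum_cache[path]:
            raise RuntimeError("pool file changed on disk: " + path)
    return _checksum_cache[path]
----------------------------------------------------

If I remember the docs correctly, the fraction of cached files that gets
re-verified in the real implementation is controlled by
$Conf{RsyncCsumCacheVerifyProb}.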