On Sat, Jul 13, 2013 at 5:13 PM, Tomasz Chmielewski <man...@wpkg.org> wrote:

> --checksum is quite paranoid and causes substantial IO on both sides.
> Basically, it would cause md5sum calculated for each and every file, on
> both sides.
> If the archive is dozens of gigabytes or more - it means:
>
> - extra CPU used,
>
> - extra IO used,
>

No, that is not true.  On the server side in 4.x, so long as that file was
previously backed up with the same path for that client, the full-file
md5sum is stored in the file attributes, so it takes no more effort to
check the md5sum than to compare other attributes like the mtime and size.
(Of course, if the file isn't in the pool already, the server needs to
compress and write it to the pool, which is an expensive operation.)

On the client side you are correct - the entire file needs to be read.  But
isn't that the point of a full backup?  The client-side load is similar to
version 3.x, where a full backup requires the client to also read the
entire contents of every file.

There are two (typically one-time) cases where the client might do a bit
more work compared to 3.x:

   - If the file isn't in the pool at all, the server will send empty block
   checksums (i.e., nothing to match), and the client will then send the whole
   file verbatim.  That requires two reads of the file on the client (first to
   get the md5sum, and second to send the file).  But these are close in time
   and the second read will probably be cached.
   - If the file is anywhere in the pool (even if it's a first backup of a
   brand new client), the right pool file will be a likely match (based on
   md5sum), and block digests will be sent to the client.  If it is a match,
   very little network traffic is needed, but the client needs to re-read the
   whole file to be sure.  So that also requires two reads of the file on the
   client, and one on the server.  Again, the two client reads are close in
   time and the second read will probably be cached.

> - since we have to read all the files, anything the server had in
>   cache/buffers, will be purged from there (newly read files go to
>   cache/buffers instead, but it's not very useful there, since the
>   backup will be most likely made daily or less often).


That's a good point.  Igor Sverkos recently pointed out the fadvise patch
to rsync that gives hints via the posix_fadvise call whether to cache
certain files or not.  That patch significantly reduces the impact you
describe.  However, in the steady state, rsync_bpc on the 4.x server isn't
reading entire files very often.  I think the patch is likely to help the
client side more than the server, but I haven't tried it.

> I'm pretty sure that my clients don't secretly change the file content,
> while preserving their size and timestamp.
> In that case - will BackupPC still work correctly if I change the
> default:
>
> $Conf{RsyncFullArgsExtra} = [
>             '--checksum',
> ];
>
>
> to:
>
> $Conf{RsyncFullArgsExtra} = [
>             '',
> ];


If you are comfortable with mtime, size, etc. catching all changes to files,
then rather than disabling --checksum in $Conf{RsyncFullArgsExtra}, I
recommend you only ever do incremental backups.  4.x supports that.  Set
$Conf{FullPeriod} to a very large value.
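
As a minimal sketch of that setting in config.pl or a per-host config (the
number is only an illustrative assumption for "very large"; FullPeriod is
expressed in days):

```perl
# Illustrative value only: anything far longer than your retention window
# means a scheduled full backup is effectively never due.
$Conf{FullPeriod} = 3650;
```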

If you are doing incremental-only, you should set $Conf{FillCycle} to, say,
7.  This will make sure every 7th backup is filled.  The "Full" backup
delete settings (e.g., "FullKeepCnt") actually mean "Filled" backups in 4.x,
so they control how many of those filled backups to keep, including the
exponential expiry option.
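
A sketch of the corresponding settings follows.  The FullKeepCnt numbers are
illustrative assumptions, not recommendations; in the array form each
successive entry applies to a doubled interval, which is the exponential
expiry option.

```perl
# Fill every 7th backup.
$Conf{FillCycle} = 7;

# In an incremental-only 4.x setup this counts filled backups.
# Illustrative values: keep 4 at the base interval, then 2 at twice
# that interval, then 3 at four times that interval.
$Conf{FullKeepCnt} = [4, 2, 3];
```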

Note that, by default, $Conf{FillCycle} is 0, which keeps fulls filled, and
incrementals not filled (except for the most recent backup which is always
filled), so the delete settings work as you would expect.  I grappled with
whether I should rename "FullKeepCnt" etc to "FillKeepCnt", but I thought
that would be more confusing for existing users (and for old 3.x backups,
FillKeepCnt would really still mean FullKeepCnt -- ugh!).  Plus it's risky
to change all the per-PC client configs to rename the variables.
Suggestions are welcome.

Craig
_______________________________________________
BackupPC-devel mailing list
BackupPC-devel@lists.sourceforge.net
List:    https://lists.sourceforge.net/lists/listinfo/backuppc-devel
Wiki:    http://backuppc.wiki.sourceforge.net
Project: http://backuppc.sourceforge.net/
