Hi,

Jason M. Kusar wrote on 30.04.2007 at 23:26:50 [Re: [BackupPC-users] Filling 
Backups]:
> Holger Parplies wrote:
> > The next consideration with a network based backup solution (vs. backup to
> > local tape media) would be network bandwidth. With the rsync transport, the
                                                  ^^^^^^^^^^^^^^^^^^^^^^^^
> > cost for a full backup is again virtually the same as for an incremental
> > (only some block checksums more) - at least in terms of used network
> > bandwidth.
> > [...]
> So, just to make sure I'm absolutely clear, doing a full backup doesn't
> need to transfer all the unchanged files because they still exist in the
> pool, correct?

that is correct *only for the rsync type transports* (meaning rsync or rsyncd).
Strictly speaking, it is not "because they exist in the pool", but rather
"because they were the same in the host's previous backup". If you rename a
file, it will be transferred, but it is not written to disk on the BackupPC
server (not even temporarily!) - it is simply linked to the identical pool file.

> But how does RsyncP tell that the file is there without 
> actually transferring to the system to perform the checksum?

The same way native 'rsync' does. I'm not sure about the details, but
basically both sides compute block checksums of their respective versions of
the file in question. These are transferred over the network and compared.
Blocks that don't match are then sent to the receiver (the BackupPC side),
which reconstructs the file from the parts it already has plus the new parts.
At some point, a checksum over the complete file is calculated (again on both
sides) and compared to make sure the files are identical.
For an unchanged file, that means that only the checksums are transferred.
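
To make that a bit more concrete, here is a toy sketch - *not* what
File::RsyncP actually does (the real protocol additionally uses a weak
rolling checksum so it can match blocks at arbitrary offsets, and the file
names below are made up):

        # Toy sketch: split two local copies of a file into fixed-size
        # blocks and count the blocks whose digests match. Only blocks
        # at the same offset are compared here, unlike real rsync.
        use Digest::MD5 qw(md5_hex);

        sub block_sums {
            my ($file, $blksize) = @_;
            open(my $fh, '<', $file) or die "$file: $!";
            binmode($fh);
            my @sums;
            while (read($fh, my $block, $blksize)) {
                push @sums, md5_hex($block);
            }
            close($fh);
            return \@sums;
        }

        my $old = block_sums('file.previous', 2048);  # receiver's copy
        my $new = block_sums('file.current',  2048);  # sender's copy
        my $same = 0;
        for my $i (0 .. $#{$old}) {
            $same++ if defined $new->[$i] && $old->[$i] eq $new->[$i];
        }
        print "$same of ", scalar @{$old}, " blocks unchanged\n";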

> You mentioned something about using rsync with checksum caching, but I 
> couldn't find any options for it.

The BackupPC side does not store the files in plain native format - at least
not with a compressed pool. Compression makes checksum calculation more
expensive, but it also allows the checksums to be stored inside the compressed
file so they don't need to be recomputed every time. Checksum caching is
enabled by adding the --checksum-seed=32761 option to $Conf{RsyncArgs}. See
config.pl, the comments before $Conf{RsyncCsumCacheVerifyProb} and near
$Conf{RsyncArgs}.
I'm not sure whether checksum caching works with uncompressed pools.
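
In config.pl terms, enabling it boils down to something like this (keep
whatever rsync options you already have; only the --checksum-seed line is
new):

        # in config.pl or the per-host config file
        $Conf{RsyncArgs} = [
            # ... your existing rsync options stay here unchanged ...
            '--checksum-seed=32761',
        ];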

> > [...]
> >   1. Split up your backup
> > [...]
> Unfortunately, this wouldn't work very well without major 
> re-organization of my data.  I'd like to avoid this if possible.

I can understand that. Still, if it worked out somehow, it would probably
make the most difference.
Splitting up does not necessarily mean you need to have something like

        /share/static
        /share/dynamic

(i.e. distinct (groups of) top-level directories). If you can, for instance,
divide by file name (e.g. everything named *.iso or *.ppt or *.jpg doesn't
change, everything else does), it would be possible to exclude these in one
case and include only them in the other. Working out the rsync options might
take a moment, but it would probably be worth the effort. You just need to
make sure you don't accidentally leave something out :).
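
A sketch of what I mean - the patterns are made up and I haven't tested how
File::RsyncP handles them, so treat this as a starting point only. You would
use two host entries for the same machine, one grabbing only the static file
types and the other one excluding exactly those:

        # host entry 1 ("static" data): include only the stable types.
        # The include/exclude order matters for rsync.
        $Conf{RsyncArgs} = [
            # ... your existing rsync options ...
            '--include=*/',
            '--include=*.iso',
            '--include=*.ppt',
            '--include=*.jpg',
            '--exclude=*',
        ];

        # host entry 2 ("dynamic" data): everything except those types.
        $Conf{RsyncArgs} = [
            # ... your existing rsync options ...
            '--exclude=*.iso',
            '--exclude=*.ppt',
            '--exclude=*.jpg',
        ];

For the exclude side you could also use $Conf{BackupFilesExclude} instead of
raw rsync options.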

> >   2. Use multilevel incrementals to get a balance between retransmitting
> >      data and merging views
> >   
> I'll play around with this.  Files are only rarely deleted or moved 
> (though it does happen occasionally), so infrequent fulls should be ok.

Incrementals should catch those. It's the rare case that a file's contents
change without the (relevant) metadata changing that is missed.
And it's the growing amount of transferred data and/or merging overhead that
may prove to be a problem (or not, depending on the characteristics of the
changes).
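
For reference - assuming you're on BackupPC 3.x; older versions only know
level-1 incrementals - multilevel incrementals are controlled by
$Conf{IncrLevels}, roughly like this (the numbers are purely an example):

        # one full per week, daily incrementals, each one relative to
        # the previous backup instead of always to the last full
        $Conf{FullPeriod} = 6.97;
        $Conf{IncrPeriod} = 0.97;
        $Conf{IncrLevels} = [1, 2, 3, 4, 5, 6];

The tradeoff is exactly the one mentioned above: deeper levels mean less data
retransmitted per incremental, but more backups to merge when browsing or
restoring.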

> [...]
> The servers sit on a gigabit network, but the backup server cpu sits
> at 100% while the backup is going on.

That should improve with checksum caching (and backup performance along with
it) - starting from the third backup - because decompression and checksum
computation on the BackupPC server are no longer needed.

> The server I'm backing up is significantly faster and only uses about 15%.
> The backup server has a gig of memory.  The backed-up server has 2 gigs.

As you're already using rsync for that setup, memory seems to be sufficient
for the file lists (or are you experiencing thrashing? That would explain
extremely long backup times ...).

Quoting Craig Barratt:

    Backup speed is affected by many factors, including client speed
    (cpu and disk), server speed (cpu and disk) and network throughput
    and latency.
 
    With a really slow network, the client and server speed won't matter
    as much.  With a fast network and fast client, the server speed will
    dominate.  With a fast network and fast server, the client speed will
    dominate.  In most cases it will be a combination of all three.

If you're using rsync over ssh, changing the cipher to blowfish might make a
difference, because it would reduce the amount of computing your BackupPC
server has to do.
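
Concretely, that means adding the cipher option to the ssh part of the client
command - adjust this to whatever your $Conf{RsyncClientCmd} currently looks
like, and note that depending on your OpenSSH version the cipher may need to
be spelled blowfish-cbc:

        # use the computationally cheaper blowfish cipher for the
        # rsync-over-ssh transport
        $Conf{RsyncClientCmd} = '$sshPath -c blowfish -q -x -l root $host'
                              . ' $rsyncPath $argList+';

If you go that route, remember to make the same change to
$Conf{RsyncClientRestoreCmd} so restores use it too.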

Regards,
Holger
