You could mess around with LVM snapshots. I hear that you can make an LVM
snapshot and rsync that over, then restore it to the backup LVM. I have not
tried this, but I have seen examples around the net.
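One way to read that suggestion, as an untested sketch (the volume group, LV
and mount point names here are made up):

  # freeze a consistent image of the LV holding the BackupPC data
  lvcreate --snapshot --size 20G --name backuppc_snap /dev/vg0/backuppc

  # mount it read-only and rsync the frozen image across
  mount -o ro /dev/vg0/backuppc_snap /mnt/backuppc_snap
  rsync -aH --numeric-ids /mnt/backuppc_snap/ remotehost:/var/lib/backuppc/

  # clean up
  umount /mnt/backuppc_snap
  lvremove -f /dev/vg0/backuppc_snap

The snapshot only buys you a consistent source to copy from; the rsync itself
still has to cope with all the hard links.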
Have you tried rsync 3? It works for me. I don't quite have 3TB, so I can't
really advise you at that size; I'm not sure where the line is on the file
count that rsync 3 can't handle.
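For what it's worth, something like this is what I'd try (the paths are
placeholders, and you want rsync 3.x on both ends so you get the incremental
file-list handling):

  rsync -aH --delete --numeric-ids /var/lib/backuppc/ remotehost:/var/lib/backuppc/

The 3.x file-list handling keeps memory use down, although -H still has to
remember inode info for every hard-linked file, so a huge pool can still hurt.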
ZFS would be ideal for this, but you have to make the leap to a
Solaris/OpenSolaris kernel. ZFS-FUSE is completely non-functional for
BackupPC, as it will crash as soon as you start hitting the filesystem and
the delayed write caching kicks in. ZFS on FreeBSD is not mature enough and
tends to crash out under heavy I/O.
With ZFS it works something like this:
http://blogs.sun.com/clive/resource/zfs_repl.ksh
You can send a full ZFS snapshot like
  zfs send /pool/[EMAIL PROTECTED] | ssh remotehost zfs recv -v /remotepool/remotefs
or send an incremental afterwards with
  zfs send -i /pool/[EMAIL PROTECTED] | ssh remotehost zfs recv -F -v /remotepool/remotefs
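Spelled out with placeholder dataset and snapshot names (pool/backuppc,
tank/backuppc, snap1 and snap2 are just examples, not your real names), the
full and incremental cases look roughly like:

  # one-time full replication
  zfs snapshot pool/backuppc@snap1
  zfs send pool/backuppc@snap1 | ssh remotehost zfs recv -v tank/backuppc

  # later: send only the changes between two snapshots
  zfs snapshot pool/backuppc@snap2
  zfs send -i pool/backuppc@snap1 pool/backuppc@snap2 | ssh remotehost zfs recv -F -v tank/backuppc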
Feel free to compress the ssh stream with -C if you like, but I would first
check your bandwidth usage and see whether you are filling the link. If you
aren't, the compression will just slow you down.
The real downside here is the switch to Solaris if you are a Linux person.
You can also try Nexenta, which is the OpenSolaris kernel on a Debian/Ubuntu
userland, complete with apt.
You also get filesystem-level compression with ZFS, so you don't need to
compress your pool. This should make recovering files outside of BackupPC a
little more convenient.
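If you go that route, compression is just a property on the dataset (the
pool/backuppc name is a placeholder), and you could then set
$Conf{CompressLevel} = 0 so BackupPC stores files uncompressed and lets ZFS
do the squeezing:

  zfs set compression=on pool/backuppc
  zfs get compressratio pool/backuppc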
How is a tape backup taking 1-2 weeks? 3TB in a week works out to only about
5MB/s. If you are that I/O constrained, nothing is going to work right for
you. How full is your pool?
You could also consider not keeping a copy of the pool remotely, but rather
pulling a tar backup off the BackupPC system on some schedule and sending
that to the remote machine for storage.
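For example, something per host along these lines (the host name, share name
and destination path are placeholders; I'm assuming BackupPC_tarCreate is on
your path, and -n -1 should pick the most recent backup):

  BackupPC_tarCreate -h somehost -n -1 -s /home . | gzip \
    | ssh remotehost 'cat > /backups/somehost-latest.tar.gz'

The share name has to match whatever you actually back up for that host.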
The problem with using NBD or anything like that and using 'dd' is that
there is no resume support, and with 3TB you are likely to get errors every
now and then. Even on a full gigabit link you are looking at at least 6
hours for 3TB with theoretical numbers, and probably 50% more than that in
practice.
As far as some other scheme for syncing up the pools goes, hard links will
get you.
You could use find to traverse the entire pool and take some info down on
each file such as name, size, type, etc., then use some fancy Perl to sort
this out into manageable groups and rsync individual files.
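A very rough first pass at that inventory step might be (the /var/lib/backuppc
location is just a guess at your layout; the grouping is left to the fancy
Perl):

  cd /var/lib/backuppc
  find pool cpool pc -type f -printf '%i\t%n\t%s\t%p\n' | sort -n > /tmp/pool-inventory.txt

Files sharing an inode number are the hard-link groups, so the remote side can
recreate the links itself rather than making rsync track them all in memory.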
On Mon, Dec 8, 2008 at 7:37 AM, Jeffrey J. Kosowsky
<[EMAIL PROTECTED]> wrote:
> Stuart Luscombe wrote at about 10:02:04 +0000 on Monday, December 8, 2008:
> > Hi there,
> >
> > I've been struggling with this for a little while now, so I thought it
> > about time I got some help!
> >
> > We currently have a server running BackupPC v3.1.0 which has a pool of
> > around 3TB, and we've got to a stage where a tape backup of the pool is
> > taking 1-2 weeks, which isn't effective at all. The decision was made
> > to buy a server that is an exact duplicate of our current one and have
> > it hosted in another building, as a 2-week-old backup isn't ideal in
> > the event of a disaster.
> >
> > I've got the OS (CentOS) installed on the new server and have installed
> > BackupPC v3.1.0, but I'm having problems working out how to sync the
> > pool with the main backup server. I managed to rsync the cpool folder
> > without any real bother, but the pool folder is the problem: if I try
> > an rsync it eventually dies with an 'out of memory' error (the server
> > has 8GB), and a cp -a didn't seem to work either, as the server filled
> > up, presumably because it isn't copying the hard links correctly.
> >
> > So my query here really is: am I going the right way about this? If
> > not, what's the best method to take so that, say, once a day the
> > duplicate server gets updated?
> >
> > Many Thanks
>
> It just hit me that, given the known architecture of the pool and cpool
> directories, shouldn't it be possible to come up with a scheme that
> works better than either rsync (which can choke on too many hard
> links) or 'dd' (which has no notion of incremental and requires you
> to resize the filesystem, etc.)?
>
> My thought is as follows:
> 1. First, recurse through the pc directory to create a list of
> files/paths and the corresponding pool links.
> Note that finding the pool links can be done in one of several
> ways:
> - Method 1: Create a sorted list of pool files (which should be
> significantly shorter than the list of all files due to the
> nature of pooling, and therefore require less memory than rsync)
> and then look up the links.
> - Method 2: Calculate the md5sum file path of the file to determine
> where it is in the pool. Where necessary, disambiguate among
> chain duplicates.
> - Method 3: Not possible yet but would be possible if the md5sum
> file paths were appended to compressed backups. This would add very
> little to the storage but it would allow you to very easily
> determine the right link. If so then you could just read the link
> path from the file.
>
> Files with only 1 link (i.e. no hard links) would be tagged for
> straight copying.
>
> 2. Then rsync *just* the pool -- this should be no problem since by
> definition there are no hard links within the pool itself
>
> 3. Finally, run through the list generated in #1 to create the new pc
> directory by creating the necessary links (and for files with no
> hard links, just copy/rsync them)
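>
> For instance, step 2 on its own is just a plain rsync of the pool
> trees with no -H needed (the /var/lib/backuppc location is only an
> assumption about the layout):
>
>     rsync -a --delete /var/lib/backuppc/pool/ remotehost:/var/lib/backuppc/pool/
>     rsync -a --delete /var/lib/backuppc/cpool/ remotehost:/var/lib/backuppc/cpool/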
>
> The above could also be easily adapted to allow for "incremental" syncing.
> Specifically, in #1, you would use rsync to just generate a list of
> *changed* files in the pc directory. In #2, you would continue to use
> rsync to just sync *changed* pool entries. In #3 you would only act on
> the shortened incremental sync list generated in #1.
>
> The more I think about it, the more I LIKE the idea of appending the
> md5sum file paths to compressed pool files (Method #3), since this
> would make the above very fast. (Note: if I were implementing this, I
> would also include the chain number in cases where there are multiple
> files with the same md5sum path, and of course BackupPC_nightly would
> then have to adjust this any time it changed the chain numbering.)
>
> Even without the above, Method #1 would still be much less
> memory-intensive than rsync, and Method #2, while potentially a little
> slow, would require very little memory and wouldn't be nearly that bad
> if you are doing incremental syncs.
>
> ------------------------------------------------------------------
> Just as an FYI, if anyone wants to implement Method #2, here is the
> routine I use to generate the md5sum file path from a (compressed)
> file (note that it is based on the analogous uncompressed version in
> Lib.pm).
>
> use BackupPC::Lib;
> use BackupPC::Attrib;
> use BackupPC::FileZIO;
>
> use constant _128KB => 131072;
> use constant _1MB => 1048576;
>
> # Compute the MD5 digest of a compressed file. This is the compressed
> # file version of the Lib.pm function File2MD5.
> # For efficiency we don't use the whole file for big files
> # - for files <= 256K we use the file size and the whole file.
> # - for files <= 1M we use the file size, the first 128K and
> # the last 128K.
> # - for files > 1M, we use the file size, the first 128K and
> # the 8th 128K (ie: the 128K up to 1MB).
> # See the documentation for a discussion of the tradeoffs in
> # how much data we use and how many collisions we get.
> #
> # Returns the MD5 digest (a hex string).
> #
> # If $filesize < 0 then always recalculate size of file by fully
> #   decompressing it
> # If $filesize = 0 then first try to read corresponding attrib file
> #   (if it exists); if that doesn't work then recalculate
> # If $filesize > 0 then use that as the size of the file
>
> sub zFile2MD5
> {
>     my ($bpc, $md5, $name, $filesize, $compresslvl) = @_;
>
>     my $fh;
>     my $rsize;
>     my $totsize;
>
>     $compresslvl = $Conf{CompressLevel} unless defined $compresslvl;
>     unless (defined ($fh = BackupPC::FileZIO->open($name, 0, $compresslvl))) {
>         printerr "Can't open $name\n";
>         return -1;
>     }
>
>     my $datafirst = my $datalast = '';
>     my @data = ('', '');
>     # First try to read up to the first 128K (131072 bytes)
>     if ( ($totsize = $fh->read(\$datafirst, _128KB)) < 0 ) {
>         printerr "Can't read & decompress $name\n";
>         return -1;
>     }
>     elsif ($totsize == _128KB) { # Read up to the 1st MB
>         my $i = 0;
>         # Read in up to 1MB (_1MB), 128K at a time, alternating
>         # between the 2 data buffers
>         while ( (($rsize = $fh->read(\$data[(++$i)%2], _128KB)) == _128KB)
>                 && ($totsize += $rsize) < _1MB ) {}
>         $totsize += $rsize if $rsize < _128KB; # Add back in partial read
>         $datalast = substr($data[($i-1)%2], $rsize, _128KB-$rsize)
>                   . substr($data[$i%2], 0, $rsize);
>     }
>     $filesize = $totsize if $totsize < _1MB; # We read it all, so we already know the size
>     if ($filesize == 0) { # Try to find the size from the attrib file
>         $filesize = get_attrib_value($bpc, $name, "size");
>         warn "Can't read size of $name from attrib file so calculating manually\n"
>             unless defined $filesize;
>     }
>     unless ($filesize > 0) { # Continue reading to calculate the size
>         while ( ($rsize = $fh->read(\($data[0]), _128KB)) > 0 ) {
>             $totsize += $rsize;
>         }
>         $filesize = $totsize;
>     }
>     $fh->close();
>
>     $md5->reset();
>     $md5->add($filesize);
>     $md5->add($datafirst);
>     ($datalast eq '') || $md5->add($datalast);
>     return $md5->hexdigest;
> }
>
> # Returns the value of attrib $key for $fullfilename (full path).
> # If the attrib file is not present or there is no entry for the
> # specified key for the given file, then return 'undef'.
> sub get_attrib_value
> {
>     my ($bpc, $fullfilename, $key) = @_;  # $bpc accepted to match the caller above
>     $fullfilename =~ m{(.+)/f(.+)};       # $1=dir; $2=file
>
>     return undef if read_attrib(my $attr, $1) < 0;
>     return $attr->{files}{$2}{$key};      # Note: returns undef if key not present
> }
>
> # Reads in the attrib file for directory $_[1] (and optional alternative
> # attrib file name $_[2]) and stores it in the hashref $_[0] passed to
> # the function.
> # Returns -1 and a blank $_[0] hash ref if the attrib file doesn't exist
> # already (not necessarily an error).
> # Dies if the attrib file exists but can't be read in.
> sub read_attrib
> { # Note: $_[0] = hash reference to attrib object
>     $_[0] = BackupPC::Attrib->new({ compress => $Conf{CompressLevel} });
>     return -1 unless -f attrib($_[1], $_[2]); # Not necessarily an error because dir may be empty
>     die "Error: Cannot read attrib file: " . attrib($_[1], $_[2]) . "\n"
>         unless $_[0]->read($_[1], $_[2]);
>     return 1;
> }