This reply is specific to:
> I think solaris or *bsd (maybe OS X soon) with zfs sounds promising for
> this with its incremental send/receive facility but I haven't tried it
> yet to see if millions of hardlinks are an issue. I'm using a hot-swap
> sata cage to raid-sync a 750 gig drive to rotate offsite periodically.
> It's not very elegant but it works.
ZFS has no issues with file count during incremental transfers. The
transfers happen at the filesystem level, not at the file level; I have
*tested* transfers of filesets with millions of files without issue. I
believe this is because, unlike rsync, which builds a list of the work to be
done and holds it in RAM, ZFS transfers a stream, much more akin to
running dd across NFS: block level, or rather the ZFS equivalent.
This does work, though it requires a lot of shell scripting to push the data
and verify it, as there is no resume support for interrupted transfers.
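For what it's worth, a minimal sketch of that scripting, driven from Python so the retry logic is explicit. The dataset name tank/backuppc, the host offsite, and the snapshot names are made-up placeholders, not anything from BackupPC; a real run needs a ZFS pool and ssh access, so the sketch defaults to just printing the commands:

```python
import subprocess

# Sketch only: dataset "tank/backuppc", host "offsite", and the snapshot
# names below are assumptions for illustration.
SRC = "tank/backuppc"
PREV = f"{SRC}@2008-01-15"
CUR = f"{SRC}@2008-01-16"

def build_commands(prev, cur, host="offsite"):
    """Return the snapshot command and the send|receive pipeline."""
    snap = ["zfs", "snapshot", cur]
    # The incremental stream carries changed blocks only, so file and
    # hardlink counts on the filesystem do not affect it.
    pipeline = f"zfs send -i {prev} {cur} | ssh {host} zfs receive -F {SRC}"
    return snap, pipeline

def replicate(dry_run=True):
    snap, pipeline = build_commands(PREV, CUR)
    if dry_run:  # a real run needs a ZFS pool and ssh access
        print(" ".join(snap))
        print(pipeline)
        return
    subprocess.run(snap, check=True)
    # No resume support for interrupted streams, so retry the whole
    # incremental until the pipeline exits successfully.
    while subprocess.run(pipeline, shell=True).returncode != 0:
        pass

replicate(dry_run=True)
```

The retry loop resends the whole incremental on failure, which is exactly the crude verification/repeat scripting described above.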
A more radical change would likely be more effective, though I'm not sure
how likely it is to be implemented: store the backup files in an SQL
database. A database can be replicated efficiently to a remote machine, and
connection disruptions would not be an issue, since the sync could simply be
repeated until complete, much as rsync can be re-run and previously
completed transfers don't need to be redone.
With a database, records in one table can reference rows in another table,
much like hardlinks. A file hash can be stored in the record, so it would
only need to be computed once, along with any other relevant metadata. Each
backed-up PC would get its own table, and the pool would live in a dedicated
table. This would also allow multiple front-ends to access the database
simultaneously.
Some SQL databases support table compression (Oracle 9i, MySQL, and
PostgreSQL, I think), though you should not use on-disk filesystem
compression under an SQL database, because performance will take a nosedive.
As for performance, SQL would likely outperform direct file access on most
filesystems, and would definitely save CPU time during backups, since SQL
queries for file hashes would be much faster than Linux console applications
working at the file level.
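To make the idea concrete, here is a small sketch using SQLite. The schema, table, and column names are mine, purely illustrative: a shared pool table keyed by content hash, and a per-PC table whose rows reference pool rows the way hardlinks reference inodes, so each hash is computed once and duplicate content is stored once.

```python
import hashlib
import sqlite3

# Sketch only: schema and names are illustrative, not BackupPC's own layout.
db = sqlite3.connect(":memory:")
db.executescript("""
    CREATE TABLE pool (
        hash TEXT PRIMARY KEY,   -- content hash, computed exactly once
        data BLOB NOT NULL
    );
    -- one such table per backed-up PC
    CREATE TABLE pc_alpha (
        path TEXT PRIMARY KEY,
        hash TEXT NOT NULL REFERENCES pool(hash)  -- the "hardlink"
    );
""")

def store(pc_table, path, data):
    """Insert a file; identical content shares a single pool row."""
    h = hashlib.md5(data).hexdigest()
    db.execute("INSERT OR IGNORE INTO pool (hash, data) VALUES (?, ?)",
               (h, data))
    # table name interpolated directly; acceptable in a sketch with
    # trusted, hard-coded table names
    db.execute(f"INSERT OR REPLACE INTO {pc_table} (path, hash) VALUES (?, ?)",
               (path, h))
    return h

store("pc_alpha", "/etc/motd", b"hello")
store("pc_alpha", "/etc/motd.copy", b"hello")   # same content: no new pool row
store("pc_alpha", "/etc/hosts", b"127.0.0.1 localhost")

pool_rows = db.execute("SELECT COUNT(*) FROM pool").fetchone()[0]
file_rows = db.execute("SELECT COUNT(*) FROM pc_alpha").fetchone()[0]
print(pool_rows, file_rows)   # prints "2 3": 2 pool rows back 3 file records
```

An indexed lookup on the hash column is what makes the dedup check cheap, which is the claimed CPU saving over re-hashing or scanning files on disk.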
On Jan 16, 2008 4:23 PM, Les Mikesell <[EMAIL PROTECTED]> wrote:
> Timothy J. Massey wrote:
>
> > > > I am not sure a SAN/NAS with a clustered file system buys you much
> > > > because each instance of BackupPC needs its own space (e.g. you
> cannot
> > > > have multiple instances sharing the same pool space.
> > >
> > > Interesting concept... I bet you could if you could coordinate the
> > > nightly cleanup run so nothing would be making links at the same time
> > > that pool files were being deleted.
> >
> > I would love this feature: multiple BackupPC front-ends running against
> > the same pool. It would be perfect for an active-active cluster. I've
> > though about using two separate BackupPC servers with DRBD between them
> > in an active/standby, but this would allow a load-balanced set of
> > BackupPC servers with a common pool.
>
> > Even better, I would pay a modest bounty for such a feature: multiple
> > BackupPC machines working against the same pool, with a single
> > configuration, backup queue, etc. between them.
>
> If the storage is on an NFS server with reasonable semantics I don't see
> why it would be any different in terms of file activity/contention than
> multiple backup processes running on the same machine. Collisions in
> creating a new file with the same name should result in a detectable
> error. The only place you can go wrong is when you are removing pooled
> files because the test of the link count and the removal aren't atomic.
> The only new coding it might take would be to make the backuppc machines
> see the same cpool directory but only their own other directories all on
> the same mounted filesystem, and a way to interlock the backuppc_nightly
> runs.
>
> > Another way of doing it would be to have a way to replicate a backup
> > from one server to another, where backup data could be pushed or pulled
> > by the BackupPC processes on two different boxes without actually doing
> > a normal backup from two different servers. Then I can have two online,
> > active BackupPC servers with the same data.
> >
> > I'd be willing to pay a bounty for this one, too. An online way to
> > replicate backups between multiple BackupPC servers.
>
> I think solaris or *bsd (maybe OS X soon) with zfs sounds promising for
> this with its incremental send/receive facility but I haven't tried it
> yet to see if millions of hardlinks are an issue. I'm using a hot-swap
> sata cage to raid-sync a 750 gig drive to rotate offsite periodically.
> It's not very elegant but it works.
>
> --
> Les Mikesell
> [EMAIL PROTECTED]
-------------------------------------------------------------------------
This SF.net email is sponsored by: Microsoft
Defy all challenges. Microsoft(R) Visual Studio 2008.
http://clk.atdmt.com/MRT/go/vse0120000070mrt/direct/01/
_______________________________________________
BackupPC-users mailing list
[email protected]
List: https://lists.sourceforge.net/lists/listinfo/backuppc-users
Wiki: http://backuppc.wiki.sourceforge.net
Project: http://backuppc.sourceforge.net/