Re: [BackupPC-users] Backuppc in large environments

2020-12-02 Thread Richard Shaw
On Wed, Dec 2, 2020 at 7:30 AM Daniel Berteaud wrote:

> All this workflow can be seen in my virt-backup script [1], which is a
> helper for BackupPC to backup libvirt managed VM.
>

Daniel,

I'd like to talk to you about formally packaging your script for Fedora /
EPEL. I just packaged chunkfs (but still need to submit a Review Request
for formal inclusion). Unfortunately, the Makefile is, shall we say, rustic :)
so I ended up just writing my own CMake file to build it.

Next I can package your script. I took a quick look at the provided spec
file and it seems to have a lot of "antique" stuff in it that's no longer
needed on modern Fedora/EL. I'm not sure whether you still have people using
the script on EL 5, but I can help you update your spec file or just update
a copy of it for Fedora EPEL.

Thanks,
Richard
___
BackupPC-users mailing list
BackupPC-users@lists.sourceforge.net
List:https://lists.sourceforge.net/lists/listinfo/backuppc-users
Wiki:https://github.com/backuppc/backuppc/wiki
Project: https://backuppc.github.io/backuppc/


Re: [BackupPC-users] Backuppc in large environments

2020-12-02 Thread Daniel Berteaud
- On 2 Dec 20, at 12:53, Dave Sherohman wrote: 

> - I'm definitely backing up the VMs as individual hosts, not as disk image
> files. Aside from minimizing atomicity concerns, it also makes single-file
> restores easier and, in the backuppc context, I doubt that deduplication would
> work well (if at all) with disk images.
It's possible to have dedup for huge files that change randomly, but it's a bit 
tricky ;-) 
I use this for backing up some VM images: 

* Suspend the VM 
* Take an LVM snapshot (if available) 
* Resume the VM if a snapshot was taken (almost no downtime) 
* Mount the snapshot with chunkfs [0], which makes the big file appear as a lot 
of small chunks 
* Use BackupPC to back up the chunks 
* Resume the VM if no snapshot was taken (in which case there was downtime) 

With this you have dedup, and you can choose the granularity (with BackupPC v4, 
I use 2MB chunks). Restoring requires a few more steps, though: 

* Mount the backup tree with fuse-backuppcfs 
* From this mount point, reassemble the chunks into one huge virtual file 
(still with chunkfs, which does the reverse operation) 
* You can now copy the image file wherever you want, and unmount the two 
stacked fuse mount points when done 

All this workflow can be seen in my virt-backup script [1], which is a helper 
for BackupPC to back up libvirt-managed VMs. 

The same can be done, with some scripting, for any large binary file or block 
device. 
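
To make the shape of the backup side concrete without reading the whole script, 
here is a minimal sketch in Python. Every name, path, size, and the chunkfs 
invocation itself are illustrative assumptions rather than excerpts from 
virt-backup (check the chunkfs documentation for its real arguments and 
chunk-size option); virt-backup remains the authoritative implementation. 

#!/usr/bin/env python3
"""Minimal sketch of the suspend -> snapshot -> resume -> chunkfs workflow.

All values below are placeholders: VM name, volume group, snapshot size,
mount point, and the chunkfs invocation are assumptions, not virt-backup code.
"""
import subprocess

VM = "vm1"                                    # hypothetical libvirt domain name
LV = "/dev/vg0/vm1-disk"                      # hypothetical LV backing the VM disk
SNAP_NAME = "vm1-backup-snap"
SNAP = f"/dev/vg0/{SNAP_NAME}"
MOUNTPOINT = "/var/lib/backuppc-staging/vm1"  # directory BackupPC is pointed at


def run(*cmd):
    subprocess.run(cmd, check=True)


def take_snapshot():
    """Try to take an LVM snapshot; return its device path, or None on failure."""
    try:
        run("lvcreate", "-s", "-L", "10G", "-n", SNAP_NAME, LV)
        return SNAP
    except subprocess.CalledProcessError:
        return None


run("virsh", "suspend", VM)                   # quiesce the guest
snapshot = take_snapshot()
if snapshot:
    run("virsh", "resume", VM)                # snapshot path: almost no downtime

source = snapshot or LV
run("chunkfs", source, MOUNTPOINT)            # expose the image as small chunks
                                              # (illustrative invocation only)

# BackupPC now backs up MOUNTPOINT like any other share; in practice the
# mount/unmount steps would be split between $Conf{DumpPreUserCmd} and
# $Conf{DumpPostUserCmd} rather than run inline like this.

run("fusermount", "-u", MOUNTPOINT)           # tear down the chunkfs mount
if snapshot:
    run("lvremove", "-f", snapshot)           # drop the snapshot
else:
    run("virsh", "resume", VM)                # no snapshot: downtime spanned the backup

The restore side, as described above, is the same idea in reverse: mount the 
backup with fuse-backuppcfs, then let chunkfs reassemble the chunks into one 
image file. 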

Cheers, 
Daniel 

[0] http://chunkfs.florz.de/ 
[1] https://git.fws.fr/fws/virt-backup 

-- 

[ https://www.firewall-services.com/ ]  
Daniel Berteaud 
FIREWALL-SERVICES SAS, La sécurité des réseaux 
Société de Services en Logiciels Libres 
Tél : +33.5 56 64 15 32 
Matrix: @dani:fws.fr 
https://www.firewall-services.com 
___
BackupPC-users mailing list
BackupPC-users@lists.sourceforge.net
List:https://lists.sourceforge.net/lists/listinfo/backuppc-users
Wiki:https://github.com/backuppc/backuppc/wiki
Project: https://backuppc.github.io/backuppc/


Re: [BackupPC-users] Backuppc in large environments

2020-12-02 Thread Dave Sherohman
Thanks, everyone!  Looks like backuppc should be able to handle my 
network, no problem.  To hit on specific points, in threaded order:


- I'll be sure to get plenty of RAM.  We're going to be buying a new, 
probably Dell, rackmount system for this, and I wouldn't have been 
getting any less than 64G of RAM anyhow, but bumping it up to 256G should 
be no problem.


- I haven't looked at the Debian docs for backuppc yet, but it is 
packaged in the main Debian stable repo and there should be 
Debian-specific install instructions in the package.  They're usually 
pretty good, so I don't anticipate any major setup hassles.


- Budget is finite, but this is to replace an existing Tivoli backup 
solution, so organizational accounting rules say I can probably spend up 
to 5 years' worth of TSM license fees with few or no questions asked.  
And IBM's licensing fees ain't cheap.


- I'm definitely backing up the VMs as individual hosts, not as disk 
image files.  Aside from minimizing atomicity concerns, it also makes 
single-file restores easier and, in the backuppc context, I doubt that 
deduplication would work well (if at all) with disk images.


- For the database servers, I was already considering a cron job to do 
SQL dumps of everything and back those up instead of the raw database 
files.  But there's something fishy with the server that's sending 
400G/day anyhow...  It only has about 650G used on it and /var/lib/mysql 
is under 100G, so there's no reason it should have 400G of changes 
daily.  I'm in the process of looking into that.


- Thanks for the tips on zfs settings.  I tend to use ext4 by default 
and planned to look at btrfs as an alternative, but I'll check zfs out, too.


- I'm already running icinga, so monitoring is handled.  (Or will be, 
once the backup server is installed.)


- I hadn't considered the possibility of horizontal scaling. Thanks for 
bringing that up.  I'll have a chat with the other admins tomorrow and 
see what they think about that, although I think I personally prefer 
vertical scaling just for the simplicity of single-point administration.


And another question which came to mind from the zfs point:  Is anyone 
familiar with VDO (Virtual Data Optimizer)?  It's an abstraction layer 
which sits between the filesystem and the underlying block devices and 
does on-the-fly data compression and block-level deduplication.  A friend 
uses a homegrown rsync-based backup system and says it cuts his disk usage 
significantly, but I'm wondering whether it would help much in a 
backuppc setting, since bpc already does its own file-level deduplication.


On 12/1/20 5:37 PM, Richard Shaw wrote:
So, long story short, a lot of it will depend on how fast your data 
changes/grows, but it doesn't necessarily require a high-end computer. 
You really just need something beefy enough not to be the bottleneck; 
if you can make the client I/O the bottleneck, then you're good. 
Depending on your budget (or what you have lying around), a decent 
budget AMD Ryzen system would work quite nicely.


If you're familiar with Debian then I'm sure it's well documented how 
to install and set it up. I maintain the Fedora EPEL version and run it 
on CentOS 8 quite nicely.


Thanks,
Richard


___
BackupPC-users mailing list
BackupPC-users@lists.sourceforge.net
List:https://lists.sourceforge.net/lists/listinfo/backuppc-users
Wiki:https://github.com/backuppc/backuppc/wiki
Project: https://backuppc.github.io/backuppc/


Re: [BackupPC-users] Backuppc in large environments

2020-12-01 Thread Adam Goryachev via BackupPC-users


On 2/12/20 10:35, G.W. Haywood via BackupPC-users wrote:

Hi there,

On Tue, 1 Dec 2020, backuppc-users-requ...@lists.sourceforge.net wrote:


How big can backuppc reasonably scale?


Remember you can scale vertically or horizontally: either get a bigger 
machine for your backups, or get more small machines. If you had 3 (or 
more) small machines, you could set 2 of them to back up each target. 
That gives you some additional redundancy in your backup infrastructure, 
as long as your backup window can support it, or the backups don't add 
enough load to interfere with your daily operations.


I guess at some point using machines that are too small would be more 
painful to manage, but there are a lot of options for scaling. From vague 
observation, I think most people just scale vertically and add enough RAM 
or I/O performance to handle the load.




... daily backup volume is running around 750 GB per day, with two
database servers providing the majority of that volume (400 GB/day
from one and 150 GB/day from the other).


That's the part which bothers me.  I'm not sure that BackupPC's ways
of checking for changed files marry well with database files.  In a
typical relational database server you'll have some *big* files which
are modified by more or less random accesses.  They will *always* be
changed from the last backup.  The backup of virtual machines is not
dissimilar at the level of the partition image.  You need to stop the
machine to get a consistent backup, or use something like a snapshot.

I just want to second this. My preference is to snapshot the VM (via a 
pre-backup script from backuppc) and then back up the contents of the VM 
(the actual target I use is the SAN server rather than the VM itself). For 
the DB, you should exclude the actual DB files and have a script (either 
called separately or from the BPC pre-backup hook) which exports/dumps 
the DB to another, consistent file. If possible, this file should be 
uncompressed (which allows rsync to better see the unchanged data), and 
written to the same filename/path each day (again, so rsync/BPC sees it as 
a file with a small amount of changes instead of a massive new file).


If you do that, you might see your daily "changes" drop compared to 
before.
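
As an illustration of the kind of pre-backup dump described above, here is a 
small, hedged sketch. It assumes MySQL/MariaDB with mysqldump, credentials in 
~/.my.cnf, and a made-up /srv/db-dumps path; adapt it to your DBMS. The fixed, 
uncompressed output path is the point: rsync/BackupPC then sees one file with 
a small daily delta rather than a brand-new blob. 

#!/usr/bin/env python3
"""Pre-backup DB dump sketch: same path every day, uncompressed.

Assumptions: mysqldump is installed, credentials come from ~/.my.cnf, and
/var/lib/mysql is excluded in the BackupPC host config while /srv/db-dumps
is included. Hook this in via cron or BackupPC's $Conf{DumpPreUserCmd}.
"""
import subprocess

DUMP_PATH = "/srv/db-dumps/all-databases.sql"   # hypothetical, but always the same

with open(DUMP_PATH, "w") as out:
    subprocess.run(
        ["mysqldump", "--all-databases", "--single-transaction", "--routines"],
        stdout=out,
        check=True,   # fail loudly so a broken dump doesn't silently get backed up
    )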



... I have no idea what to expect the backup server to need in the
way of processing power.


Modest.  I've backed up dozens of Windows workstations and five or six
servers with just a 1.4GHz Celeron which was kicking around after it
was retired from the sales office.  The biggest CPU hog is likely to
be data compression, which you can tune.  Walking directory trees can
cause rsync to use quite a lot of memory.  You might want to look at
something like Icinga/Nagios to keep an eye on things.

FYI, I back up 57 hosts; my current BPC pool size is 7TB, with 23M files. 
Some of my backup clients are external on the Internet, some are Windows, 
most are Linux.


My BPC server has 8G RAM and a quad core CPU:
Intel(R) Core(TM) i3-4150 CPU @ 3.50GHz

As others have said, you are most likely to be I/O bound after the first 
couple of backups. You'd be well advised to grab a spare machine, set up 
BPC, and run a couple of backups against a couple of smaller targets. Once 
you have it working (if all goes smoothly, under 2 hours), target a larger 
server; you will soon start to see how it performs in your environment, 
and where the relevant bottlenecks are.


PS: all you need to think about is the CPU required to compress 750GB 
per backup cycle (you only need to compress the changed files) and the 
disk I/O to write that 750GB (plus a lot of disk I/O to do all the 
comparisons, which is probably the main load, and which is why you also 
want a lot of RAM to cache the directory trees).
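
To put rough numbers on that PS (the throughput figures below are assumptions 
for illustration, not measurements from anyone's server): 

# Back-of-the-envelope estimate for the 750GB/day figure.
daily_changes_gb = 750

compress_mb_s = 150   # assumed per-stream compression throughput
write_mb_s = 400      # assumed sustained sequential write speed of the pool

hours_compressing = daily_changes_gb * 1024 / compress_mb_s / 3600
hours_writing = daily_changes_gb * 1024 / write_mb_s / 3600
print(f"compression: ~{hours_compressing:.1f} h, raw writes: ~{hours_writing:.1f} h")
# -> roughly 1.4 h of compression and 0.5 h of writes at these assumed rates;
# with several concurrent backups the compression spreads across cores, and the
# random-read comparisons mentioned above are what usually end up dominating.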


Regards,
Adam



___
BackupPC-users mailing list
BackupPC-users@lists.sourceforge.net
List:https://lists.sourceforge.net/lists/listinfo/backuppc-users
Wiki:https://github.com/backuppc/backuppc/wiki
Project: https://backuppc.github.io/backuppc/


Re: [BackupPC-users] Backuppc in large environments

2020-12-01 Thread G.W. Haywood via BackupPC-users

Hi there,

On Tue, 1 Dec 2020, backuppc-users-requ...@lists.sourceforge.net wrote:


How big can backuppc reasonably scale?


You can scale it yourself as has already been suggested, but I don't
think you'd have any problems with a single backup server and the data
volumes you've described if you were sensible about the configuration,
which is very flexible.  However...


... daily backup volume is running around 750 GB per day, with two
database servers providing the majority of that volume (400 GB/day
from one and 150 GB/day from the other).


That's the part which bothers me.  I'm not sure that BackupPC's ways
of checking for changed files marry well with database files.  In a
typical relational database server you'll have some *big* files which
are modified by more or less random accesses.  They will *always* be
changed from the last backup.  The backup of virtual machines is not
dissimilar at the level of the partition image.  You need to stop the
machine to get a consistent backup, or use something like a snapshot.

Normally I do some sort of separate database dump for database files,
and run that system separately from the run-of-the-mill Linux/Windows
server/workstation backups.  After all, I usually just want a single
good backup of any database.  Having several copies, aged at one day,
one week, two weeks, a month etc. would usually be of no use to me.


... I have no idea what to expect the backup server to need in the
way of processing power.


Modest.  I've backed up dozens of Windows workstations and five or six
servers with just a 1.4GHz Celeron which was kicking around after it
was retired from the sales office.  The biggest CPU hog is likely to
be data compression, which you can tune.  Walking directory trees can
cause rsync to use quite a lot of memory.  You might want to look at
something like Icinga/Nagios to keep an eye on things.

--

73,
Ged.


___
BackupPC-users mailing list
BackupPC-users@lists.sourceforge.net
List:https://lists.sourceforge.net/lists/listinfo/backuppc-users
Wiki:https://github.com/backuppc/backuppc/wiki
Project: https://backuppc.github.io/backuppc/


Re: [BackupPC-users] Backuppc in large environments

2020-12-01 Thread Daniel Berteaud
- On 1 Dec 20, at 16:33, Dave Sherohman dave.sheroh...@ub.lu.se wrote:

> 
> Is this something that backuppc could reliably handle?
> 
> If so, what kind of CPU resources would it require?  I've already got a
> decent handle on the network requirements from observing the current TSM
> backups and can calculate likely disk storage needs, but I have no idea
> what to expect the backup server to need in the way of processing power.
> 

While not as big as yours, I manage a reasonably big BackupPC setup on a 
single box. It's backing up 193 hosts in total; the pool is ~15TB, with ~27 
million files. The hosts are a mix of a lot of different stuff (mostly VMs, 
but also a few appliances and physical servers), with various backup 
frequency and history configs. Most are backed up daily, but some are 
weekly. It usually represents between 200 and 600GB of new data per day.

I'm running this on a single box with these specs:
  * CPU Intel Xeon D-1541 @ 2.10GHz
  * 32GB of RAM
  * 2x120GB SSD for the OS (CentOS 7)
  * 4x12TB SATA in a ZFS pool (~ RAID10)

I'm using the lz4 compression provided by ZFS, so I turned BackupPC's own compression off.

While I do see some slowness from time to time, it's working well. Long 
story short: don't bother too much with CPU. Except for the very first 
backups, where it can be a bottleneck, disk I/O is what will limit general 
speed. Spend more on fast disks or SSDs. If using ZFS, NVMe as a SLOG can 
help (or as a special metadata vdev, although I haven't tested that yet). 
Get as much RAM as you can. With what you have left, choose a decent CPU, 
but don't spend too much on it.
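
For reference, a sketch of that ZFS-side setup. Pool name, dataset name and 
device paths are made up; two mirror vdevs approximate the "~ RAID10" layout 
above, and the BackupPC side is only noted in a comment since its config is 
Perl. 

#!/usr/bin/env python3
"""Sketch: let ZFS do the compression, turn BackupPC's off.

All names and devices below are placeholders. Run as root on the backup host.
"""
import subprocess

def run(*cmd):
    subprocess.run(cmd, check=True)

# Two mirror vdevs striped together -- roughly the "RAID10-like" pool described.
run("zpool", "create", "backup",
    "mirror", "/dev/sda", "/dev/sdb",
    "mirror", "/dev/sdc", "/dev/sdd")

run("zfs", "create", "backup/backuppc")
run("zfs", "set", "compression=lz4", "backup/backuppc")  # cheap and effective
run("zfs", "set", "atime=off", "backup/backuppc")        # skip needless metadata writes

# Then point BackupPC's $Conf{TopDir} at /backup/backuppc and set
# $Conf{CompressLevel} = 0 in config.pl so data isn't compressed twice.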

Cheers,
Daniel

-- 
[ https://www.firewall-services.com/ ]  
Daniel Berteaud 
FIREWALL-SERVICES SAS, La sécurité des réseaux 
Société de Services en Logiciels Libres 
Tél : +33.5 56 64 15 32 
Matrix: @dani:fws.fr 
https://www.firewall-services.com



___
BackupPC-users mailing list
BackupPC-users@lists.sourceforge.net
List:https://lists.sourceforge.net/lists/listinfo/backuppc-users
Wiki:https://github.com/backuppc/backuppc/wiki
Project: https://backuppc.github.io/backuppc/


Re: [BackupPC-users] Backuppc in large environments

2020-12-01 Thread Les Mikesell
On Tue, Dec 1, 2020 at 9:50 AM Dave Sherohman  wrote:
>
> Is this something that backuppc could reliably handle?
>
> If so, what kind of CPU resources would it require?  I've already got a
> decent handle on the network requirements from observing the current TSM
> backups and can calculate likely disk storage needs, but I have no idea
> what to expect the backup server to need in the way of processing power.
>

The price is right for the software and you'll use a lot less disk
space than other methods (with the big plus of instant access to
single files or directories), so consider that you could divide the
backups into two or more groups handled by different servers if you
run into trouble with just one.  It will help if the server has a lot
of RAM, and if you back up the virtual machines as individual hosts
instead of as image files.  Likewise, you'll probably want to use
commands that backuppc can send to the database servers to get dumps
that won't change during the copy.

  -- Les Mikesell
lesmikes...@gmail.com


___
BackupPC-users mailing list
BackupPC-users@lists.sourceforge.net
List:https://lists.sourceforge.net/lists/listinfo/backuppc-users
Wiki:https://github.com/backuppc/backuppc/wiki
Project: https://backuppc.github.io/backuppc/


Re: [BackupPC-users] Backuppc in large environments

2020-12-01 Thread Richard Shaw
So, long story short, a lot of it will depend on how fast your data
changes/grows, but it doesn't necessarily require a high-end computer. You
really just need something beefy enough not to be the bottleneck; if you
can make the client I/O the bottleneck, then you're good. Depending on your
budget (or what you have lying around), a decent budget AMD Ryzen system
would work quite nicely.

If you're familiar with Debian then I'm sure it's well documented how to
install and set it up. I maintain the Fedora EPEL version and run it on
CentOS 8 quite nicely.

Thanks,
Richard
___
BackupPC-users mailing list
BackupPC-users@lists.sourceforge.net
List:https://lists.sourceforge.net/lists/listinfo/backuppc-users
Wiki:https://github.com/backuppc/backuppc/wiki
Project: https://backuppc.github.io/backuppc/


Re: [BackupPC-users] Backuppc in large environments

2020-12-01 Thread Paul Leyland
My network is rather smaller but still bigger than most home systems.
Please keep that in mind.

The backup server is a very elderly "Intel(R) Core(TM)2 CPU 
6600  @ 2.40GHz" with 8G RAM.  /var/lib/backuppc is a ZFS raidz array of
three 4TB disks, giving a useful space of 3.6T, of which 1.1T is now
used. The CGI interface reports:

There are 9 hosts that have been backed up, for a total of:

  * 109 full backups of total size 15511.81GB (prior to pooling and
compression),
  * 65 incr backups of total size 235.11GB (prior to pooling and
compression).

But I like to keep an archive as well as a backup, so storing 15.5TB of
files in 1.1TB of space may be misleading, because so many files are
de-duplicated.

The server is on a single 1-gigabit NIC. It runs up to four backups
simultaneously, and a full backup of a 0.4TB machine takes around 12
hours; this appears to be disk-I/O bound at each end, as incremental
backups of other machines proceed at a decent rate.

TL;DR: a 10-year-old box copes very easily with my load.  YMMV.  In
particular, you may wish to have more than one Ethernet NIC and perhaps
more RAM.

Paul

On 01/12/2020 15:33, Dave Sherohman wrote:
> Hey, all!
>
> I've been looking at setting up amanda as a backup solution for a
> fairly large environment at work and have just stumbled across
> backuppc.  While I love the design and scheduling methods of amanda,
> I'm also a big fan of incremental-only reverse-delta backup methods
> such as that used by backuppc, so now I'm wondering...
>
> How big can backuppc reasonably scale?
>
> The environment I'm dealing with includes around 75 various servers
> (about 2/3 virtual, 1/3 physical), mostly running Debian, with a few
> machines running other linux distros and maybe a dozen Windows
> machines.  Total data size that we want to maintain backups for is
> around 70 TB.  Our current backup system is using Tivoli Storage
> Manager, a commercial product that uses an incremental-only strategy
> similar to backuppc's, and the daily backup volume is running around
> 750 GB per day, with two database servers providing the majority of
> that volume (400 GB/day from one and 150 GB/day from the other).
>
> Is this something that backuppc could reliably handle?
>
> If so, what kind of CPU resources would it require?  I've already got
> a decent handle on the network requirements from observing the current
> TSM backups and can calculate likely disk storage needs, but I have no
> idea what to expect the backup server to need in the way of processing
> power.
>
>
>
> ___
> BackupPC-users mailing list
> BackupPC-users@lists.sourceforge.net
> List:    https://lists.sourceforge.net/lists/listinfo/backuppc-users
> Wiki:    https://github.com/backuppc/backuppc/wiki
> Project: https://backuppc.github.io/backuppc/


___
BackupPC-users mailing list
BackupPC-users@lists.sourceforge.net
List:https://lists.sourceforge.net/lists/listinfo/backuppc-users
Wiki:https://github.com/backuppc/backuppc/wiki
Project: https://backuppc.github.io/backuppc/


Re: [BackupPC-users] Backuppc in large environments

2020-12-01 Thread Richard Shaw
Not a direct response to your question, but I run mine to back up the
computers at my home, so quite a bit smaller scale. However, the 4th-gen
i5 SFF PC I bought off eBay, with a 1TB hard drive dedicated to BackupPC
and an M.2 SSD for CentOS 8, works quite well for me, so a REAL computer
should do fine. I did max out the memory at 8GB.


Thanks,
Richard
___
BackupPC-users mailing list
BackupPC-users@lists.sourceforge.net
List:https://lists.sourceforge.net/lists/listinfo/backuppc-users
Wiki:https://github.com/backuppc/backuppc/wiki
Project: https://backuppc.github.io/backuppc/


[BackupPC-users] Backuppc in large environments

2020-12-01 Thread Dave Sherohman

Hey, all!

I've been looking at setting up amanda as a backup solution for a fairly 
large environment at work and have just stumbled across backuppc.  While 
I love the design and scheduling methods of amanda, I'm also a big fan 
of incremental-only reverse-delta backup methods such as that used by 
backuppc, so now I'm wondering...


How big can backuppc reasonably scale?

The environment I'm dealing with includes around 75 various servers 
(about 2/3 virtual, 1/3 physical), mostly running Debian, with a few 
machines running other linux distros and maybe a dozen Windows 
machines.  Total data size that we want to maintain backups for is 
around 70 TB.  Our current backup system is using Tivoli Storage 
Manager, a commercial product that uses an incremental-only strategy 
similar to backuppc's, and the daily backup volume is running around 750 
GB per day, with two database servers providing the majority of that 
volume (400 GB/day from one and 150 GB/day from the other).


Is this something that backuppc could reliably handle?

If so, what kind of CPU resources would it require?  I've already got a 
decent handle on the network requirements from observing the current TSM 
backups and can calculate likely disk storage needs, but I have no idea 
what to expect the backup server to need in the way of processing power.




___
BackupPC-users mailing list
BackupPC-users@lists.sourceforge.net
List:https://lists.sourceforge.net/lists/listinfo/backuppc-users
Wiki:https://github.com/backuppc/backuppc/wiki
Project: https://backuppc.github.io/backuppc/