I somehow missed the original email.
I suspect that with many, many small files you're mostly limited by the
source filesystem (and overall system) performance rather than by the backup
tool itself. Regardless of the method used to decide whether a file needs
backing up, its metadata still has to be read from the filesystem. Tuning
the source system (giving more memory to the metadata cache) could
probably help a little.
But I wouldn't expect big differences just by switching from rsync to
bareos.
Just my three cents.
On 16.04.2021 15:21, Brock Palen wrote:
I have not seen any replies to your question. I can't speak to that volume of
data, though I see no reason why it cannot work. Here are my thoughts below on
how I would approach it, along with answers to some of your other questions.
* The number of files will impact things more than the total data size. It will
increase database size, scan time, etc.
* I have easily seen Bareos saturate well above 100 Mbit networking, though
100 Mbit is very slow for the initial full backup of 200 TB: you are looking at
a minimum of 6 months, assuming the data does not compress. For the initial
backup you might want to do sneaker net with a Raspberry Pi and a Drobo. This
is what I do: the full backup is done on site at gigabit speeds, then I carry
the entire setup to the other site and do a volume migration to the real server.
https://fasterdata.es.net/home/requirements-and-expectations/
* Look at the Bareos client-side compression options. On bandwidth-constrained
hosts (this includes cloud, because of cost) I use gzip turned all the way up.
This will peg one CPU core, but for text data it drastically reduces the volume
of data over the wire. Something like LZ4 has very low CPU impact but still
gets about 70% of the compression of gzip; if you have a CPU core to burn and a
test shows it still saturates your 100 Mbit, maybe use it to get that backup
time down. If this is all video or already-compressed image files, compression
likely just burns CPU for no benefit. Bareos gives you a report at the end of
a job of how well it compressed.
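For reference, client-side compression is selected per FileSet in the director
configuration. A minimal sketch; the fileset name and path are placeholders:

    # bareos-dir.d/fileset/compressed-data.conf (hypothetical name/path)
    FileSet {
      Name = "compressed-data"
      Include {
        Options {
          Signature = MD5
          Compression = GZIP9   # maximum gzip; "LZ4" trades ratio for low CPU cost
        }
        File = /data            # placeholder path
      }
    }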
* How Bareos checks for files: with the Accurate setting (recommended), the
server sends the client a list of the files it knows about and the client
compares against it. This process is very fast. By default Bareos won't use
checksums to compare, only 1. does the file exist, and 2. is the filesystem
metadata newer than what's in the database/catalog (i.e. the file has changed).
Incrementals with Bareos are much faster than rsync. (I have moved PB of data
with rsync.)
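Accurate mode is enabled per job in the director configuration. A minimal
fragment; the job, client, and fileset names are made up, and a real job would
also need storage/pool/schedule settings:

    # bareos-dir.d/job/data-backup.conf (hypothetical)
    Job {
      Name = "data-backup"
      Type = Backup
      Client = "fileserver-fd"   # placeholder client name
      FileSet = "data-fileset"   # placeholder fileset name
      Accurate = yes             # director sends the known-file list to the client
    }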
With 200 TB of data you will want a lot of tape; otherwise you're looking at
400 TB+ of disk. If you're new to backup: you have to build a new "full" every
so often. Given your network is 100 Mbit, I would look at the Always Incremental
feature of Bareos. This will let you avoid the ~180 days of sending a new full
backup over the wire. You still have to write 200 TB every so often, but it can
all be done on the Bareos server side. I recommend tape just for cost, as you
need 66 LTO-7 tapes or 33 LTO-8 tapes. LTO-7 is still the best value, but LTO-8
has come down in cost a lot and LTO-9 is scheduled for GA this year. You will
also want a few tape drives and a fast spool pool of disks to do this right.
This 2x minimum capacity is one downside backup systems have compared to rsync.
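The Always Incremental scheme is configured on the job in the director. A
sketch with made-up names and retention values, just to show the shape of it:

    # bareos-dir.d/job/ai-data.conf (hypothetical)
    Job {
      Name = "ai-data"
      Type = Backup
      Level = Incremental
      Accurate = yes                              # required for Always Incremental
      Always Incremental = yes
      Always Incremental Job Retention = 30 days  # consolidate jobs older than this
      Always Incremental Keep Number = 7          # always keep at least 7 incrementals
      Always Incremental Max Full Age = 60 days   # age before the full is refreshed
    }

A separate Consolidate-type job then merges the old incrementals server side,
which is what avoids re-sending a full over the 100 Mbit link.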
An all-disk solution will be faster, because a big RAID-Z2 will have greater
bandwidth for the VirtualFull, but it will be expensive. You could look at
something like 45 Drives to turn into your SD (storage daemon). I do a mix
(again, at a fraction of the size you are, with Bareos).
I would personally split this into several jobs using wildcards in filesets,
and not have one 200 TB job but several few-TB jobs. This will also let you
run jobs in parallel, recover better from a full backup failure, avoid copying
all 200 TB when you do a full, etc.
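Splitting by wildcard can be sketched with paired Options blocks in a FileSet:
the first block selects matching directories, the second excludes everything
else (Options are tried in order, first match wins). Paths and names below are
made up:

    # One of several filesets, each covering a slice of /data (hypothetical)
    FileSet {
      Name = "data-a-to-m"
      Include {
        Options {
          WildDir = "/data/[a-m]*"   # select top-level dirs starting a..m
        }
        Options {
          Exclude = yes
          WildDir = "/data/*"        # exclude everything not matched above
        }
        File = /data
      }
    }

A sibling fileset with WildDir = "/data/[n-z]*" would cover the rest, and each
gets its own Job so they can run in parallel.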
Brock Palen
[email protected]
www.mlds-networks.com
Websites, Linux, Hosting, Joomla, Consulting
On Apr 15, 2021, at 10:49 AM, Steve Eppert <[email protected]> wrote:
Hi.
I need to backup around 200 TB of data (with many small files) with around 1 TB
per week new/changed data. Currently I simply rsync the data to an offsite
location using a 100 MBit/s connection.
While searching for ways to make the rsync faster (because of the many
small files, rsync almost never uses the full 100 MBit/s) I stumbled across
Bareos.
A question I could not find an answer to in the docs is: how does the
bareos-filedaemon check for changed data when doing an incremental backup? Does
the daemon hold some kind of database or does it check each file against the
Bareos server? I'm wondering if a Bareos incremental backup job might be faster
than the rsync.
Also after looking at the docs I'm considering purchasing a tape loader to
backup a specific subset of more valuable data to tape.
Is it possible to have incremental backups to disk and do a regular full backup
of only a subset of this data to tape?
Is it possible to get filesystem access to the incrementally backed-up data on
disk, or is the Bareos interface the only way to access this data?
Thanks!
Steve
--
You received this message because you are subscribed to the Google Groups
"bareos-users" group.
To unsubscribe from this group and stop receiving emails from it, send an email
to [email protected].
To view this discussion on the web visit
https://groups.google.com/d/msgid/bareos-users/ccddf8c0-4fcc-4230-994f-157b9a2d1b06n%40googlegroups.com.