On 21/11/2025 16:28, Rob Gerber wrote:
> Phil,
>
> I have a grab-bag of thoughts.
>
> In my mind, there are 3 basic possibilities:
> 1. Bacula is configured in a way that constrains performance.
> (bandwidth limit imposed in bacula config, compression is chewing up
> all the CPU cycles, multiple concurrent jobs are using all the
> available resources, etc)
> 2. Something about the systems in question is constrained in
> performance. (bad network link, failing hard drive, etc)
Network link is fine according to Iperf3:
[ ID] Interval           Transfer     Bitrate         Retr  Cwnd
[  5]   0.00-1.00   sec   115 MBytes   966 Mbits/sec    0    365 KBytes
[  5]   1.00-2.00   sec   112 MBytes   940 Mbits/sec    0    383 KBytes
[  5]   2.00-3.00   sec   112 MBytes   937 Mbits/sec    0    393 KBytes
[  5]   3.00-4.00   sec   113 MBytes   949 Mbits/sec    0    440 KBytes
[  5]   4.00-5.00   sec   111 MBytes   931 Mbits/sec    0    474 KBytes
Drives test fine on both ends (SMART and hdparm speed test).
Client disks --
Timing cached reads: 43360 MB in 1.99 seconds = 21781.17 MB/sec
Timing buffered disk reads: 590 MB in 3.00 seconds = 196.60 MB/sec
Server spool disk (Intel SAS SSD) --
Timing cached reads: 30266 MB in 1.98 seconds = 15263.37 MB/sec
Timing buffered disk reads: 1286 MB in 3.00 seconds = 428.45 MB/sec
> 3. Bacula is malfunctioning (no examples come to mind, but it isn't
> impossible)
>
> Bacula configuration:
> From your bacula-dir.conf, please show us the relevant job, jobdef,
> storage, pool, schedule, and fileset resources.
Job {
Name = "Backup-Workstation"
JobDefs = "DefaultJob"
Client = cheetah-fd
Pool = Tape
FileSet="Cheetah-Fileset"
}
JobDefs {
Name = "DefaultJob"
Type = Backup
Level = Incremental
Client = syrys-fd
Schedule = "WeeklyCycle"
Storage = Tape
Messages = Standard
Pool = Default
Priority = 9
Write Bootstrap = "/var/lib/bacula/%c.bsr"
# Cancellation options are documented in Fig 12.2 at
# https://www.bacula.org/5.2.x-manuals/en/main/main/Configuring_Director.html
Allow Duplicate Jobs = no
Cancel Lower Level Duplicates = yes
Cancel Queued Duplicates = yes
Cancel Running Duplicates = no
# only allow this job to run once concurrently
#Maximum Concurrent Jobs = 0
Spool Data = yes
Spool Attributes = yes
}
# Definition of LTO-6 tape storage device
Storage {
Name = Tape
Address = REDACTED
SDPort = 9103
Password = REDACTED
Device = Superloader3 # must be same as Device in Storage daemon
Media Type = LTO-6 # must be same as MediaType in Storage daemon
Autochanger = yes # enable for autochanger device
AllowCompression = no # use the LTO drive's hardware compression (PP 2018-10-27)
                      # note - this overrides software compression to off.
# 'mt defcompression' sets compression on/off
Maximum Concurrent Jobs = 4 # added 2020-01-19
}
# LTO tape pool definition
Pool {
Name = Tape
Pool Type = Backup
# If a label format is not specified, Bacula will not attempt to
# automatically label tapes
# Label Format = BAK
Recycle = yes # Bacula can automatically recycle Volumes
AutoPrune = yes # Prune expired volumes
# Volume Retention = 13 days # Keep at least a fortnight's worth of old data
# Volume Retention = 0 days # PP 2018-10-27 allow Bacula to recycle essentially any volume
Volume Retention = 12 hours # PP 2025-11-06 because this wasn't working
Recycle Oldest Volume = yes # If we absolutely can't find a tape,
# recycle the oldest one
}
# When to do the backups, full backup on first sunday of the month,
# differential (i.e. incremental since full) every other sunday,
# and incremental backups other days
Schedule {
Name = "WeeklyCycle"
Run = Full 1st sun at 01:05
Run = Differential 2nd-5th sun at 01:05
Run = Incremental mon-sat at 01:05
}
FileSet {
Name = "Cheetah-Fileset"
Include {
Options {
signature = MD5
compression = LZO
# Allow descending into different FSes (needed for ZFS)
onefs = no
}
File = /mnt/zfs
File = /home
}
Exclude {
File = /mnt/zfs/video-temp
File = /mnt/zfs/nextcloud
File = /mnt/zfs/steam
}
}
> From bacula-sd.conf,
> please show us the device and autochanger resources that the impacted
> jobs output to.
# Superloader3 configuration based on
# http://www.backupcentral.com/phpBB2/two-way-mirrors-of-external-mailing-lists-3/bacula-25/dell-pv-124t-autochanger-configuration-98158/
Autochanger {
Name = Superloader3
Device = Superloader3-1
#Changer Command = "/etc/bacula/scripts/mtx-changer %c %o %S %a %d"
Changer Command = "/etc/bacula/scripts/local-tapechanger %c %o %S %a %d"
Changer Device = /dev/changer
}
Device {
Name = Superloader3-1
Drive Index = 0
Media Type = LTO-6
Archive Device = /dev/tape_nst
Control Device = /dev/tape_sg # added 2020-01-10
AutomaticMount = yes; # when device opened, read it
AlwaysOpen = yes;
RemovableMedia = yes;
RandomAccess = no;
Maximum File Size = 12GB # increased from 5 to 12GB 2020-01-19
# buffer sizes from
# http://www.backupcentral.com/forum/19/221466/tuning_lto-4 -- 2020-01-19
Maximum Network Buffer Size = 65536
# may reduce shoeshine, see
# https://www.bacula.org/9.4.x-manuals/en/main/Storage_Daemon_Configuratio.html
Maximum block size = 2M
# end 2020-01-19
# Don't let Bacula autolabel tapes
LabelMedia = no;
# Hooray for autochangers!
# Changer Command = "/etc/bacula/scripts/mtx-changer %c %o %S %a %d"
# Changer Device = /dev/changer
AutoChanger = yes
# Enable the Alert command only if you have the mtx package loaded
# Alert Command = "sh -c 'tapeinfo -f %c |grep TapeAlert|cat'"
# If you have smartctl, enable this, it has more info than tapeinfo
# Alert Command = "sh -c 'smartctl -H -l error %c'"
# This does a few things:
# 1) Resolve the tape_sg symlink our Udev rules set up (in case the sg
#    device name changes) -- smartctl can't handle symlinks
# 2) Poke smartctl to get the TapeAlert and read/write error data
# This is a little weird because %c gives the tape robot's device node,
# and that only supports the tape movement commands.
# Alert Command = "sh -c 'smartctl -H -l error `readlink -f /dev/tape_sg`'"
# added 2020-01-19 and disabled the above smartctl command
Alert Command = "/etc/bacula/scripts/tapealert %l"
# Make disk spooling a bit more polite
# We use the SSD because it's much faster than the ZFS arrays
# 2021-07-04 use the spool SSD to avoid burning write cycles on the
# root SSD and lagging the machine out. Set 100GB job spool size.
# 2025-10-08 increase max spool size to 400GB
Maximum Spool Size = 400G
Maximum Job Spool Size = 100G
Spool Directory = "/mnt/scratch/bacula/lto"
}
> Also, please let us know if your jobs are running concurrently. Are
> all your jobs outputting to disk volumes, or directly to tape? (I know
> you mentioned tape performance, but I guess it's possible some jobs
> are writing to disk volumes).
I'm running a maximum of two jobs concurrently.
They spool (attributes and data) to a SAS SSD, which is then despooled
to tape. This is about the only way I've found to stop the LTO drive
from starving; the shoeshining (stopping, seeking back, starting again)
from writing directly to tape makes the backup take even longer.
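As an aside, while a job is running the SD status shows whether it's in
the spooling or despooling phase and the current rate, which separates
the FD -> SD leg from the SD -> tape leg. A rough sketch, assuming a
working bconsole.conf and the Storage resource name from my Director
config:

```shell
# Sketch: query the SD through bconsole while a job runs.
# "Tape" is the Storage resource name defined in bacula-dir.conf.
echo 'status storage=Tape' | sudo bconsole
```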
> I suppose you could create a test job and a more limited test fileset
> for your desktop.
It doesn't seem to matter how much I back up -- the tape unspooling is
as fast as it ever was (maxes out the drive at ~100MB/s) but the
transfer from the FD to the Director/SD seems to be the slow part.
> If writing to tape, you might want to make a pool for this and
> dedicate a tape to these tests, so this data isn't mixed in
> with your current data. Generate a bunch of large, incompressible
> files (1GB, maybe 20 files?). Run a full backup. What backup speeds
> do you see?
I'll look into that, but I don't think I have a spare LTO-6 tape - I'll
need to get some more.
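For when I do, something like this should generate the test set. The
path and the small default sizes are just placeholders to show the
mechanics; for the real run I'd use NFILES=20 FILE_MB=1024 (20 x 1 GiB):

```shell
# Sketch: fill a scratch directory with incompressible test files for a
# backup throughput test. Defaults are tiny; override NFILES/FILE_MB
# for the real thing (e.g. NFILES=20 FILE_MB=1024).
DEST=${DEST:-/tmp/bacula-speedtest}
NFILES=${NFILES:-4}
FILE_MB=${FILE_MB:-8}
mkdir -p "$DEST"
i=1
while [ "$i" -le "$NFILES" ]; do
    # /dev/urandom data won't compress, so neither LZO in the FD nor
    # the LTO drive's hardware compression can flatter the numbers
    dd if=/dev/urandom of="$DEST/random-$i.bin" bs=1M count="$FILE_MB" status=none
    i=$((i + 1))
done
ls -lh "$DEST"
```

Pointing a test FileSet at that directory and running a Full should
then show the raw pipeline rate.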
> Here are the immediate possible bottleneck sources I can think of:
> Desktop CPU, Desktop storage, Desktop <-> NAS network link, NAS
> storage, and/or bacula database.
I doubt there are any major issues with the CPUs: the desktop is
running a Ryzen 5 5600X and the backup server an i5-9400.
The database is PostgreSQL - it used to be MySQL but I migrated a few
months ago.
> Desktop CPU: check top while a backup is running from the desktop to
> the NAS. Do you see a single 'bacula-fd' process maxing out a CPU core?
I think I checked that and I don't recall it doing that.
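I'll recheck next run, watching per-process CPU rather than overall
load, since with signature = MD5 and compression = LZO in the fileset
the FD is doing real per-file work that could pin a single core.
Roughly (assuming procps and the sysstat package are installed):

```shell
# Sketch: run on the client while the job is active.
# One-shot check of the hottest processes:
top -b -n 1 -o %CPU | grep -m1 bacula-fd

# Or sample bacula-fd's CPU use every 5 seconds with pidstat:
pidstat -u -p "$(pgrep -d, bacula-fd)" 5
```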
> Desktop storage: Try making a tar of some of your files, dumping the
> output to /dev/null. For bonus points, try dumping the tar to the NAS,
> but I'd still want to see the performance numbers when the output is
> /dev/null. Prefer large, non-compressible files. Can use hyperfine to
> time / benchmark this.
philpem@cheetah:/mnt/zfs$ tar c archive-disks | pv > /dev/null
3.07GiB 0:00:19 [73.4MiB/s]
It hovers around 70MB/s, peaks at 200, troughs at 30. More or less the
same performance numbers as the NAS.
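The network variant of the same test would be something like this
(syrys is my backup server; sending to /dev/null on the far end keeps
the NAS disks out of the measurement):

```shell
# Sketch: tar the same tree across the network; pv shows the rate.
# /dev/null on the far end isolates tar + network only.
tar c archive-disks | pv | ssh syrys 'cat > /dev/null'

# Variant that also includes the NAS spool SSD in the path:
tar c archive-disks | pv | ssh syrys 'cat > /mnt/scratch/tartest.tar'
```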
> Desktop <-> NAS network link: As a basic first step, try iperf3 tests
> in both directions between the desktop and the nas. What numbers are
> we seeing?
Nearly a gigabit per second.
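That was desktop -> NAS; I still need the reverse direction for
completeness, although FD -> SD is the direction that matters here.
Something like (assuming 'iperf3 -s' is left running on the server):

```shell
# Sketch: test the link in both directions from the desktop.
iperf3 -c syrys       # desktop -> NAS (the FD -> SD direction)
iperf3 -c syrys -R    # NAS -> desktop (reverse mode)
```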
> NAS storage: We're already seeing 100+MB/s unspool rates from SSD to
> tape, but what write speeds do we see to the NAS (spinning?) disks? I
> would want to test both internally to the NAS (dd from /dev/urandom
> to a file, compare with dd from /dev/zero to a file), and from the
> disk on the NAS to a fast destination, probably dd or tar existing
> files, dumping output to /dev/null. Bonus points if the test files in
> the disk >> fast storage test are large (1GB), and contain random
> incompressible data.
$ sudo dd if=/dev/urandom of=/mnt/scratch/tmp bs=1G count=1 status=progress
1073741824 bytes (1.1 GB, 1.0 GiB) copied, 5 s, 221 MB/s
1+0 records in
1+0 records out
1073741824 bytes (1.1 GB, 1.0 GiB) copied, 4.86835 s, 221 MB/s
"/mnt/scratch" is the SSD.
It's a bit faster writing nulls:
philpem@syrys:/mnt/scratch$ sudo dd if=/dev/zero of=/mnt/scratch/tmp bs=1G count=1 status=progress
1073741824 bytes (1.1 GB, 1.0 GiB) copied, 2 s, 449 MB/s
1+0 records in
1+0 records out
1073741824 bytes (1.1 GB, 1.0 GiB) copied, 2.39124 s, 449 MB/s
Real numbers are probably somewhere in the middle.
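A number between the two could come from fio, if it's installed; its
default write buffers are cheap to generate but not all-zeroes, so they
dodge both the urandom CPU cost and the compressibility problem:

```shell
# Sketch: 1 GiB sequential write to the spool SSD with fio
# (if installed); end_fsync includes the final flush in the timing,
# unlink cleans up the test file afterwards.
fio --name=seqwrite --directory=/mnt/scratch --rw=write \
    --bs=1M --size=1g --end_fsync=1 --unlink=1
```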
Thanks.
--
Phil.
[email protected]
https://www.philpem.me.uk/
_______________________________________________
Bacula-users mailing list
[email protected]
https://lists.sourceforge.net/lists/listinfo/bacula-users