On 21/11/2025 16:28, Rob Gerber wrote:
> Phil,
>
> I have a grab-bag of thoughts.
>
> In my mind, there are 3 basic possibilities:
> 1. Bacula is configured in a way that constrains performance. (bandwidth limit imposed in bacula config, compression is chewing up all the CPU cycles, multiple concurrent jobs are using all the available resources, etc)
> 2. Something about the systems in question is constrained in performance. (bad network link, failing hard drive, etc)

Network link is fine according to Iperf3:

[ ID] Interval           Transfer     Bitrate         Retr  Cwnd
[  5]   0.00-1.00   sec   115 MBytes   966 Mbits/sec    0    365 KBytes
[  5]   1.00-2.00   sec   112 MBytes   940 Mbits/sec    0    383 KBytes
[  5]   2.00-3.00   sec   112 MBytes   937 Mbits/sec    0    393 KBytes
[  5]   3.00-4.00   sec   113 MBytes   949 Mbits/sec    0    440 KBytes
[  5]   4.00-5.00   sec   111 MBytes   931 Mbits/sec    0    474 KBytes

Drives test fine on both ends (SMART and hdparm speed test).

Client disks --
 Timing cached reads:   43360 MB in  1.99 seconds = 21781.17 MB/sec
 Timing buffered disk reads: 590 MB in  3.00 seconds = 196.60 MB/sec

Server spool disk (Intel SAS SSD) --
 Timing cached reads:   30266 MB in  1.98 seconds = 15263.37 MB/sec
 Timing buffered disk reads: 1286 MB in  3.00 seconds = 428.45 MB/sec


> 3. Bacula is malfunctioning (no examples come to mind, but it isn't impossible)
>
> Bacula configuration:
> From your bacula-dir.conf, please show us the relevant job, jobdef, storage, pool, schedule, and fileset resources.

Job {
       Name = "Backup-Workstation"
       JobDefs = "DefaultJob"
       Client = cheetah-fd
       Pool = Tape
       FileSet="Cheetah-Fileset"
}

JobDefs {
  Name = "DefaultJob"
  Type = Backup
  Level = Incremental
  Client = syrys-fd
  Schedule = "WeeklyCycle"
  Storage = Tape
  Messages = Standard
  Pool = Default
  Priority = 9
  Write Bootstrap = "/var/lib/bacula/%c.bsr"
# Cancellation options are documented in Fig 12.2 at https://www.bacula.org/5.2.x-manuals/en/main/main/Configuring_Director.html
  Allow Duplicate Jobs = no
  Cancel Lower Level Duplicates = yes
  Cancel Queued Duplicates = yes
  Cancel Running Duplicates = no
  # only allow this job to run once concurrently
  #Maximum Concurrent Jobs = 0
  Spool Data = yes
  Spool Attributes = yes
}

# Definition of LTO-6 tape storage device
Storage {
  Name = Tape
  Address = REDACTED
  SDPort = 9103
  Password = REDACTED
  Device = Superloader3               # must be same as Device in Storage daemon
  Media Type = LTO-6                  # must be same as MediaType in Storage daemon
  Autochanger = yes                   # enable for autochanger device
  AllowCompression = no   # use the LTO drive's hardware compression (PP 2018-10-27)
                          # note - this overrides software compression to off.
                          # 'mt defcompression' sets compression on/off
  Maximum Concurrent Jobs = 4   # added 2020-01-19
}

# LTO tape pool definition
Pool {
  Name = Tape
  Pool Type = Backup
# If a label format is not specified, Bacula will not attempt to automatically label tapes
#  Label Format = BAK
  Recycle = yes                       # Bacula can automatically recycle Volumes
  AutoPrune = yes                     # Prune expired volumes
# Volume Retention = 13 days         # Keep at least a fortnight's worth of old data
# Volume Retention = 0 day           # PP 2018-10-27 allow Bacula to recycle essentially any volume
  Volume Retention = 12 hour   # PP 2025-11-06 because this wasn't working
  Recycle Oldest Volume = yes         # If we absolutely can't find a tape,
                                      # recycle the oldest one
}

# When to do the backups, full backup on first sunday of the month,
#  differential (i.e. incremental since full) every other sunday,
#  and incremental backups other days
Schedule {
  Name = "WeeklyCycle"
  Run = Full 1st sun at 01:05
  Run = Differential 2nd-5th sun at 01:05
  Run = Incremental mon-sat at 01:05
}

FileSet {
        Name = "Cheetah-Fileset"
        Include {
                Options {
                        signature = MD5
                        compression = LZO
# Allow descending into different FSes (needed for ZFS)
                        onefs = no
                }

                File = /mnt/zfs
                File = /home
        }

        Exclude {
                File = /mnt/zfs/video-temp
                File = /mnt/zfs/nextcloud
                File = /mnt/zfs/steam
        }
}


> From bacula-sd.conf,
> please show us the device and autochanger resources that the impacted
> jobs output to.

# Superloader3 configuration based on http://www.backupcentral.com/phpBB2/two-way-mirrors-of-external-mailing-lists-3/bacula-25/dell-pv-124t-autochanger-configuration-98158/
Autochanger {
  Name = Superloader3
  Device = Superloader3-1
  #Changer Command = "/etc/bacula/scripts/mtx-changer %c %o %S %a %d"
  Changer Command = "/etc/bacula/scripts/local-tapechanger %c %o %S %a %d"
  Changer Device = /dev/changer
}

Device {
  Name = Superloader3-1
  Drive Index = 0
  Media Type = LTO-6
  Archive Device = /dev/tape_nst
  Control Device = /dev/tape_sg         # added 2020-01-10
  AutomaticMount = yes;               # when device opened, read it
  AlwaysOpen = yes;
  RemovableMedia = yes;
  RandomAccess = no;
  Maximum File Size = 12GB            # increased from 5 to 12GB 2020-01-19

# buffer sizes from http://www.backupcentral.com/forum/19/221466/tuning_lto-4 -- 2020-01-19
  Maximum Network Buffer Size = 65536
# may reduce shoeshine, see https://www.bacula.org/9.4.x-manuals/en/main/Storage_Daemon_Configuratio.html
  Maximum block size = 2M
  # end 2020-01-19

  # Don't let Bacula autolabel tapes
  LabelMedia = no;

  # Hooray for autochangers!
#  Changer Command = "/etc/bacula/scripts/mtx-changer %c %o %S %a %d"
#  Changer Device = /dev/changer
  AutoChanger = yes

  # Enable the Alert command only if you have the mtx package loaded
  # Alert Command = "sh -c 'tapeinfo -f %c |grep TapeAlert|cat'"
  # If you have smartctl, enable this, it has more info than tapeinfo
  # Alert Command = "sh -c 'smartctl -H -l error %c'"

  # This does a few things:
  #   1) Resolve the tape_sg symlink our Udev rules set up (in case the sg
  #      device name changes) -- smartctl can't handle symlinks
  #   2) Poke smartctl to get the TapeAlert and read/write error data
  # This is a little weird because %c gives the tape robot's device node, and
  # that only supports the tape movement commands.
  # Alert Command = "sh -c 'smartctl -H -l error `readlink -f /dev/tape_sg`'"


  # added 2020-01-19 and disabled the above smartctl command
  Alert Command = "/etc/bacula/scripts/tapealert %l"

  # Make disk spooling a bit more polite
  # We use the SSD because it's much faster than the ZFS arrays
  # 2021-07-04 use the spool SSD to avoid burning write cycles on the root SSD
  #   and lagging the machine out. Set 100GB job spool size.
  # 2025-10-08 increase max spool size to 400GB
  Maximum Spool Size = 400G
  Maximum Job Spool Size = 100G
  Spool Directory = "/mnt/scratch/bacula/lto"
}


> Also, please let us know if your jobs are running concurrently. Are all your jobs outputting to disk volumes, or directly to tape? (I know you mentioned tape performance, but I guess it's possible some jobs are writing to disk volumes).

I'm running a maximum of two jobs concurrently.

They output to a SAS SSD spool (attributes and data), and the spool is then despooled from the SSD to tape. This is about the only way I've found to stop the LTO drive from starving; the shoeshining (stopping, seeking back, starting again) from writing directly to tape makes the backup take even longer.

> I suppose you could create a test job and a more limited test fileset for your desktop.

It doesn't seem to matter how much I back up -- the tape unspooling is as fast as it ever was (maxes out the drive at ~100MB/s) but the transfer from the FD to the Director/SD seems to be the slow part.

> If writing to tape, you might want to make a pool for this and dedicate a tape to these tests, so this data isn't mixed in with your current data. Generate a bunch of large, incompressible files (1GB, maybe 20 files?). Run a full backup. What backup speeds do you see?

I'll look into that, but I don't think I have a spare LTO6 tape - I'll need to get some more.
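Generating the incompressible test set is quick with /dev/urandom; here's a sketch (small defaults so it's fast to try -- bump FILES/SIZE_MB to 20 and 1024 for the real run; the destination path is just an example):

```shell
# Generate FILES incompressible files of SIZE_MB each from /dev/urandom.
# Small defaults for a quick try; use FILES=20 SIZE_MB=1024 for the real test.
DEST=${DEST:-/tmp/bacula-testset}   # example path - point at your test area
FILES=${FILES:-2}
SIZE_MB=${SIZE_MB:-4}
mkdir -p "$DEST"
for i in $(seq 1 "$FILES"); do
    dd if=/dev/urandom of="$DEST/rand_$i.bin" bs=1M count="$SIZE_MB" status=none
done
ls -lh "$DEST"
```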

> Here are the immediate possible bottleneck sources I can think of: Desktop CPU, Desktop storage, Desktop <-> NAS network link, NAS storage, and/or bacula database.

I doubt there are any major issues with the CPUs: the desktop is running a Ryzen 5 5600X, and the backup server an i5-9400.

The database is PostgreSQL - it used to be MySQL but I migrated a few months ago.

> Desktop CPU: check top while a backup is running from the desktop to the NAS. Do you see a single 'bacula-fd' process maxing out a CPU core?

I think I checked that and I don't recall it doing that.
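For the record, this is roughly how I'd re-check it on the next run (the pgrep substitution is the only Bacula-specific part; the sketch falls back to the current shell's PID so it runs anywhere):

```shell
# Is one bacula-fd process/thread pinned near 100%? MD5 + LZO both run
# in the FD, so a single hot core would point at client-side CPU.
PID=$(pgrep -x bacula-fd | head -n1)
PID=${PID:-$$}                      # fall back to this shell if no FD is running
ps -o pid=,pcpu=,comm= -p "$PID"
# Live per-thread view while a job runs:
#   top -H -p "$PID"
```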

> Desktop storage: Try making a tar of some of your files, dumping the output to /dev/null. For bonus points, try dumping the tar to the NAS, but I'd still want to see the performance numbers when the output is /dev/null. Prefer large, non-compressible files. Can use hyperfine to time/benchmark this.

philpem@cheetah:/mnt/zfs$ tar c archive-disks | pv > /dev/null
3.07GiB 0:00:19 [73.4MiB/s]

It hovers around 70MB/s, peaks at 200, troughs at 30. More or less the same performance numbers as the NAS.
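Since the FD is doing LZO and MD5 inline, it might also be worth timing the same tree with a compressor in the pipe to see how much that step costs. A sketch, using gzip -1 as a stand-in if lzop isn't installed (the tiny demo tree is just so the commands run anywhere -- point SRC at /mnt/zfs for the real comparison):

```shell
# Compare raw read throughput against read + compress, to see how much
# inline compression costs. gzip -1 here is a stand-in for Bacula's LZO.
SRC=$(mktemp -d)                            # demo tree; use real data instead
dd if=/dev/urandom of="$SRC/sample.bin" bs=1M count=8 status=none
time tar -C "$SRC" -cf /dev/null .              # raw read speed
time tar -C "$SRC" -c . | gzip -1 > /dev/null   # read + compress
```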

> Desktop <-> NAS network link: As a basic first step, try iperf3 tests in both directions between the desktop and the nas. What numbers are we seeing?

Nearly a gigabit per second.

> NAS storage: We're already seeing 100+MB/s unspool rates from SSD to tape, but what write speeds do we see to the NAS (spinning?) disks? I would want to test both internally to the NAS (dd from /dev/urandom to a file, compare with dd from /dev/zero to a file), and from the disk on the NAS to a fast destination, probably dd or tar existing files, dumping output to /dev/null. Bonus points if the test files in the disk >> fast storage test are large (1GB), and contain random incompressible data.

$ sudo dd if=/dev/urandom of=/mnt/scratch/tmp bs=1G count=1 status=progress
1073741824 bytes (1.1 GB, 1.0 GiB) copied, 5 s, 221 MB/s
1+0 records in
1+0 records out
1073741824 bytes (1.1 GB, 1.0 GiB) copied, 4.86835 s, 221 MB/s

"/mnt/scratch" is the SSD.

It's a bit faster writing nulls:

philpem@syrys:/mnt/scratch$ sudo dd if=/dev/zero of=/mnt/scratch/tmp bs=1G count=1 status=progress
1073741824 bytes (1.1 GB, 1.0 GiB) copied, 2 s, 449 MB/s
1+0 records in
1+0 records out
1073741824 bytes (1.1 GB, 1.0 GiB) copied, 2.39124 s, 449 MB/s

Real numbers are probably somewhere in the middle.
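One caveat on that first number: on most kernels /dev/urandom itself only generates a few hundred MB/s, so the 221 MB/s run may partly be measuring the RNG rather than the SSD. Pre-generating the random data once and then timing only the copy avoids that (paths here are examples):

```shell
# Pre-generate incompressible data, then time only the write to disk.
# conv=fdatasync makes dd flush to the device before reporting a rate.
SRCFILE=$(mktemp)
dd if=/dev/urandom of="$SRCFILE" bs=1M count=64 status=none
dd if="$SRCFILE" of=/tmp/disk-write-test.bin bs=1M conv=fdatasync status=progress
```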


Thanks.
--
Phil.
[email protected]
https://www.philpem.me.uk/


_______________________________________________
Bacula-users mailing list
[email protected]
https://lists.sourceforge.net/lists/listinfo/bacula-users