On 21/11/2025 16:28, Rob Gerber wrote:
> Phil,
>
> I have a grab-bag of thoughts.
>
> In my mind, there are 3 basic possibilities:
> 1. Bacula is configured in a way that constrains performance.
> (bandwidth limit imposed in bacula config, compression is chewing up
> all the CPU cycles, multiple concurrent jobs are using all the
> available resources, etc)
> 2. Something about the systems in question is constrained in
> performance. (bad network link, failing hard drive, etc)
Network link is fine according to Iperf3:
[ ID] Interval           Transfer     Bitrate         Retr  Cwnd
[  5]   0.00-1.00   sec   115 MBytes   966 Mbits/sec    0    365 KBytes
[  5]   1.00-2.00   sec   112 MBytes   940 Mbits/sec    0    383 KBytes
[  5]   2.00-3.00   sec   112 MBytes   937 Mbits/sec    0    393 KBytes
[  5]   3.00-4.00   sec   113 MBytes   949 Mbits/sec    0    440 KBytes
[  5]   4.00-5.00   sec   111 MBytes   931 Mbits/sec    0    474 KBytes
Drives test fine on both ends (SMART and hdparm speed test).
Client disks --
Timing cached reads: 43360 MB in 1.99 seconds = 21781.17 MB/sec
Timing buffered disk reads: 590 MB in 3.00 seconds = 196.60 MB/sec
Server spool disk (Intel SAS SSD) --
Timing cached reads: 30266 MB in 1.98 seconds = 15263.37 MB/sec
Timing buffered disk reads: 1286 MB in 3.00 seconds = 428.45 MB/sec
> 3. Bacula is malfunctioning (no examples come to mind, but it isn't
> impossible)
>
> Bacula configuration:
> From your bacula-dir.conf, please show us the relevant job, jobdef,
> storage, pool, schedule, and fileset resources.
Job {
Name = "Backup-Workstation"
JobDefs = "DefaultJob"
Client = cheetah-fd
Pool = Tape
FileSet="Cheetah-Fileset"
}
JobDefs {
Name = "DefaultJob"
Type = Backup
Level = Incremental
Client = syrys-fd
Schedule = "WeeklyCycle"
Storage = Tape
Messages = Standard
Pool = Default
Priority = 9
Write Bootstrap = "/var/lib/bacula/%c.bsr"
# Cancellation options are documented in Fig 12.2 at
# https://www.bacula.org/5.2.x-manuals/en/main/main/Configuring_Director.html
Allow Duplicate Jobs = no
Cancel Lower Level Duplicates = yes
Cancel Queued Duplicates = yes
Cancel Running Duplicates = no
# only allow this job to run once concurrently
#Maximum Concurrent Jobs = 0
Spool Data = yes
Spool Attributes = yes
}
# Definition of LTO-6 tape storage device
Storage {
Name = Tape
Address = REDACTED
SDPort = 9103
Password = REDACTED
Device = Superloader3 # must be same as Device in Storage daemon
Media Type = LTO-6 # must be same as MediaType in Storage daemon
Autochanger = yes # enable for autochanger device
AllowCompression = no # use the LTO drive's hardware compression (PP 2018-10-27)
                      # note - this overrides software compression to off.
# 'mt defcompression' sets compression on/off
Maximum Concurrent Jobs = 4 # added 2020-01-19
}
# LTO tape pool definition
Pool {
Name = Tape
Pool Type = Backup
# If a label format is not specified, Bacula will not attempt to
# automatically label tapes
# Label Format = BAK
Recycle = yes # Bacula can automatically recycle Volumes
AutoPrune = yes # Prune expired volumes
# Volume Retention = 13 days # Keep at least a fortnight's worth of old data
# Volume Retention = 0 days # PP 2018-10-27 allow Bacula to recycle essentially any volume
Volume Retention = 12 hours # PP 2025-11-06 because this wasn't working
Recycle Oldest Volume = yes # If we absolutely can't find a tape,
# recycle the oldest one
}
# When to do the backups, full backup on first sunday of the month,
# differential (i.e. incremental since full) every other sunday,
# and incremental backups other days
Schedule {
Name = "WeeklyCycle"
Run = Full 1st sun at 01:05
Run = Differential 2nd-5th sun at 01:05
Run = Incremental mon-sat at 01:05
}
FileSet {
Name = "Cheetah-Fileset"
Include {
Options {
signature = MD5
compression = LZO
# Allow descending into different FSes (needed for ZFS)
onefs = no
}
File = /mnt/zfs
File = /home
}
Exclude {
File = /mnt/zfs/video-temp
File = /mnt/zfs/nextcloud
File = /mnt/zfs/steam
}
}
> From bacula-sd.conf,
> please show us the device and autochanger resources that the impacted
> jobs output to.
# Superloader3 configuration based on
# http://www.backupcentral.com/phpBB2/two-way-mirrors-of-external-mailing-lists-3/bacula-25/dell-pv-124t-autochanger-configuration-98158/
Autochanger {
Name = Superloader3
Device = Superloader3-1
#Changer Command = "/etc/bacula/scripts/mtx-changer %c %o %S %a %d"
Changer Command = "/etc/bacula/scripts/local-tapechanger %c %o %S %a %d"
Changer Device = /dev/changer
}
Device {
Name = Superloader3-1
Drive Index = 0
Media Type = LTO-6
Archive Device = /dev/tape_nst
Control Device = /dev/tape_sg # added 2020-01-10
AutomaticMount = yes; # when device opened, read it
AlwaysOpen = yes;
RemovableMedia = yes;
RandomAccess = no;
Maximum File Size = 12GB # increased from 5 to 12GB 2020-01-19
# buffer sizes from
# http://www.backupcentral.com/forum/19/221466/tuning_lto-4 -- 2020-01-19
Maximum Network Buffer Size = 65536
# may reduce shoeshine, see
# https://www.bacula.org/9.4.x-manuals/en/main/Storage_Daemon_Configuratio.html
Maximum block size = 2M
# end 2020-01-19
# Don't let Bacula autolabel tapes
LabelMedia = no;
# Hooray for autochangers!
# Changer Command = "/etc/bacula/scripts/mtx-changer %c %o %S %a %d"
# Changer Device = /dev/changer
AutoChanger = yes
# Enable the Alert command only if you have the mtx package loaded
# Alert Command = "sh -c 'tapeinfo -f %c |grep TapeAlert|cat'"
# If you have smartctl, enable this, it has more info than tapeinfo
# Alert Command = "sh -c 'smartctl -H -l error %c'"
# This does a few things:
# 1) Resolve the tape_sg symlink our Udev rules set up (in case the sg
#    device name changes) -- smartctl can't handle symlinks
# 2) Poke smartctl to get the TapeAlert and read/write error data
# This is a little weird because %c gives the tape robot's device node,
# and that only supports the tape movement commands.
# Alert Command = "sh -c 'smartctl -H -l error `readlink -f /dev/tape_sg`'"
# added 2020-01-19 and disabled the above smartctl command
Alert Command = "/etc/bacula/scripts/tapealert %l"
# Make disk spooling a bit more polite
# We use the SSD because it's much faster than the ZFS arrays
# 2021-07-04 use the spool SSD to avoid burning write cycles on the
# root SSD and lagging the machine out. Set 100GB job spool size.
# 2025-10-08 increase max spool size to 400GB
Maximum Spool Size = 400G
Maximum Job Spool Size = 100G
Spool Directory = "/mnt/scratch/bacula/lto"
}
> Also, please let us know if your jobs are running concurrently. Are
> all your jobs outputting to disk volumes, or directly to tape? (I know
> you mentioned tape performance, but I guess it's possible some jobs
> are writing to disk volumes).
I'm running a maximum of two jobs concurrently.
They spool (attributes and data) to a SAS SSD, which is then despooled
to tape. This is about the only way I've found to stop the LTO drive
from starving; the shoeshining (stopping, seeking back, starting again)
from writing directly to tape makes the backup take even longer.
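As an aside, while a job is running the SD status shows whether it's in
the spooling or despooling phase and the current rate, which separates
the FD -> SD leg from the SD -> tape leg. A rough sketch, assuming a
working bconsole.conf and the Storage resource name from my Director
config:

```shell
# Sketch: query the SD through bconsole while a job runs.
# "Tape" is the Storage resource name defined in bacula-dir.conf.
echo 'status storage=Tape' | sudo bconsole
```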
> I suppose you could create a test job and a more limited test fileset
> for your desktop.
It doesn't seem to matter how much I back up -- the tape unspooling is
as fast as it ever was (maxes out the drive at ~100MB/s) but the
transfer from the FD to the Director/SD seems to be the slow part.
> If writing to tape, you might want to make a pool for this and
> dedicate a tape to these tests, so this data isn't mixed in
> with your current data. Generate a bunch of large, incompressible
> files (1GB, maybe 20 files?). Run a full backup. What backup speeds
> do you see?
I'll look into that, but I don't think I have a spare LTO-6 tape - I'll
need to get some more.
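For when I do, something like this should generate the test set. The
path and the small default sizes are just placeholders to show the
mechanics; for the real run I'd use NFILES=20 FILE_MB=1024 (20 x 1 GiB):

```shell
# Sketch: fill a scratch directory with incompressible test files for a
# backup throughput test. Defaults are tiny; override NFILES/FILE_MB
# for the real thing (e.g. NFILES=20 FILE_MB=1024).
DEST=${DEST:-/tmp/bacula-speedtest}
NFILES=${NFILES:-4}
FILE_MB=${FILE_MB:-8}
mkdir -p "$DEST"
i=1
while [ "$i" -le "$NFILES" ]; do
    # /dev/urandom data won't compress, so neither LZO in the FD nor
    # the LTO drive's hardware compression can flatter the numbers
    dd if=/dev/urandom of="$DEST/random-$i.bin" bs=1M count="$FILE_MB" status=none
    i=$((i + 1))
done
ls -lh "$DEST"
```

Pointing a test FileSet at that directory and running a Full should
then show the raw pipeline rate.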
> Here are the immediate possible bottleneck sources I can think of:
> Desktop CPU, Desktop storage, Desktop <-> NAS network link, NAS
> storage, and/or bacula database.
I doubt there are any major issues with the CPUs: the desktop is
running a Ryzen 5 5600X and the backup server an i5-9400.
The database is PostgreSQL - it used to be MySQL but I migrated a few
months ago.
> Desktop CPU: check top while a backup is running from the desktop to
> the NAS. Do you see a single 'bacula-fd' process maxing out a CPU core?
I think I checked that and I don't recall it doing that.
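I'll recheck next run, watching per-process CPU rather than overall
load, since with signature = MD5 and compression = LZO in the fileset
the FD is doing real per-file work that could pin a single core.
Roughly (assuming procps and the sysstat package are installed):

```shell
# Sketch: run on the client while the job is active.
# One-shot check of the hottest processes:
top -b -n 1 -o %CPU | grep -m1 bacula-fd

# Or sample bacula-fd's CPU use every 5 seconds with pidstat:
pidstat -u -p "$(pgrep -d, bacula-fd)" 5
```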
> Desktop storage: Try making a tar of some of your files, dumping the
> output to /dev/null. For bonus points, try dumping the tar to the NAS,
> but I'd still want to see the performance numbers when the output is
> /dev/null. Prefer large, non-compressible files. Can use hyperfine to
> time / benchmark this.
philpem@cheetah:/mnt/zfs$ tar c archive-disks | pv > /dev/null
3.07GiB 0:00:19 [73.4MiB/s]
It hovers around 70MB/s, peaks at 200, troughs at 30. More or less the
same performance numbers as the NAS.
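The network variant of the same test would be something like this
(syrys is my backup server; sending to /dev/null on the far end keeps
the NAS disks out of the measurement):

```shell
# Sketch: tar the same tree across the network; pv shows the rate.
# /dev/null on the far end isolates tar + network only.
tar c archive-disks | pv | ssh syrys 'cat > /dev/null'

# Variant that also includes the NAS spool SSD in the path:
tar c archive-disks | pv | ssh syrys 'cat > /mnt/scratch/tartest.tar'
```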
> Desktop <-> NAS network link: As a basic first step, try iperf3 tests
> in both directions between the desktop and the nas. What numbers are
> we seeing?
Nearly a gigabit per second.
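That was desktop -> NAS; I still need the reverse direction for
completeness, although FD -> SD is the direction that matters here.
Something like (assuming 'iperf3 -s' is left running on the server):

```shell
# Sketch: test the link in both directions from the desktop.
iperf3 -c syrys       # desktop -> NAS (the FD -> SD direction)
iperf3 -c syrys -R    # NAS -> desktop (reverse mode)
```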
> NAS storage: We're already seeing 100+MB/s unspool rates from SSD to
> tape, but what write speeds do we see to the NAS (spinning?) disks? I
> would want to test both internally to the NAS (dd from /dev/urandom
> to a file, compare with dd from /dev/zero to a file), and from the
> disk on the NAS to a fast destination, probably dd or tar existing
> files, dumping output to /dev/null. Bonus points if the test files in
> the disk >> fast storage test are large (1GB), and contain random
> incompressible data.
$ sudo dd if=/dev/urandom of=/mnt/scratch/tmp bs=1G count=1 status=progress
1073741824 bytes (1.1 GB, 1.0 GiB) copied, 5 s, 221 MB/s
1+0 records in
1+0 records out
1073741824 bytes (1.1 GB, 1.0 GiB) copied, 4.86835 s, 221 MB/s
"/mnt/scratch" is the SSD.
It's a bit faster writing nulls:
philpem@syrys:/mnt/scratch$ sudo dd if=/dev/zero of=/mnt/scratch/tmp bs=1G count=1 status=progress
1073741824 bytes (1.1 GB, 1.0 GiB) copied, 2 s, 449 MB/s
1+0 records in
1+0 records out
1073741824 bytes (1.1 GB, 1.0 GiB) copied, 2.39124 s, 449 MB/s
Real numbers are probably somewhere in the middle.
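A number between the two could come from fio, if it's installed; its
default write buffers are cheap to generate but not all-zeroes, so they
dodge both the urandom CPU cost and the compressibility problem:

```shell
# Sketch: 1 GiB sequential write to the spool SSD with fio
# (if installed); end_fsync includes the final flush in the timing,
# unlink cleans up the test file afterwards.
fio --name=seqwrite --directory=/mnt/scratch --rw=write \
    --bs=1M --size=1g --end_fsync=1 --unlink=1
```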
Thanks.
--
Phil.
[email protected]
https://www.philpem.me.uk/
_______________________________________________
Bacula-users mailing list
[email protected]
https://lists.sourceforge.net/lists/listinfo/bacula-users