Hi all,

I’m running Bacula Community Edition 15.0.2 and I’m trying to improve Copy
and Restore job throughput between two on-prem Storage Daemons (Hetzner
bare metal servers) connected via 10 Gbps links.

I’ve been investigating this for a while but haven’t been able to pinpoint
what’s limiting throughput, so I’m sharing the data here to see if anyone
has run into something similar.

Setup summary:

   - Director and main Storage Daemon on the same host
   - Remote Storage Daemon on a separate host
   - Both servers running Bacula CE 15.0.2
   - Mostly default configuration
   - Maximum Concurrent Jobs = 100
   - ~35 parallel CopyJobs during the tests
   - Maximum Network Buffer Size: default
   - Minimum block size: default

Hardware / network:

   - Both servers are Hetzner bare metal
   - 10 Gbps NICs on both sides
   - Servers are in different datacenters (routed)
   - No NIC errors, no packet drops, no CPU saturation observed

Observed behavior:

During large CopyJobs and RestoreJobs (main SD → remote SD), aggregate
throughput consistently plateaus around 128–150 MiB/s (~1.1–1.3 Gbps). Once
this level is reached, CopyJobs start queueing and throughput does not
scale further, even with ~35 concurrent jobs. CPU, disk I/O, and network
all appear to have plenty of headroom.
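
For reference, the unit arithmetic behind the plateau figure can be written out as a small Python sketch. It assumes MiB = 2^20 bytes and that the NIC speed is quoted in decimal Gbps (10^9 bit/s); the helper names are hypothetical and not part of Bacula.

```python
MIB = 2**20  # bytes per mebibyte
GBPS = 1e9   # bits per second in one decimal Gbps (how NIC speeds are quoted)


def mib_s_to_gbps(mib_per_s: float) -> float:
    """Convert a rate in MiB/s to decimal Gbps."""
    return mib_per_s * MIB * 8 / GBPS


def gbps_to_mib_s(gbps: float) -> float:
    """Convert a line rate in decimal Gbps to MiB/s."""
    return gbps * GBPS / 8 / MIB


def per_job_mib_s(aggregate_mib_s: float, jobs: int) -> float:
    """Average share of an aggregate rate across concurrent jobs."""
    return aggregate_mib_s / jobs


if __name__ == "__main__":
    line_rate = gbps_to_mib_s(10)  # a 10 Gbps link is ~1192 MiB/s
    for rate in (128, 150):
        print(f"{rate} MiB/s = {mib_s_to_gbps(rate):.2f} Gbps, "
              f"{rate / line_rate:.0%} of 10 Gbps, "
              f"{per_job_mib_s(rate, 35):.1f} MiB/s per job over 35 jobs")
```

This prints about 1.07 Gbps (11% of line rate, ~3.7 MiB/s per job) for 128 MiB/s and 1.26 Gbps (13%, ~4.3 MiB/s per job) for 150 MiB/s, i.e. each of the ~35 jobs averages only a few MiB/s.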

Network validation:

To rule out network limitations, I ran aggressive iperf3 tests. From three
different servers, I ran:

iperf3 -c remote-sd-server.io -t 300 -P 32

On the remote Storage Daemon, I ran multiple iperf3 listeners on different
ports. Aggregate inbound iperf3 traffic on the remote SD sustains multi-Gbps
throughput, well beyond what Bacula achieves, and it coexists without issues
alongside the Bacula jobs, which stay flat at ~1 Gbps.
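
Because each listener only reports its own streams, the multi-listener total has to be summed. A minimal Python sketch, assuming each iperf3 client was run with `-J` (JSON output) and its report saved to a file; it reads the receiver-side total of each TCP test from `end.sum_received.bits_per_second`. The helper names and file names are hypothetical.

```python
import json
from pathlib import Path


def received_gbps(report: dict) -> float:
    """Receiver-side throughput of one iperf3 -J TCP report, in Gbps."""
    return report["end"]["sum_received"]["bits_per_second"] / 1e9


def aggregate_gbps(paths: list[str]) -> float:
    """Sum receiver-side throughput across several iperf3 JSON reports."""
    total = 0.0
    for path in paths:
        report = json.loads(Path(path).read_text())
        total += received_gbps(report)
    return total
```

Usage would be something like `aggregate_gbps(["client1.json", "client2.json", "client3.json"])`. The sum is only meaningful if the runs overlapped in time, as they did with the three concurrent 300-second clients.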

Disk I/O validation (remote Storage Daemon):

The remote SD uses 14 × 20 TB 7200 RPM SATA disks with an LVM volume
mounted at /data. fio direct-I/O tests show:

   - Single job: ~246 MiB/s sequential write
   - 8 parallel jobs: ~915 MiB/s sustained write
   - 16 parallel jobs: ~1014 MiB/s sustained write
This puts disk throughput well above the ~150 MiB/s observed with Bacula.
Full fio output is available here:

https://gist.github.com/LeandroSaldivarmrf/6b1a354f845f4afb26a2fa39183e269b
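
The headroom can be quantified per fio concurrency level with a short Python sketch. It uses only the fio figures listed above and the upper end (150 MiB/s) of the Bacula plateau; the names are hypothetical.

```python
# Sustained write rates from the fio runs above (parallel jobs -> MiB/s).
FIO_WRITE_MIB_S = {1: 246, 8: 915, 16: 1014}
BACULA_PLATEAU_MIB_S = 150  # upper end of the observed Bacula plateau


def headroom(disk_mib_s: float, observed_mib_s: float) -> float:
    """How many times the observed rate the disk could sustain."""
    return disk_mib_s / observed_mib_s


for jobs, rate in FIO_WRITE_MIB_S.items():
    print(f"{jobs:>2} fio job(s): {rate} MiB/s = "
          f"{headroom(rate, BACULA_PLATEAU_MIB_S):.1f}x the Bacula plateau")
```

This gives roughly 1.6x for a single fio job, 6.1x for 8 jobs and 6.8x for 16 jobs, so the margin is large only when the volume sees many concurrent write streams.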

For additional context, I also posted the same findings along with a couple
of network and CPU usage graphs here:

https://community.spiceworks.com/t/bacula-ce-15-0-2-copyjobs-capped-at-1-gbps-despite-10gbps-network/1248466

At this point, network, disk, and CPU do not appear to be the limiting
factors, which makes me suspect a Bacula CopyJob-level limitation or
missing tuning.

Questions:

   - Has anyone consistently achieved multi-Gbps (5–10 Gbps) CopyJob
     throughput with Bacula CE?
   - Are there Director or Storage Daemon tuning parameters that
     significantly affect CopyJob performance?
   - Would running multiple Directors help here, or should a single
     Director be able to drive higher throughput?

If there’s any specific configuration you’d like me to share (Director,
Storage resources, JobDefs, etc.), I can post it.

Thanks in advance.
_______________________________________________
Bacula-users mailing list
[email protected]
https://lists.sourceforge.net/lists/listinfo/bacula-users
