Hi all, I’m running Bacula Community Edition 15.0.2 and I’m trying to improve Copy and Restore job throughput between two on-prem Storage Daemons (Hetzner bare metal servers) connected via 10 Gbps links.
I’ve been investigating this for a while but haven’t been able to pinpoint what’s limiting throughput, so I’m sharing the data here to see if anyone has run into something similar.

Setup summary:
- Director and main Storage Daemon on the same host
- Remote Storage Daemon on a separate host
- Both servers running Bacula CE 15.0.2
- Mostly default configuration
- Maximum Concurrent Jobs = 100
- ~35 parallel CopyJobs during the tests
- Maximum Network Buffer Size: default
- Minimum block size: default

Hardware / network:
- Both servers are Hetzner bare metal
- 10 Gbps NICs on both sides
- Servers are in different datacenters (routed)
- No NIC errors, no packet drops, no CPU saturation observed

Observed behavior:
During large CopyJobs and RestoreJobs (main SD → remote SD), aggregate throughput consistently plateaus around 128–150 MiB/s (~1–1.2 Gbps). Once this level is reached, CopyJobs start queueing and throughput does not scale further, even with ~35 concurrent jobs. CPU, disk I/O, and network all appear to have plenty of headroom.

Network validation:
To rule out network limitations, I ran aggressive iperf3 tests. From three different servers, I ran:

    iperf3 -c remote-sd-server.io -t 300 -P 32

On the remote Storage Daemon, I ran multiple listeners on different ports. Aggregate inbound traffic on the remote SD easily exceeds Bacula traffic, sustaining multi-Gbps throughput. iperf traffic scales well beyond 1 Gbps and coexists without issues alongside Bacula jobs, which remain flat around ~1 Gbps.

Disk I/O validation (remote Storage Daemon):
The remote SD uses 14 × 20 TB 7200 RPM SATA disks with an LVM volume mounted at /data. fio direct-I/O tests show:
- Single job: ~246 MiB/s sequential write
- 8 parallel jobs: ~915 MiB/s sustained write
- 16 parallel jobs: ~1014 MiB/s sustained write

This puts disk throughput well above the ~150 MiB/s observed with Bacula.
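For reference, the unit arithmetic behind the throughput figures above can be sketched as below. This is only a minimal Python sketch: the function names are made up for illustration, and the per-job figure assumes the ~35 jobs share the aggregate evenly, which I have not measured.

```python
# Unit arithmetic for the figures quoted above.
# Helper names are illustrative only; the even per-job split is an assumption.

MIB = 1024 ** 2  # bytes in one MiB


def mib_per_s_to_gbps(mib_per_s):
    """Convert MiB/s to decimal gigabits per second (1 Gbps = 1e9 bit/s)."""
    return mib_per_s * MIB * 8 / 1e9


def per_job_mib_per_s(aggregate_mib_per_s, jobs):
    """Average per-job rate, ASSUMING the aggregate is split evenly."""
    return aggregate_mib_per_s / jobs


bacula_low, bacula_high = 128, 150  # MiB/s, observed Bacula plateau
concurrent_jobs = 35                # approximate number of parallel CopyJobs
fio_16_jobs = 1014                  # MiB/s, fio with 16 parallel jobs
link_gbps = 10                      # nominal NIC speed

print(f"Bacula plateau: {mib_per_s_to_gbps(bacula_low):.2f}-"
      f"{mib_per_s_to_gbps(bacula_high):.2f} Gbps")
print(f"Link utilisation at plateau: "
      f"{mib_per_s_to_gbps(bacula_high) / link_gbps:.1%}")
print(f"Average per job (even split assumed): "
      f"{per_job_mib_per_s(bacula_high, concurrent_jobs):.1f} MiB/s")
print(f"fio, 16 jobs: {mib_per_s_to_gbps(fio_16_jobs):.2f} Gbps")
```

With these inputs the plateau works out to roughly 1.07–1.26 Gbps, or about 4.3 MiB/s per job if the load were spread evenly, while the 16-job fio result corresponds to about 8.5 Gbps.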
Full fio output is available here:
https://gist.github.com/LeandroSaldivarmrf/6b1a354f845f4afb26a2fa39183e269b

For additional context, I also posted the same findings along with a couple of network and CPU usage graphs here:
https://community.spiceworks.com/t/bacula-ce-15-0-2-copyjobs-capped-at-1-gbps-despite-10gbps-network/1248466

At this point, network, disk, and CPU do not appear to be the limiting factors, which makes me suspect a Bacula CopyJob-level limitation or missing tuning.

Questions:
- Has anyone consistently achieved multi-Gbps (5–10 Gbps) CopyJob throughput with Bacula CE?
- Are there Director or Storage Daemon tuning parameters that significantly affect CopyJob performance?
- Would running multiple Directors help here, or should a single Director be able to drive higher throughput?

If there’s any specific configuration you’d like me to share (Director, Storage resources, JobDefs, etc.), I can post it.

Thanks in advance.
_______________________________________________
Bacula-users mailing list
[email protected]
https://lists.sourceforge.net/lists/listinfo/bacula-users
