Re: [Bacula-users] copy job performance problem

Josh Fisher via Bacula-users Sat, 15 Nov 2025 05:57:03 -0800


On 11/14/25 05:42, Chris Wright wrote:

Hello all,
I'm seeing some unexpectedly slow performance when testing a copy job process and I've pretty much run out of ideas on diagnosing it.
We are currently running two bacula storage daemons, on different VMs, and have been attempting to use copy jobs to take a copy of backups offsite.
SD1 is storing 100 GiB volumes on a HDD backed Ceph pool (via file volumes on CephFS) - we have ~130 TiB of backups (no compression) in 100 GiB volume files, multiple jobs were run in parallel onto the volumes and we were getting >100 MiB/second write throughput (mostly client limited).

Multiple jobs in parallel wrote to the same volume? If so, then those jobs' data blocks are all interleaved with each other. The original parallel writes are not affected, but the copy job that is reading the interleaved records is constantly seeking and thrashing the HD read/write heads back and forth. You could write each job to a separate volume file, instead of having fixed size volumes.
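
To make the interleaving concrete, here is a toy model in Python. It is not Bacula code: the function names and the strict round-robin block layout are assumptions made purely for illustration, and it says nothing about how the SD actually positions within a volume. It only counts how many separate contiguous runs ("extents") one job's blocks form under each layout, which is a rough proxy for how often a reader has to reposition.

```python
from itertools import groupby


def interleaved_volume(num_jobs, blocks_per_job):
    """Hypothetical layout: parallel jobs append blocks round-robin
    to one shared volume (job 0, job 1, ..., job 0, job 1, ...)."""
    return [job for _ in range(blocks_per_job) for job in range(num_jobs)]


def per_job_volume(num_jobs, blocks_per_job):
    """Hypothetical layout: each job's blocks are stored contiguously,
    as they would be if every job had its own volume file."""
    return [job for job in range(num_jobs) for _ in range(blocks_per_job)]


def count_extents(volume, job):
    """Count the contiguous runs of blocks belonging to `job`.
    Every run after the first implies skipping over other jobs' data."""
    return sum(1 for owner, _ in groupby(volume) if owner == job)


shared = interleaved_volume(num_jobs=4, blocks_per_job=1000)
separate = per_job_volume(num_jobs=4, blocks_per_job=1000)

print(count_extents(shared, job=0))    # 1000
print(count_extents(separate, job=0))  # 1
```

In this model a job written alongside three others ends up in 1000 fragments rather than one. Real volumes will not interleave this evenly, so treat the numbers as an upper-bound illustration rather than a measurement.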

SD2 is a Cloud store, using the S3 driver to push volumes up to S3/Glacier on a fast connection with a local SSD cache.
SD1 and SD2 are on the same 10 GiB switch, both have been recently upgraded to bacula 15.0.3 and both are on reasonably modern CPUs (AMD EPYC 9124 for SD1).
When we run a copy job we are seeing:
- Expected backup jobs spawn
- SD1 & SD2 connect to each other fine
- SD1 mounts a volume file and starts streaming data to SD2, with reasonable throughput (50 - 100 MiB/sec)
- All seems well for a time, then throughput drops to essentially zero
- SD1 will have a single CPU pegged at 100%, with minimal IO traffic (both ops and bandwidth) from the open volume file; we will get spikes of good speed, but average throughput after leaving a job running for a week is <1 MiB/sec
- SD2 is quiet, happily handling normal backup jobs from other clients with normal performance
If we start a second, parallel, copy job we get similar initially good throughput then peg a second CPU on SD1 to 100% but there isn't exactly a big jump in performance.
There are no warnings/errors being logged and everything appears to be "working", just glacially slow and apparently totally bottlenecked on whatever that single CPU thread is doing with minimal reads from the volumes.
Any suggestions on where to look for the root cause here?

Thanks
--

Chris Wright

Application Software Developer




_______________________________________________
Bacula-users mailing list
[email protected]
https://lists.sourceforge.net/lists/listinfo/bacula-users
