Interesting. I remember being confused about which `Maximum Concurrent
Jobs` to use (the `bacula-fd.conf` vs the director's config), but your
explanation makes a lot of sense. I hadn't considered the
implications of setting it too low in the FD's config.
I'll test it and if the problem recurs over the next few days, I'll post
here again.
Thank you
Lloyd
On 5/21/25 09:39, Bill Arlofski via Bacula-users wrote:
On 5/20/25 3:20 PM, Lloyd Brown wrote:
Hi all,
I'm running into an issue with some bacula-fd instances and hoping
someone can point me in the right direction.
In short: I have bacula-fd instances that are clearly running jobs
(confirmed via strace), but they often time out when I run status
client=CLIENTNAME. They only seem reliably responsive when idle.
Details:
* Bacula version: 9.6.6 (yes, I know it's old — upgrade is
planned).
* Setup: Two hosts (`zhomebackup[1-2]`) running both SD and
FD. A script at the beginning of each job snapshots NFS
shares, mounts them, and outputs file paths for backups.
* Problem: These hosts struggle to handle more than 6–7 jobs
effectively. Going beyond that causes a drop in aggregate
file scan rates.
* Attempted solution: Spun up additional FD instances on
separate ports (originally inside Docker, but now just
running natively on non-standard ports). These new instances are
/intermittently/ responsive to `status client`, even
with only 1–3 jobs. The original FD (on the default port) remains
responsive, even with 6–7 jobs.
I'm wondering if this could be a shared resource issue or some FD
limitation I'm not accounting for. Or is there a better way to scale
job throughput?
I've attached a tarball containing systemd service files, FD configs,
and relevant parts of the Director config, including an example job
definition.
Any insights would be greatly appreciated.
Thanks,
Lloyd
Hello Lloyd,
For Bacula, each connection is counted as a 'Job'.
This means that on your FDs, once three jobs are running (six on the
first one), the FD will not accept the new "job" connection for the
`status client` command, which appears as exactly the symptom you are
describing.
MaximumConcurrentJobs settings grepped from your configs:
----8<----
$ grep -ir maximum zhomebackup1/etc/bacula/bacula-fd.con*
zhomebackup1/etc/bacula/bacula-fd.conf: Maximum Concurrent Jobs = 6
zhomebackup1/etc/bacula/bacula-fd.container1.conf: #Maximum Concurrent Jobs = 20
zhomebackup1/etc/bacula/bacula-fd.container1.conf: Maximum Concurrent Jobs = 3
zhomebackup1/etc/bacula/bacula-fd.container2.conf: #Maximum Concurrent Jobs = 20
zhomebackup1/etc/bacula/bacula-fd.container2.conf: Maximum Concurrent Jobs = 3
zhomebackup1/etc/bacula/bacula-fd.container3.conf: #Maximum Concurrent Jobs = 20
zhomebackup1/etc/bacula/bacula-fd.container3.conf: Maximum Concurrent Jobs = 3
----8<----
Just increase these FD settings and restart each FD, and you should be
fine.
A setting of `MaximumConcurrentJobs = 20`, as shipped in the
default/example configs, is a good starting point for the FDs. You can
then manage the actual number of concurrent jobs triggered on each
client with the MaximumConcurrentJobs setting in the Client{}
resources of the Director's config. Those can be adjusted up or down
without restarting the FDs - only a bconsole 'reload' command is
needed for the Director to pick up the changes.
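As a sketch of what that split looks like (resource names, address, and
password below are placeholders, not taken from the attached configs):

----8<----
# bacula-fd.conf on the client -- requires an FD restart to change:
FileDaemon {
  Name = zhomebackup1-fd             # example name
  FDport = 9102
  Maximum Concurrent Jobs = 20       # high enough that 'status client'
                                     # connections are never refused
}

# bacula-dir.conf on the Director -- only needs a bconsole 'reload':
Client {
  Name = zhomebackup1-fd             # example name
  Address = zhomebackup1             # example address
  Password = "example-password"      # placeholder
  Maximum Concurrent Jobs = 3        # the real throttle on how many
                                     # backup jobs run on this client
}
----8<----

With the FD limit set well above the Director's Client{} limit, the
Director does the throttling, and console connections always get through.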
Hope this helps,
Bill
_______________________________________________
Bacula-users mailing list
Bacula-users@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/bacula-users
--
Lloyd Brown
HPC Systems Administrator
Office of Research Computing
Brigham Young University
http://rc.byu.edu