Interesting.  I remember being confused about which `Maximum Concurrent Jobs` to use (the `bacula-fd.conf` vs the director's config), but your explanation makes a lot of sense.  I hadn't realized/considered the implications of having it set too low in the FD's config.

I'll test it and if the problem recurs over the next few days, I'll post here again.

Thank you

Lloyd

On 5/21/25 09:39, Bill Arlofski via Bacula-users wrote:
On 5/20/25 3:20 PM, Lloyd Brown wrote:
Hi all,

I'm running into an issue with some bacula-fd instances and hoping someone can point me in the right direction.

In short: I have bacula-fd instances that are clearly running jobs (confirmed via strace), but they often time out when I run status client=CLIENTNAME. They only seem reliably responsive when idle.

Details:

   *  Bacula version: 9.6.6 (yes, I know it's old — upgrade is planned).
   *  Setup: Two hosts (`zhomebackup[1-2]`) running both SD and FD. A script at the beginning of each job snapshots NFS
      shares, mounts them, and outputs file paths for backups.
   *  Problem: These hosts struggle to handle more than 6–7 jobs effectively. Going beyond that causes a drop in aggregate
      file scan rates.
   *  Attempted solution: Spun up additional FD instances on separate ports (originally inside Docker, but now just
      running natively on non-standard ports). These new instances are /intermittently/ responsive to `status client`, even
      with only 1–3 jobs. The original FD (on the default port) remains responsive, even with 6–7 jobs.

I'm wondering if this could be a shared resource issue or some FD limitation I'm not accounting for. Or is there a better way to scale job throughput?

I've attached a tarball containing systemd service files, FD configs, and relevant parts of the Director config, including an example job definition.

Any insights would be greatly appreciated.

Thanks,
Lloyd

Hello Lloyd,

For Bacula, each connection to an FD is counted as a 'Job'.

This means that on your FDs, once three jobs are running (six on the first one), the FD will not accept the additional "job" connection needed for `status client`, which appears as exactly the symptom you are describing.

MaximumConcurrentJobs settings grepped from your configs:
----8<----
$ grep -ir maximum zhomebackup1/etc/bacula/bacula-fd.con*
zhomebackup1/etc/bacula/bacula-fd.conf:        Maximum Concurrent Jobs = 6
zhomebackup1/etc/bacula/bacula-fd.container1.conf:        #Maximum Concurrent Jobs = 20
zhomebackup1/etc/bacula/bacula-fd.container1.conf:        Maximum Concurrent Jobs = 3
zhomebackup1/etc/bacula/bacula-fd.container2.conf:        #Maximum Concurrent Jobs = 20
zhomebackup1/etc/bacula/bacula-fd.container2.conf:        Maximum Concurrent Jobs = 3
zhomebackup1/etc/bacula/bacula-fd.container3.conf:        #Maximum Concurrent Jobs = 20
zhomebackup1/etc/bacula/bacula-fd.container3.conf:        Maximum Concurrent Jobs = 3
----8<----

Just increase these FD settings, restart each FD, and you should be fine.

A setting of `Maximum Concurrent Jobs = 20`, as shipped with the default/example configs, is a good starting point for the FDs. You can then manage the actual number of concurrent jobs triggered on each client with the `Maximum Concurrent Jobs` setting in the Client{} resources of the Director's configuration. Those Director-side settings can be adjusted up or down without restarting the FDs - just a bconsole 'reload' command is needed for the Director to pick up the changes.
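As a rough sketch, the two settings end up looking something like the fragments below (resource names and addresses here are illustrative, not copied from your attached configs; only the `Maximum Concurrent Jobs` lines are the point):

----8<----
# bacula-fd.conf on the client: a generous ceiling, so console
# connections like `status client` are never refused.
FileDaemon {
  Name = zhomebackup1-fd            # illustrative name
  Maximum Concurrent Jobs = 20
}

# bacula-dir.conf on the Director: the real per-client throttle.
# Changing this only needs a bconsole 'reload', no FD restart.
Client {
  Name = zhomebackup1-fd            # illustrative name
  Address = zhomebackup1
  Maximum Concurrent Jobs = 6
}
----8<----

The idea is that the FD-side limit is just a safety ceiling, while the Director-side Client{} value is what you actually tune day to day.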


Hope this helps,
Bill



_______________________________________________
Bacula-users mailing list
Bacula-users@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/bacula-users

--
Lloyd Brown
HPC Systems Administrator
Office of Research Computing
Brigham Young University
http://rc.byu.edu


