Hi Martin,

please find my response inline:

> On 20. Jul 2022, at 20:39, Martin Simmons <mar...@lispworks.com> wrote:
>
>>>>>> On Wed, 20 Jul 2022 16:25:15 +0200, Justin Case said:
>>
>>> On 20. Jul 2022, at 12:06, Martin Simmons <mar...@lispworks.com> wrote:
>>>
>>> There are two levels of error here: firstly, the "Connection refused" occurs
>>> when the client tries to connect to the Director; then that eventually
>>> succeeds and the error message is sent back to the Director when it doesn't
>>> expect it which leads to the "Malformed message" error.
>>>
>>> Does the client machine have permanent access to the Director machine (via
>>> the firewall)? It looks to me like it might only have intermittent access, so
>>> gets "Connection refused" for some time before it connects.
>>
>> It has permanent access. However, the machine running the Dir and SD is busy
>> with local FD jobs (as in “Dir, SD and FD on same host”) when also the jobs
>> for remote FDs are initiated. The jobs I had so far for remote hosts just
>> waited with a cogwheel showing in Baculum. The job for the FD with
>> ConnectToDirector also shows the cogwheel, but it seems at the point in time
>> when its turn has come, the connection messages are occuring.
>
> It is not clear when the "Connection refused" messages actually occurred,
> because they are queued until a connection exists and then sent to that
> connection. They probably occurred sometime before the connection existed.

I see.

> "Connection refused" comes from the OS errno ECONNREFUSED, which suggests a
> genuine networking problem at the point in time. It usually means that the
> server is not listening on the target port, or some firewall is blocking it.

AFAIK (a) the server is running during the time when the jobs are started
(obviously), and (b) the firewall blocks the Director from connecting to the FD.
The FD may connect to the Director at any time. I have no idea whether the
Director is listening permanently, though.

> Since you don't have a Schedule clause in the bacula-fd.conf, the bacula-fd
> will try to connect to the Director when the bacula-fd starts. It will try to
> connect every 5 seconds until it succeeds.

Why every 5 seconds? The FD config has

  ReconnectionTime = 40 min

Let me remove this clause and restart the FD. Maybe it helps.

> Maybe it can't connect at that point for some reason?

I doubt it. AFAIK there is nothing there that would prevent this.

What I can imagine is that at some point the TCP connection from the FD to the
Dir stops working (maybe some timeout in a firewall table?) and the FD only
tries to reconnect at 40-minute intervals. This could explain the problem.

I have now removed the ReconnectionTime = 40 min clause. If that does not help,
I will put in a schedule on the FD for the connection. If none of this helps, I
will downgrade the FD from 13 to 11 (as the Dir is on 11).

The curious thing is this: if I trigger the job manually at any point in time,
it just works without any of the connection error messages.

> The problem is repeatable for me if I start the bacula-fd before starting the
> bacula-dir.
>
> __Martin
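
Regarding my open question above about whether the Director is listening permanently: one way to find out would be to run a small probe from the FD host at intervals and log the result. Below is a minimal sketch in Python, not anything Bacula ships. The function name `probe` and the host name `dir.example.org` in the comment are hypothetical placeholders; 9101 is the Director's default port and may differ in a given setup. A "refused" result corresponds to ECONNREFUSED (nothing listening, or an active reject), whereas "timeout" would point more towards packets being dropped silently on the way.

```python
import errno
import socket


def probe(host: str, port: int, timeout: float = 3.0) -> str:
    """Classify a single TCP connection attempt to host:port."""
    try:
        # create_connection resolves the name and performs the TCP handshake
        with socket.create_connection((host, port), timeout=timeout):
            return "open"
    except ConnectionRefusedError:
        # errno ECONNREFUSED: the peer answered, but nothing accepts on that port
        return "refused"
    except socket.timeout:
        # no answer within the timeout, e.g. packets dropped by a firewall
        return "timeout"
    except OSError as exc:
        # anything else (unreachable network, name resolution failure, ...)
        return f"error: {errno.errorcode.get(exc.errno, exc.errno)}"


# Example call (hypothetical host name, default Director port):
#   probe("dir.example.org", 9101)
```

Running this from cron every minute or so around the time the scheduled jobs start should show whether the port is ever unreachable from the FD's side.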

_______________________________________________
Bacula-users mailing list
Bacula-users@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/bacula-users