Thank you all for the great support here! I think I finally understood today what is happening. Bill’s explanations of what might be going on were probably correct at the core, but not in the details.
Let me try to lay out what I think is going on and where I had trouble understanding it in the first place.

After an update of syslog-ng, syslogging from the FD client host started to work. (It was configured a long time ago, but somehow it never worked until the syslog-ng server update of the last few days.) I began to see where and WHEN(!) the error messages originated. It is - as you guys are saying - the FD generating these errors, which are logged without delay on my central syslog-ng server:

  2022-07-25 12:57:30 bsockcore.c:265 Unable to connect to Director daemon on bacula-dir.lan.net:9101. ERR=Connection refused

The eye-opener was the timestamps, which explained what is happening (more on that later). My problem so far was that the error messages shown in Baculum carry the Director's timestamp from when the Director sees the error messages, not from when they happened!

  25-Jul 22:00 bacula-dir JobId 1725: Error: getmsg.c:217 Malformed message: [bsockcore.c:265 Unable to connect to Director daemon on bacula-dir.lan.net:9101. ERR=Connection refused

Note the different timestamps. In the first message it is the timestamp of the FD client host when the error occurs there. In the second message it is the timestamp of the Director host when the first error message gets delivered from the FD to the Director.

So what you guys said is correct: the Director accepts the error messages from the FD only when a job runs for the FD. Even if the FD connects to the Director many times during the day, the error messages are held back by the FD until a job actually runs, and then they are ingested with the first job that runs on the current day. This also explains why there are no errors when a similar job runs shortly afterwards to back up to the other tier storage.

Because so far I was only seeing the Director timestamp, I was misled into thinking that the error actually happens at the time when the job runs.
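To make the difference between the two timestamps concrete, here is a minimal Python sketch that parses the two sample lines quoted above and prints how long the message sat on the FD before the Director logged it. It is only an illustration, not anything Bacula provides: the function names are made up, the Director line carries no year so the year is borrowed from the FD line, the `%b` month parsing assumes an English/C locale, and it assumes the two samples refer to the same event, which I have not verified.

```python
from datetime import datetime

# Sample lines copied from the messages quoted above.
FD_LINE = ("2022-07-25 12:57:30 bsockcore.c:265 Unable to connect to "
           "Director daemon on bacula-dir.lan.net:9101. ERR=Connection refused")
DIR_LINE = ("25-Jul 22:00 bacula-dir JobId 1725: Error: getmsg.c:217 "
            "Malformed message: [bsockcore.c:265 Unable to connect to "
            "Director daemon on bacula-dir.lan.net:9101. ERR=Connection refused")


def fd_timestamp(line):
    """Time the error occurred on the FD host (syslog-ng format as shown)."""
    return datetime.strptime(line[:19], "%Y-%m-%d %H:%M:%S")


def director_timestamp(line, year):
    """Time the Director logged the message.

    The line has no year, so the caller supplies one (an assumption).
    """
    day_month, hhmm = line.split()[:2]
    return datetime.strptime(f"{year}-{day_month} {hhmm}", "%Y-%d-%b %H:%M")


occurred = fd_timestamp(FD_LINE)
delivered = director_timestamp(DIR_LINE, occurred.year)
delay = delivered - occurred

print(f"occurred on FD:     {occurred}")
print(f"logged by Director: {delivered}")
print(f"held back for:      {delay}")
```

Run over a whole day of syslog-ng lines, the same parsing would show whether the FD-side errors cluster in the window when the Director is down.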
I now understand that this is not correct. I think you guys also mentioned it, but I didn’t pick it up consciously enough to understand what it means. Now that I can see the FD timestamps of when the errors actually happen on the FD host, I can confirm:

(1) The Director is definitely reachable for the FD at the time when the job runs (as I have always stated). This is why the error messages show the timestamp of when the job runs: the job is always able to run because the Director is available.

(2) The Director is NOT reachable at some scheduled times each day when the container is shut down for third-party backup reasons (the firewall has nothing to do with this). This is the time frame when the errors actually occur, and they can now be seen in syslog-ng.

I suppose if I now schedule the FD to connect to the Director only when the job runs, the errors should go away. I will try this and report back.

One last thing is still unclear to me. Today I saw 455 connection errors in the Baculum Messages window, but only 38 connection errors in syslog-ng. This is weird, as (1) I am using syslog over TCP, and (2) I would expect to see the same number of connection errors in syslog-ng as in the Baculum Messages window, or more. However, it is the other way around: there are considerably more errors on the Director side than on the FD side (syslog). Can this be explained?

All the best,
J/C

> On 25. Jul 2022, at 18:04, Martin Simmons <mar...@lispworks.com> wrote:
>
>>>>>> On Mon, 25 Jul 2022 15:50:15 +0000, Bill Arlofski said:
>>
>> On Monday, July 25th, 2022 at 08:54, Martin Simmons <mar...@lispworks.com> wrote:
>>>
>>> You could try running bacula-fd with debugging output. Unfortunately,
>>> it doesn't include timestamps, but you can do it like this:
>>
>> Hey Martin, Not sure if this is recent or not, but:
>> ----8<----
>> $ /opt/comm-bacula/sbin/bacula-fd -?
>> Copyright (C) 2000-2022 Kern Sibbald.
>>
>> Version: 13.0.0 (04 July 2022)
>>
>> Usage: bacula-fd [-f -s] [-c config_file] [-d debug_level]
>>      -c <file>        use <file> as configuration file
>>      -d <n>[,<tags>]  set debug level to <nn>, debug tags to <tags>
>>
>>      -dt              print a timestamp in debug output    <---- TimeStamps
>>
>>      -f               run in foreground (for debugging)
>>      -g               groupid
>>      -k               keep readall capabilities
>>      -m               print kaboom output (for debugging)
>>      -P               do not create pid file
>>      -s               no signals (for debugging)
>>      -t               test configuration file and exit
>>      -T               set trace on
>>      -u               userid
>>      -v               verbose user messages
>>      -?               print this message.
>> ----8<----
>
> Thanks, I didn't know that.
>
> So this will be simpler:
>
> bacula-fd -dt -d50,scheduler -f -v ...your normal bacula-fd args...
>
> __Martin
>
>
> _______________________________________________
> Bacula-users mailing list
> Bacula-users@lists.sourceforge.net
> https://lists.sourceforge.net/lists/listinfo/bacula-users