I think the support here from you people is great!

Today I think I might have understood what is happening, and Bill’s 
explanations about what might be going on were probably correct at the core, 
but not in the details.

Let me try to lay out what I think is going on and where I had my problem 
understanding it in the first place:

After an update of syslog-ng, the syslogging of the FD client host started to 
work. (It was configured a long time ago, but somehow it never worked before 
the update of the syslog-ng server that came in the last few days.) I then 
began to see where and WHEN(!) the error messages originated.

It is - as you guys are saying - the FD generating these errors, which are 
logged without delay in my central syslog-ng server:

2022-07-25 12:57:30     
bsockcore.c:265 Unable to connect to Director daemon on 
bacula-dir.lan.net:9101. ERR=Connection refused

The eye-opener was the timestamps, which explained what is happening (more on 
that later).
My problem so far was that the error messages shown in Baculum carried the 
Director's timestamp, i.e. the time at which the Director received them, not 
the time at which they occurred!

25-Jul 22:00 bacula-dir JobId 1725: Error: getmsg.c:217 Malformed message: 
[bsockcore.c:265 Unable to connect to Director daemon on 
bacula-dir.lan.net:9101. ERR=Connection refused

Note the different timestamp. In the first message it is the timestamp of the 
FD client host when the error occurs there. In the second message you see the 
timestamp of the Director host when the first error message gets delivered from 
the FD to the Director.
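
To make the gap concrete, here is a minimal sketch that computes how long this 
one message was held back, using only the two timestamps quoted above. It 
assumes that both hosts have synchronized clocks in the same time zone and that 
the Director's log entry belongs to the same year (the job log omits the year 
and the seconds); the function names are my own.

```python
from datetime import datetime

# Timestamps copied from the two log lines quoted above.
fd_stamp = "2022-07-25 12:57:30"   # syslog-ng entry, FD host clock
dir_stamp = "25-Jul 22:00"         # job log entry, Director clock

def parse_fd(stamp):
    """Parse the syslog-ng style timestamp seen on the FD side."""
    return datetime.strptime(stamp, "%Y-%m-%d %H:%M:%S")

def parse_director(stamp, year):
    """Parse the Director's job-log timestamp; the year is an assumption
    supplied by the caller because the log line does not contain it.
    Note: %b expects English month abbreviations (C/English locale)."""
    return datetime.strptime(f"{year} {stamp}", "%Y %d-%b %H:%M")

occurred = parse_fd(fd_stamp)
delivered = parse_director(dir_stamp, occurred.year)
held_back = delivered - occurred

print(f"error occurred on the FD:   {occurred}")
print(f"delivered to the Director:  {delivered}")
print(f"held back for:              {held_back}")
```

For the pair of messages above this comes to roughly nine hours between the 
failed connection attempt and the job that finally carried the message.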

So what you guys said is correct: the Director accepts the error messages from 
the FD only when a job runs for the FD. Even if the FD connects to the Director 
many times during the day, the error messages are held back by the FD until a 
job actually runs, and then they are ingested for the first job that runs on 
that day. This also explains why there are no errors when a similar job runs 
shortly afterwards to back up to the other storage tier.

Because so far I was only seeing the Director timestamp, I was misled into 
believing that the error actually happens at the time when the job runs. I now 
understand that this is not correct. I think you guys also mentioned it, but I 
didn't pick it up consciously enough to understand what it means.

Now that I can see the timestamp from the FD when the errors actually happen on 
the FD host, I can confirm:

(1) The Director is definitely reachable from the FD at the time when the job 
runs (as I have always stated). This is why the error messages show the 
timestamp of the job run: the job is always able to run because the Director is 
available at that time.

(2) The Director is NOT reachable at some scheduled times each day when the 
container is shut down for third-party backup reasons (the firewall has nothing 
to do with this). This is the time frame when the errors actually occur, and 
they can now be seen in syslog-ng.

I suppose that if I now schedule the FD to connect to the Director only when 
the job runs, the errors should go away. I will try this and report back.

One last thing is still unclear to me. Today I saw 455 connection errors in the 
Baculum Messages window, but only 38 connection errors in syslog-ng. This is 
weird, as (1) I am using syslog over TCP, and (2) I would expect to see the 
same number of connection errors in syslog-ng as in the Baculum Messages 
window, or a higher one. However, it is the other way around: there are 
considerably more errors on the Director side than on the FD side (syslog).
Can this be explained?
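
One way I could narrow this down is to count the connection errors per hour on 
the syslog side and compare the buckets with the times the Director is known to 
be down. The following is only a sketch: it assumes one syslog entry per line, 
starting with a timestamp in the format shown above. Apart from the first entry, 
the sample lines are made up for illustration and are not real log data.

```python
import re
from collections import Counter

# Matches a line that starts with "YYYY-MM-DD HH:MM:SS" and contains the
# FD's connection error; group 1 captures the date plus the hour.
PATTERN = re.compile(
    r"^(\d{4}-\d{2}-\d{2} \d{2}):\d{2}:\d{2}\s+"
    r".*Unable to connect to Director daemon"
)

def errors_per_hour(lines):
    """Count connection errors per 'YYYY-MM-DD HH' bucket."""
    counts = Counter()
    for line in lines:
        match = PATTERN.match(line)
        if match:
            counts[match.group(1)] += 1
    return counts

sample = [
    # Real entry from my syslog-ng server (joined onto one line):
    "2022-07-25 12:57:30     bsockcore.c:265 Unable to connect to Director "
    "daemon on bacula-dir.lan.net:9101. ERR=Connection refused",
    # Made-up entries, for illustration only:
    "2022-07-25 12:58:30     bsockcore.c:265 Unable to connect to Director "
    "daemon on bacula-dir.lan.net:9101. ERR=Connection refused",
    "2022-07-25 13:10:00     some unrelated message",
]

for hour, count in sorted(errors_per_hour(sample).items()):
    print(f"{hour}:00  {count} connection error(s)")
```

Running the same count over the messages exported from Baculum (after adapting 
the pattern to its timestamp format) should show whether the extra errors fall 
into the same time windows or come from somewhere else.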

All the best,
 J/C
 

> On 25. Jul 2022, at 18:04, Martin Simmons <mar...@lispworks.com> wrote:
> 
>>>>>> On Mon, 25 Jul 2022 15:50:15 +0000, Bill Arlofski said:
>> 
>> On Monday, July 25th, 2022 at 08:54, Martin Simmons <mar...@lispworks.com>
>> wrote:
>>> 
>>> You could try running bacula-fd with debugging output. Unfortunately,
>>> it doesn't include timestamps, but you can do it like this:
>> 
>> Hey Martin, Not sure if this is recent or not, but:
>> ----8<----
>> $ /opt/comm-bacula/sbin/bacula-fd -?
>> Copyright (C) 2000-2022 Kern Sibbald.
>> 
>> Version: 13.0.0 (04 July 2022)
>> 
>> Usage: bacula-fd [-f -s] [-c config_file] [-d debug_level]
>>     -c <file>        use <file> as configuration file
>>     -d <n>[,<tags>]  set debug level to <nn>, debug tags to <tags>
>> 
>>     -dt              print a timestamp in debug output                <---- TimeStamps
>> 
>>     -f               run in foreground (for debugging)
>>     -g               groupid
>>     -k               keep readall capabilities
>>     -m               print kaboom output (for debugging)
>>     -P               do not create pid file
>>     -s               no signals (for debugging)
>>     -t               test configuration file and exit
>>     -T               set trace on
>>     -u               userid
>>     -v               verbose user messages
>>     -?               print this message.
>> ----8<----
> 
> Thanks, I didn't know that.
> 
> So this will be simpler:
> 
> bacula-fd -dt -d50,scheduler -f -v ...your normal bacula-fd args...
> 
> __Martin
> 
> 
> _______________________________________________
> Bacula-users mailing list
> Bacula-users@lists.sourceforge.net
> https://lists.sourceforge.net/lists/listinfo/bacula-users