On 2020-10-01 10:47, Radosław Korzeniewski wrote:
Hello,

I have the impression that you and I are talking about two totally
distinct things, so to make progress with our discussion these things
have to be cleared up.

It looks like that, but not completely.
I hope it will be clearer after my answers to the questions below.

I am sorry for taking your time here but at least we will all learn
something. :-)


Wed, 30 Sep 2020 at 14:04 <djosip+n...@linuxpages.net> wrote:
Storage daemon sends it to the Director which sends it to the user
after it combines the messages if it finds the corresponding Job IDs.


Yes, it is true to some extent.

Hm, I believe it happens in all cases.
In what case would it not be true?

Here is an example of the only Messages resource in my Storage daemon
configuration; most people have the same configuration, as it is
sound and even comes with the default configuration files:

Messages {
  Name = Standard
  director = bkp1-dir = all
}


And the bkp1-dir Director has its own Messages resource named
Standard (this is the default name and I chose to keep it) which
has the directives such as mailcommand, operatorcommand, mail,
operator, console, append and catalog defined.
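For reference, a Director-side Messages resource with those directives
looks roughly like the stock one shipped in bacula-dir.conf (the
addresses and paths below are placeholders, not my actual values):

Messages {
  Name = Standard
  mailcommand = "/usr/sbin/bsmtp -h localhost -f \"\(Bacula\) \<%r\>\" -s \"Bacula: %t %e of %c %l\" %r"
  operatorcommand = "/usr/sbin/bsmtp -h localhost -f \"\(Bacula\) \<%r\>\" -s \"Bacula: Intervention needed for %j\" %r"
  mail = root@localhost = all, !skipped
  operator = root@localhost = mount
  console = all, !skipped, !saved
  append = "/var/log/bacula/bacula.log" = all, !skipped
  catalog = all
}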


The messages in the Catalog are not stored by the Storage daemon and,
as far as I know, they are not used for anything other than manual
browsing with tools such as BAT.


So, below is an example of the copied job log I have on hand:

2020-05-03 01:14:25 backup-dir JobId 167991: Start Backup JobId 167991,
Job=xxx.2020-05-03_01.00.00_30
2020-05-03 01:14:30 backup-dir JobId 167991: Using Device "File-Drive-01"
to write.
2020-05-03 01:14:30 backup-dir JobId 167991: Using Volume "vol2292" from
'Scratch' pool.
2020-05-03 01:14:30 backup-dir JobId 167991: Recycled volume "vol2292"
2020-05-03 01:14:35 backup-dir JobId 167991: Max Volume jobs=1 exceeded.
Marking Volume "vol2292" as Used.
2020-05-03 01:14:35 backup-sd JobId 167991: Recycled volume "vol2292" on
File device "File-Drive-01" (/backup/volumes), all previous data lost.
2020-05-03 01:14:37 backup-fd JobId 167991: Backup "xxx" (vm-240) start.
2020-05-03 01:15:05 backup-fd JobId 167991: There are 2 disks.
2020-05-03 01:15:05 backup-fd JobId 167991: create snapshot name
xxx.2020-05-03_01.00.00_30 succeeded.
2020-05-03 01:28:16 backup-fd JobId 167991: Dump vmdk
6000C294-0861-4dda-579d-9c3578e287d2 succeeded.
2020-05-03 01:28:16 backup-fd JobId 167991: VDDK Transport "nbd" selected
2020-05-03 01:28:35 backup-fd JobId 167991: Dump vmdk
6000C29d-9771-8fc3-df55-61031caec345 succeeded.
2020-05-03 01:28:35 backup-fd JobId 167991: VDDK Transport "nbd" selected
2020-05-03 01:28:40 backup-dir JobId 167991: Bacula Enterprise backup-dir
12.0.1 (02Jul19):
  Build OS:               x86_64-redhat-linux-gnu-bacula-enterprise redhat Enterprise release
  JobId:                  167991
  Job:                    xxx.2020-05-03_01.00.00_30
  Backup Level:           Full
  Client:                 "xxx" 12.0.1 (02Jul19)
x86_64-redhat-linux-gnu-bacula-enterprise,redhat,Enterprise release
  FileSet:                "xxx" 2018-11-17 01:00:00
  Pool:                   "xxx" (From Job resource)
  Catalog:                "xxx" (From Client resource)
  Storage:                "backup-sd" (From Pool resource)
  Scheduled time:         03-May-2020 01:00:00
  Start time:             03-May-2020 01:14:35
  End time:               03-May-2020 01:28:40
  Elapsed time:           14 mins 5 secs
  Priority:               10
  FD Files Written:       8
  SD Files Written:       8
  FD Bytes Written:       17,544,329,232 (17.54 GB)
  SD Bytes Written:       17,544,331,241 (17.54 GB)
  Rate:                   20762.5 KB/s
  Software Compression:   42.6% 1.7:1
  Comm Line Compression:  None
  Snapshot/VSS:           no
  Encryption:             no
  Accurate:               yes
  Volume name(s):         vol2292
  Volume Session Id:      507
  Volume Session Time:    1587987580
  Last Volume Bytes:      17,559,722,646 (17.55 GB)
  Non-fatal FD errors:    0
  SD Errors:              0
  FD termination status:  OK
  SD termination status:  OK
  Termination:            Backup OK

2020-05-03 01:28:40 backup-sd JobId 167991: Sending spooled attrs to the
Director. Despooling 2,350 bytes ...
2020-05-03 01:28:40 backup-sd JobId 167991: Elapsed time=00:14:05, Transfer
rate=20.76 M Bytes/second
2020-05-03 01:28:40 backup-fd JobId 167991: BACKUP OK xxx (vm-240)
2020-05-03 01:28:40 backup-fd JobId 167991: Delete snapshot
xxx.2020-05-03_01.00.00_30 succeeded.
2020-10-01 09:35:21 backup-dir JobId 177827: Using Device
"Cloud-xxx-Drive-01" to write.
2020-10-01 09:35:22 backup-sd JobId 177827: Volume "cloud178569" previously
written, moving to end of data.
2020-10-01 09:35:22 backup-sd JobId 177827: Using S3 cloud driver Host=xxx
Bucket=bacula
2020-10-01 09:38:01 backup-sd JobId 177827: Elapsed time=00:02:39, Transfer
rate=110.3 M Bytes/second
2020-10-01 09:38:01 backup-sd JobId 177827: Cloud Upload transfers:
2020-10-01 09:38:04 backup-sd JobId 177827: cloud178569/part.6
state=done    size=1.073 GB duration=71s
2020-10-01 09:39:55 backup-sd JobId 177827: cloud178569/part.7
state=done    size=1.073 GB duration=72s
2020-10-01 09:41:18 backup-sd JobId 177827: cloud178569/part.8
state=done    size=1.073 GB duration=72s
2020-10-01 09:42:46 backup-sd JobId 177827: cloud178569/part.9
state=done    size=1.073 GB duration=71s
2020-10-01 09:43:57 backup-sd JobId 177827: cloud178569/part.10
 state=done    size=1.073 GB duration=71s
2020-10-01 09:46:06 backup-sd JobId 177827: cloud178569/part.11
 state=done    size=1.073 GB duration=72s
2020-10-01 09:47:33 backup-sd JobId 177827: cloud178569/part.12
 state=done    size=1.073 GB duration=72s
2020-10-01 09:48:44 backup-sd JobId 177827: cloud178569/part.13
 state=done    size=1.073 GB duration=71s
2020-10-01 09:50:45 backup-sd JobId 177827: cloud178569/part.14
 state=done    size=1.073 GB duration=71s
2020-10-01 09:52:23 backup-sd JobId 177827: cloud178569/part.15
 state=done    size=1.073 GB duration=71s
2020-10-01 09:54:01 backup-sd JobId 177827: cloud178569/part.16
 state=done    size=1.073 GB duration=72s
2020-10-01 09:55:13 backup-sd JobId 177827: cloud178569/part.17
 state=done    size=1.073 GB duration=71s
2020-10-01 09:56:24 backup-sd JobId 177827: cloud178569/part.18
 state=done    size=1.073 GB duration=71s
2020-10-01 09:58:25 backup-sd JobId 177827: cloud178569/part.19
 state=done    size=1.073 GB duration=71s
2020-10-01 10:00:16 backup-sd JobId 177827: cloud178569/part.20
 state=done    size=1.073 GB duration=72s
2020-10-01 10:01:27 backup-sd JobId 177827: cloud178569/part.21
 state=done    size=1.073 GB duration=62s
2020-10-01 10:01:53 backup-sd JobId 177827: cloud178569/part.22
 state=done    size=379.9 MB duration=25s
2020-10-01 10:01:53 backup-sd JobId 177827: Sending spooled attrs to the
Director. Despooling 2,350 bytes ...

It is somehow combined from the logs of the following two jobids: 167991,
which is the original backup job, and 177827, which is the copy job. I'm
unable to explain how the log of the original jobid 167991 was included,
other than that it was simply copied from the log table by the Director
and not the SD. Any thoughts here?

There are a few differences which could be important:
1. You are using Bacula Enterprise and I am using the Community version
2. You are using the S3 module and it might even create an additional
   job; I don't know because I never used Bacula's S3 module.

As for how they got combined, it's the same as with other combined
messages. In this case the Storage daemon sent them to the Director
daemon, which knew those JobIDs are related.

Storage daemon does not make a connection to the database, only
Bacula Dir can do that.


And this is the reason the SD cannot copy the original log.

The Storage daemon does not copy or retrieve logs from anywhere.
It produces log lines and can send them to the Director daemon if
configured that way, which is the default.


Additionally, those logs get stored in the Catalog only when one is
using Postgres. In the case of MySQL the log table is empty.
I am currently using Postgres, but when I used MySQL Bacula behaved
exactly the same (with regard to the issue we are discussing here),
except that the log table was empty.


I do not use MySQL for Bacula, so I cannot verify it.

I have checked one Bacula installation which is using MySQL and I have
to withdraw my statement about MySQL and the empty Log table.

It has nothing to do with the Catalog using MySQL; it was simply a case
of a Director configuration where the Messages resource didn't use the
option "catalog = all".

In any case, the Log table is not used for combining the messages from
different Bacula daemons, as it can be completely empty and one would
still observe everything I was talking about in my previous posts.
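For anyone who wants to verify this on their own installation, the Log
table can be inspected directly in the catalog database, along these
lines (table and column names are from the standard Bacula schema; the
JobId is just an example taken from the log above):

SELECT COUNT(*) FROM Log;
SELECT Time, LogText FROM Log WHERE JobId = 167991 ORDER BY Time;

An empty result from the first query means the Director is not writing
messages to the catalog, which points at the Messages resource rather
than at the database backend.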


Here is an example of one such additional e-mail message:
>>
>> 30-Sep 06:18 bkp1-dir JobId 5132: Using Device "tape1-dev1" to write.
>> 30-Sep 06:18 bkp1-sd JobId 5132: Spooling data ...
>> 30-Sep 06:21 bkp1-sd JobId 5132: Committing spooled data to Volume
>> "tapevol-0011". Despooling 901,424,162 bytes ...
>> 30-Sep 06:21 bkp1-sd JobId 5132: Despooling elapsed time = 00:00:05,
>> Transfer rate = 180.2 M Bytes/second
>> 30-Sep 06:21 bkp1-sd JobId 5132: Elapsed time=00:02:16, Transfer
>> rate=6.621 M Bytes/second
>> 30-Sep 06:21 bkp1-sd JobId 5132: Sending spooled attrs to the
>> Director.
>> Despooling 580 bytes ...
>>
>
> Is it an original job message or a copy job message? Which job IDs
> are for the original job and its clone? Could you check, please. You
> have to distinguish between an original backup job message and a
> message generated during a copy.

The original (normal Full Backup) job ID was: 4997


I'm talking about logs from this job.

But I am not talking about the original, normal Backup job.
The job which later gets copied by the Copy job has nothing to do with
my post; its logs and report messages are fine.

The problem is that when you run a Copy job which has to copy that
normal Backup job, you get two e-mail messages instead of one.
Neither of these two messages has anything to do with the messages
and logs produced while the original, normal Backup was running.

I hope it's now more clear what I am saying.

The ID of the first Copy job which started the copy was 4999
The ID that actually copied JobID 4997 from disk to tape was 5132


which are combined with the above logs and saved as target logs.

That's what I am saying: the logs related to 4999 and 5132 are not
getting combined. Instead, you get two messages, and the one related to
Job 5132 has the same Subject in the e-mail as the message sent for the
normal, original Backup job.

Job 5132 was the one whose messages get sent in the separate e-mail
message, and I believe it would make more sense if its messages were
combined with the messages that belong to JobID 4999, which started it.


Why? I do not understand. How many jobs do you copy in a single Copy
job run?

It depends on the Selection Type directive but in my case it is
currently 67.

But if one needed to specify one Copy job per normal Backup job,
Bacula would send two e-mails per Copy job every time a Copy job
is run.
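To illustrate, a single controlling Copy job that selects many backups
at once would be configured roughly like this (all resource names here
are placeholders, not my actual configuration). With
Selection Type = PoolUncopiedJobs, the one control job spawns one
worker copy job per selected backup, which is where controlling/worker
JobID pairs such as 4999/5132 come from:

Job {
  Name = "CopyToTape"
  Type = Copy
  Level = Full
  Client = bkp1-fd
  FileSet = "Full Set"
  Schedule = "WeeklyCycle"
  Pool = Disk-Pool          # source pool; its Next Pool must point to the tape pool
  Selection Type = PoolUncopiedJobs
  Storage = tape1
  Messages = Standard
}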


They ended at approximately the same time: 06:21
   Prev Backup JobId:      4997


This job was running way ahead of the copy jobs below, right?

Yes, Job 4997 is the original, normal Backup job, and it could have
been created days before someone ran the Copy job, had I wanted to
configure it like that.

   New Backup JobId:       5132
   Current JobId:          4999
which is ok.

That job was the last one, so I have checked another one just to verify
that everything looks as it should, because the Copy job with the ID
4999 is the first Copy job, which also processes the Selection Type
directive from the Job configuration.

So, the IDs of another job:
The original (normal Full Backup) job ID was: 4996
The ID of the first Copy job which started the copy was 5130
The ID of the job that actually copied JobID 4996 from disk to tape was 5131

They ended at approximately the same time: 06:18
In the message it says:
   Prev Backup JobId:      4996
   New Backup JobId:       5131
   Current JobId:          5130
which is all correct and expected.


There is a need for a deep redesign of the main Bacula components, and
if you have an idea how it should work, just discuss it on the devel list.

I don't think it's that complicated; it's just a matter of presenting
it properly and understanding the issue.

I have been observing this problem for years, and if I get some time
next year I will either open this issue or try to make a patch myself.

I have also found a reproducible bug in Bacula which makes the Director
segfault but it's not that critical.

Thank you for your time.

Kind Regards!

--
Josip Deanovic


_______________________________________________
Bacula-users mailing list
Bacula-users@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/bacula-users