Hi everyone,

I'm currently chasing down an issue with Bareos that is causing
intermittent backup failures during busy periods. This is happening on our
production Bareos install which is running version 16.2.7.

Each night our daily backup schedule starts at 18:30 and runs on about 115
of our hosts.

We have MaximumConcurrentJobs set to 40 in both the director configuration
(director resource and storage daemon resource) and the storage daemon
configuration (storage daemon resource). The storage daemon uses
file-based storage with 40 devices, each with a MaximumConcurrentJobs
value of 1. No tapes are involved.

At around 18:35, some jobs start failing with a storage daemon
authorization error; I've included an example at the end of this email.
Roughly 5-10% of our jobs fail this way. The issue was also masked by a
secondary problem: the job status was recorded as "T" (terminated
successfully) in the MySQL database. That's an issue for another post,
though.

Does anyone have any suggestions or recommendations for diagnosing or
fixing this issue? Is 40 concurrent jobs absurdly high? Our nightly jobs
finish within a few hours, so I am tempted to lower this value, but I'm
also concerned that the jobs are being rejected rather than delayed.

I appreciate any comments or feedback. Please let me know if I can provide
more configuration details or context.

Thanks and regards, Anthony

Example of failed job (bareos.log excerpt):

01-Feb 18:35 bareoshost JobId 206798: Start Backup JobId 206798, Job=elasticsearch.blog:clienthost.2020-02-01_18.30.25_32
01-Feb 18:35 bareoshost JobId 206798: Fatal error: Authorization key rejected by Storage daemon File1.
Please see http://doc.bareos.org/master/html/bareos-manual-main-reference.html#AuthorizationErrors for help.
01-Feb 18:35 bareoshost JobId 206798: Fatal error: Director unable to authenticate with Storage daemon at "bareoshost:9103". Possible causes:
Passwords or names not the same or
TLS negotiation problem or
Maximum Concurrent Jobs exceeded on the SD or
SD networking messed up (restart daemon).
Please see http://doc.bareos.org/master/html/bareos-manual-main-reference.html#AuthorizationErrors for help.
01-Feb 18:35 bareoshost JobId 206798: Error: Bareos bareoshost 16.2.7 (09Oct17):
  Build OS:               x86_64-redhat-linux-gnu redhat CentOS Linux release 7.4.1708 (Core)
  JobId:                  206798
  Job:                    elasticsearch.blog:clienthost.2020-02-01_18.30.25_32
  Backup Level:           Full
  Client:                 "bareoshost" 16.2.7 (09Oct17) x86_64-redhat-linux-gnu,redhat,CentOS Linux release 7.4.1708 (Core)
  FileSet:                "clienthost:elasticsearch.blog" 2018-08-08 18:30:16
  Pool:                   "daily" (From Run Pool override)
  Catalog:                "MyCatalog" (From Client resource)
  Storage:                "File1" (From Pool resource)
  Scheduled time:         01-Feb-2020 18:30:25
  Start time:             01-Feb-2020 18:35:38
  End time:               01-Feb-2020 18:35:43
  Elapsed time:           5 secs
  Priority:               10
  FD Files Written:       0
  SD Files Written:       0
  FD Bytes Written:       0 (0 B)
  SD Bytes Written:       0 (0 B)
  Rate:                   0.0 KB/s
  Software Compression:   None
  VSS:                    no
  Encryption:             no
  Accurate:               yes
  Volume name(s):
  Volume Session Id:      0
  Volume Session Time:    0
  Last Volume Bytes:      0 (0 B)
  Non-fatal FD errors:    1
  SD Errors:              0
  FD termination status:
  SD termination status:
  FD  Secure Erase Cmd:   <NULL>
  SD  Secure Erase Cmd:   <NULL>
  Termination:            *** Backup Error ***
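
To put a firmer number on the "5-10%" estimate and to see whether the
rejections cluster in the first minutes after 18:30, the log can be
tallied per minute. The following is only a minimal sketch: it assumes
each rejection starts with a line in the same format as the excerpt
above, and the names count_rejections and REJECT_RE are made up for
illustration, not part of Bareos.

```python
import re
from collections import Counter

# Matches the start of the rejection message as it appears in the excerpt,
# e.g. "01-Feb 18:35 bareoshost JobId 206798: Fatal error: Authorization key".
# Assumption: every rejection is logged with this prefix.
REJECT_RE = re.compile(
    r"^(?P<stamp>\d{2}-\w{3} \d{2}:\d{2}) \S+ JobId (?P<jobid>\d+): "
    r"Fatal error: Authorization key"
)


def count_rejections(lines):
    """Tally authorization-key rejections in an iterable of log lines.

    Returns a Counter keyed by the "DD-Mon HH:MM" timestamp (in the order
    the minutes first appear in the log) and the set of affected JobIds.
    """
    per_minute = Counter()
    job_ids = set()
    for line in lines:
        match = REJECT_RE.match(line)
        if match:
            per_minute[match.group("stamp")] += 1
            job_ids.add(int(match.group("jobid")))
    return per_minute, job_ids
```

It can be fed an open file handle for bareos.log (the location depends on
the install) and the resulting JobIds compared against the number of jobs
scheduled for that night.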
