Hi Anthony,

I am facing the same problem you described. Could you share how changing 
"Maximum Concurrent Jobs" on the SD's Storage resource worked out? Did you 
find the cause and a solution?

Thank you, best regards.

On Friday, February 14, 2020 at 1:08:57 AM UTC-3, Anthony Vaccaro wrote:
>
> Hi Andreas,
>
> Thanks for your response.
>
> I think you're correct: the director and the storage daemon have the same 
> limit, but the director also sends status requests to the storage daemon, 
> which push it over the limit and cause the SD to reject connections. I 
> recently enabled debug logging on both the SD and the director and saw 
> that the SD sent no CRAM-MD5 challenge when the director connected, just a 
> timeout.
>
> I'll attempt to increase the connection limit for the SD only and will let 
> you know if that doesn't work.
>
> Regards, Anthony
>
> On Thu, Feb 6, 2020 at 7:15 PM Andreas Rogge <[email protected]> wrote:
>
>> Hi,
>>
>> could you please double-check that you put "Maximum Concurrent Jobs" in
>> the following resources:
>>
>> On the director:
>> - Director
>> - Storage
>>
>> On the storage daemon:
>> - Storage
>> - Device
>>
>> Basically in the Storage resource on the SD this should be called
>> "maximum concurrent connections", because "status storage" will count to
>> the limit, too. It is usually best to make sure the value on the SD's
>> Storage resource leaves a little room (if you set it to 40 on the
>> director, try 50 on the SD).
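>>
>> A minimal sketch of that layout, showing only the directive discussed
>> here. The values 40 and 50 are the example numbers above, "File1" is the
>> storage name from the log quoted below, and the other resource names
>> (bareos-dir, bareos-sd, FileDevice1) and the file names are placeholders.
>> A real configuration needs further directives (addresses, passwords,
>> device settings) that are omitted:
>>
>> ```
>> # Director side (assumed file: bareos-dir.conf)
>> Director {
>>   Name = bareos-dir
>>   Maximum Concurrent Jobs = 40
>> }
>> Storage {
>>   Name = File1
>>   Maximum Concurrent Jobs = 40
>> }
>>
>> # Storage daemon side (assumed file: bareos-sd.conf)
>> Storage {
>>   Name = bareos-sd
>>   # Higher than the director's 40, so "status storage" connections
>>   # still fit under the limit.
>>   Maximum Concurrent Jobs = 50
>> }
>> Device {
>>   Name = FileDevice1
>>   # One job per file device, as in the setup described below.
>>   Maximum Concurrent Jobs = 1
>> }
>> ```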
>>
>> Best Regards,
>> Andreas
>>
>> On 06.02.20 at 09:24, Anthony Vaccaro wrote:
>> > Hi everyone,
>> > 
>> > I'm currently chasing down an issue with Bareos that is causing
>> > intermittent backup failures during busy periods. This is happening on
>> > our production Bareos install which is running version 16.2.7.
>> > 
>> > Each night our daily backup schedule starts at 18:30 and runs on about
>> > 115 of our hosts. 
>> > 
>> > We have MaximumConcurrentJobs set to 40 in both the director (director
>> > resource and storage daemon resource) as well as the storage daemon
>> > (storage daemon resource) configurations. The storage daemon is using
>> > file-based storage, with 40 devices, each one with a
>> > MaximumConcurrentJobs value of 1. No tapes are involved.
>> > 
>> > At around 18:35, some jobs start failing due to a storage daemon
>> > authorization error - I'll include an example at the end of this email.
>> > Roughly 5-10% of our jobs are failing, and this issue was also masked by
>> > a secondary problem where the job status was recorded as "T" (terminated
>> > successfully) in the MySQL database - that's an issue for another post
>> > though.
>> > 
>> > Does anyone have any suggestions or recommendations for diagnosing or
>> > fixing this issue? Is 40 concurrent jobs absurdly high? Our nightly jobs
>> > finish within a few hours, so I am tempted to lower this value, but I'm
>> > also concerned that the jobs are being rejected, rather than delayed.
>> > 
>> > I appreciate any comments or feedback. Please let me know if I can
>> > provide more configuration details or context.
>> > 
>> > Thanks and regards, Anthony
>> > 
>> > Example of failed job (bareos.log excerpt):
>> > 
>> > 01-Feb 18:35 bareoshost JobId 206798: Start Backup JobId 206798,
>> > Job=elasticsearch.blog:clienthost.2020-02-01_18.30.25_32
>> > 01-Feb 18:35 bareoshost JobId 206798: Fatal error: Authorization key
>> > rejected by Storage daemon File1.
>> > Please see
>> > http://doc.bareos.org/master/html/bareos-manual-main-reference.html#AuthorizationErrors
>> > for help.
>> > 01-Feb 18:35 bareoshost JobId 206798: Fatal error: Director unable to
>> > authenticate with Storage daemon at "bareoshost:9103". Possible causes:
>> > Passwords or names not the same or
>> > TLS negotiation problem or
>> > Maximum Concurrent Jobs exceeded on the SD or
>> > SD networking messed up (restart daemon).
>> > Please see
>> > http://doc.bareos.org/master/html/bareos-manual-main-reference.html#AuthorizationErrors
>> > for help.
>> > 01-Feb 18:35 bareoshost JobId 206798: Error: Bareos bareoshost 16.2.7
>> > (09Oct17):
>> >   Build OS:               x86_64-redhat-linux-gnu redhat CentOS Linux
>> > release 7.4.1708 (Core)
>> >   JobId:                  206798
>> >   Job:                    elasticsearch.blog:clienthost.2020-02-01_18.30.25_32
>> >   Backup Level:           Full
>> >   Client:                 "bareoshost" 16.2.7 (09Oct17)
>> > x86_64-redhat-linux-gnu,redhat,CentOS Linux release 7.4.1708 (Core)
>> >   FileSet:                "clienthost:elasticsearch.blog" 2018-08-08
>> > 18:30:16
>> >   Pool:                   "daily" (From Run Pool override)
>> >   Catalog:                "MyCatalog" (From Client resource)
>> >   Storage:                "File1" (From Pool resource)
>> >   Scheduled time:         01-Feb-2020 18:30:25
>> >   Start time:             01-Feb-2020 18:35:38
>> >   End time:               01-Feb-2020 18:35:43
>> >   Elapsed time:           5 secs
>> >   Priority:               10
>> >   FD Files Written:       0
>> >   SD Files Written:       0
>> >   FD Bytes Written:       0 (0 B)
>> >   SD Bytes Written:       0 (0 B)
>> >   Rate:                   0.0 KB/s
>> >   Software Compression:   None
>> >   VSS:                    no
>> >   Encryption:             no
>> >   Accurate:               yes
>> >   Volume name(s):        
>> >   Volume Session Id:      0
>> >   Volume Session Time:    0
>> >   Last Volume Bytes:      0 (0 B)
>> >   Non-fatal FD errors:    1
>> >   SD Errors:              0
>> >   FD termination status:  
>> >   SD termination status:  
>> >   FD  Secure Erase Cmd:   <NULL>
>> >   SD  Secure Erase Cmd:   <NULL>
>> >   Termination:            *** Backup Error ***
>> > 
>> -- 
>>   Andreas Rogge                             [email protected]
>>   Bareos GmbH & Co. KG                      Phone: +49 221-630693-86
>>   http://www.bareos.com
>>
>>   Registered office: Köln | District court (Amtsgericht) Köln: HRA 29646
>>   General partner: Bareos Verwaltungs-GmbH
>>   Managing directors: S. Dühr, M. Außendorf, J. Steffens, Philipp Storz
>>
>
