I too am seeing backups fail with network errors. I'm also using TLS
certificates for transport security. I do not see anything in the system
logs on the client or on the Bareos server. I do not have any backtrace
files, and the storage daemon has not crashed. However, all of my backup
jobs are failing with the same errors:

23-Oct 15:33 bareos-dir JobId 150: Fatal error: Network error with FD during Backup: ERR=Connection timed out
23-Oct 15:33 bareos-dir JobId 150: Fatal error: Director's comm line to SD dropped.
23-Oct 15:33 bareos-dir JobId 150: Fatal error: No Job status returned from FD.
23-Oct 15:33 bareos-dir JobId 150: Error: Bareos bareos-dir 23.0.5~pre146.7e91df1c0 (11Oct24):
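
In case it helps anyone comparing job reports, here is a rough Python
sketch for tallying the fatal errors in messages like the ones above. The
line format in the regex is only inferred from the log lines in this
thread, not from any Bareos specification, and the names LINE_RE,
parse_line and summarize_fatal are my own.

```python
import re
from collections import Counter

# Format inferred from the job messages in this thread (an assumption), e.g.:
#   23-Oct 15:33 bareos-dir JobId 150: Fatal error: No Job status returned from FD.
LINE_RE = re.compile(
    r"^(?P<date>\d{2}-\w{3}) (?P<time>\d{2}:\d{2}) (?P<daemon>\S+) "
    r"JobId (?P<jobid>\d+): (?P<severity>Fatal error|Error|Warning): "
    r"(?P<message>.*)$"
)


def parse_line(line):
    """Return a dict of fields for one job message, or None if it does not match."""
    match = LINE_RE.match(line.strip())
    if match is None:
        return None
    record = match.groupdict()
    record["jobid"] = int(record["jobid"])
    return record


def summarize_fatal(lines):
    """Count how often each 'Fatal error' message text occurs."""
    counts = Counter()
    for line in lines:
        record = parse_line(line)
        if record is not None and record["severity"] == "Fatal error":
            counts[record["message"]] += 1
    return counts


# The job messages from this post, one per line.
SAMPLE = [
    "23-Oct 15:33 bareos-dir JobId 150: Fatal error: Network error with FD "
    "during Backup: ERR=Connection timed out",
    "23-Oct 15:33 bareos-dir JobId 150: Fatal error: Director's comm line to SD dropped.",
    "23-Oct 15:33 bareos-dir JobId 150: Fatal error: No Job status returned from FD.",
    "23-Oct 15:33 bareos-dir JobId 150: Error: Bareos bareos-dir "
    "23.0.5~pre146.7e91df1c0 (11Oct24):",
]

if __name__ == "__main__":
    for message, count in summarize_fatal(SAMPLE).most_common():
        print(count, message)
```

Feeding it the messages from several failed jobs should show whether the
same fatal error is repeating or whether the failures differ per job.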

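For reference, the checks I mentioned (traceback files and system logs)
could be scripted roughly like this. It is a minimal sketch, assuming the
traceback files live in /var/lib/bareos as suggested in the quoted reply
and that the storage daemon's systemd unit is called bareos-sd; the unit
name and the function names are assumptions on my part.

```python
import subprocess
from pathlib import Path


def find_bactrace_files(workdir="/var/lib/bareos"):
    """Return a sorted list of *.bactrace files in workdir (empty if none)."""
    directory = Path(workdir)
    if not directory.is_dir():
        return []
    return sorted(directory.glob("*.bactrace"))


def journal_command(unit="bareos-sd", since="yesterday"):
    """Build a journalctl command line for one unit (unit name is an assumption)."""
    return ["journalctl", "--no-pager", "-u", unit, "--since", since]


if __name__ == "__main__":
    traces = find_bactrace_files()
    if traces:
        for path in traces:
            print("traceback file:", path)
    else:
        print("no *.bactrace files found")
    # Reading the journal may need root or membership in the systemd-journal group.
    subprocess.run(journal_command(), check=False)
```
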
On Wednesday, July 24, 2024 at 2:46:59 PM UTC-5 Stephan Duehr wrote:

> Hi Seth,
>
> did you notice any bareos-sd crashes?
> Check your syslog or use journalctl for messages containing bareos-sd,
> are there any *.bactrace files in /var/lib/bareos/ ?
>
> Note that the systemd units will trigger automatic restart of bareos-sd if 
> it crashes.
>
> If you noticed bareos-sd crashes, make sure to install gdb and debuginfo
> packages to get a proper traceback of the next crash. For details, see
> https://docs.bareos.org/Appendix/Debugging.html
>
> Regards,
> Stephan
>
> On 7/24/24 16:11, Seth Galitzer wrote:
> > I spent considerable time yesterday moving my bareos dir host from
> > centos7 to debian 12. Ran my usual set of jobs last night and still got
> > 5 jobs that failed with "Fatal error: filed/backup.cc:1616 Network send
> > error to SD. ERR=Broken pipe". This only started happening in the last 6
> > weeks, since I stood up a new fd host. None of my other fd hosts are
> > triggering this error. When I manually re-run these failed jobs, they
> > usually complete fine, though yesterday, I tried to rerun one three
> > times and it never finished successfully. Both hosts are running the
> > latest version available from official bareos repos:
> > 23.0.4~pre113.6ea98eb40-106. I need some additional troubleshooting and
> > debugging help with this. Debug logs aren't really showing anything
> > useful.
> > 
> > Thanks.
> > Seth
> > 
> > On Wednesday, July 10, 2024 at 9:32:40 AM UTC-5 Seth Galitzer wrote:
> > 
> > I've been running my dir and sd on a centos7 (I know, it's old) host,
> > upgrading bareos regularly. It's been processing jobs just fine from fd
> > hosts running a variety of debian and ubuntu releases, as well as
> > another centos7 host. I recently moved jobs from the centos7 fd to a new
> > one running debian 12 (bookworm), also running the latest bareos
> > release. Since then, jobs have been randomly failing from that host
> > only.
> > 
> > I would get job reports with messages like this:
> > 05-Jul 20:00 imperial-dir JobId 60064: Fatal error: Network error with FD during Backup: ERR=Connection reset by peer
> > 05-Jul 20:00 imperial-dir JobId 60064: Fatal error: Director's comm line to SD dropped.
> > 05-Jul 20:00 imperial-dir JobId 60064: Fatal error: No Job status returned from FD.
> > 05-Jul 20:00 imperial-dir JobId 60064: Insert of attributes batch table with 323847 entries start
> > 05-Jul 20:00 imperial-dir JobId 60064: Insert of attributes batch table done
> > 05-Jul 20:00 imperial-dir JobId 60064: Error: Bareos imperial-dir 23.0.4~pre61.010c81fdc (03Jul24):
> > 
> > Essentially, it looks like the job would run to completion, but then
> > never send the final OK back to the director, eventually time out and
> > then trigger this error. When I first set up the new fd host, this was
> > happening for every job. After doing a bit of research, I added
> > "Heartbeat Interval = 60" to the client config on the dir. Since then,
> > most of the jobs have been completing, but 5 out of about 30 still fail.
> > Upon re-running those jobs manually, sometimes 1 still fails, but the
> > rest succeed.
> > 
> > Now, my job reports have errors like this:
> > 10-Jul 03:51 files-fd JobId 60268: Fatal error: filed/dir_cmd.cc:2423 Comm error with SD. bad response to Append Data. ERR=Connection reset by peer
> > 10-Jul 03:51 imperial-dir JobId 60268: Fatal error: Director's comm line to SD dropped.
> > 10-Jul 03:51 imperial-dir JobId 60268: Error: Bareos imperial-dir 23.0.4~pre64.caca3169f (05Jul24):
> > 
> > I turned on trace debugging for the dir, sd, and fd (remember I have dir
> > and sd running on the same host). I can send full traces if needed, but
> > the most prevalent error from all three traces is something like this:
> > lib/tls_openssl_private.cc:325-60268 SSL_get_error() returned error value 2
> > Sometimes the error code returned is 5, but it's usually 2.
> > 
> > I've been running bareos for several years without any problems and
> > this is the first major one I've hit. I would love to know what changed
> > and if there's anything that can be done to compensate for it. All my
> > other fd hosts are running jobs just fine. I don't believe most of the
> > rest of them are running bareos 23.0.3 releases. My next step is going
> > to be to migrate my dir/sd host to debian 12, hoping that comparable ssl
> > libs will help. But if there's anything else that can be done for a
> > quicker fix, I'd appreciate some advice.
> > 
> > Thanks.
> > Seth
> > 
> > -- 
> > You received this message because you are subscribed to the Google
> > Groups "bareos-users" group.
> > To unsubscribe from this group and stop receiving emails from it, send
> > an email to [email protected].
> > To view this discussion on the web visit
> > https://groups.google.com/d/msgid/bareos-users/06ece7e7-37fe-4d5e-8669-3d6ecf51f306n%40googlegroups.com.
>
> -- 
> Stephan Dühr [email protected]
> Bareos GmbH & Co. KG Phone: +49 221-630693-90
> http://www.bareos.com
>
> Sitz der Gesellschaft: Köln | Amtsgericht Köln: HRA 29646
> Komplementär: Bareos Verwaltungs-GmbH
> Geschäftsführer: S. Dühr, J. Steffens, Philipp Storz
>

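One more note on the quoted trace line "SSL_get_error() returned error
value 2" (sometimes 5): those numbers can be mapped to OpenSSL's symbolic
names. The sketch below is only an illustration. The table is copied from
the SSL_ERROR_* constants in OpenSSL's ssl.h, and the helper name
decode_ssl_error and the regex are my own, not anything Bareos ships.

```python
import re

# SSL_ERROR_* values as defined in OpenSSL's <openssl/ssl.h>.
SSL_ERROR_NAMES = {
    0: "SSL_ERROR_NONE",
    1: "SSL_ERROR_SSL",
    2: "SSL_ERROR_WANT_READ",
    3: "SSL_ERROR_WANT_WRITE",
    4: "SSL_ERROR_WANT_X509_LOOKUP",
    5: "SSL_ERROR_SYSCALL",
    6: "SSL_ERROR_ZERO_RETURN",
}

# Matches the trace text quoted in this thread (format assumed from that one line).
TRACE_RE = re.compile(r"SSL_get_error\(\) returned error\s+value (\d+)")


def decode_ssl_error(trace_line):
    """Return (code, name) for a trace line, or None if it has no SSL error value."""
    match = TRACE_RE.search(trace_line)
    if match is None:
        return None
    code = int(match.group(1))
    return code, SSL_ERROR_NAMES.get(code, "unknown")


if __name__ == "__main__":
    line = ("lib/tls_openssl_private.cc:325-60268 "
            "SSL_get_error() returned error value 2")
    print(decode_ssl_error(line))
```

If that mapping applies here, value 2 (SSL_ERROR_WANT_READ) normally just
means a read should be retried and is probably not the failure itself,
while value 5 (SSL_ERROR_SYSCALL) points to an I/O error on the socket,
which would be consistent with the "Connection reset by peer" messages.
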