Hi all, I've been using the S3 driver for some time now and working around various idiosyncrasies. The most persistent one is that occasionally, after something has been shut down, there are backup parts left in the cache that were never uploaded.
I started using a job that runs at boot and, after an interval, runs

cloud upload storage=$STORAGE pool=$POOL allfrompool

just as a catch-all (POOL and STORAGE are set to the Cloud pool and Storage respectively).

A couple of months ago (approximately) I realized that the Bacula Director was inoperative - hung - and I tracked this down to the Storage Daemon hanging when this command is executed. Restarting the SD clears the problem (though it also kills the upload). Otherwise, the Cloud-based storage continues to work fine. I don't see any error messages anywhere, and activity in the Director and File Daemon seems to return to normal when the Storage Daemon is killed/restarted.

I did run the SD with debug level 100, but I suspect that what I'm looking for is the absence of something. Here's the debug output (somewhat anonymized):

Mar 05 16:10:46 four bacula-sd[2263159]: List plugins. Hook count=0
Mar 05 16:10:46 four bacula-sd[2263159]: four-sd: bsock.c:861-0 socket=4 who=client host=[2001:0DB8::4] port=9103
Mar 05 16:10:46 four bacula-sd[2263159]: four-sd: bnet_server.c:235-0 Accept socket=2001:0DB8::4.9103:2001:0DB8::4.37168 s=0x5636929cedf8
Mar 05 16:10:46 four bacula-sd[2263159]: four-sd: dircmd.c:196-0 Got a DIR connection at 05-Mar-2024 16:10:46
Mar 05 16:10:46 four bacula-sd[2263159]: four-sd: authenticatebase.cc:365-0 TLSPSK Remote need 100
Mar 05 16:10:46 four bacula-sd[2263159]: four-sd: authenticatebase.cc:335-0 TLSPSK Local need 100
Mar 05 16:10:46 four bacula-sd[2263159]: four-sd: authenticatebase.cc:563-0 TLSPSK Start PSK
Mar 05 16:10:46 four bacula-sd[2263159]: four-sd: bnet.c:96-0 TLS server negotiation established.
Mar 05 16:10:46 four bacula-sd[2263159]: four-sd: cram-md5.c:68-0 send: auth cram-md5 challenge <104636656.1709673046@four-sd> ssl=0
Mar 05 16:10:46 four bacula-sd[2263159]: four-sd: cram-md5.c:132-0 cram-get received: auth cram-md5 <111733638.1709673046@four-dir> ssl=0
Mar 05 16:10:46 four bacula-sd[2263159]: four-sd: cram-md5.c:156-0 sending resp to challenge: CR+Psy+qh9NHh1+BA++VSA
Mar 05 16:10:46 four bacula-sd[2263159]: four-sd: dircmd.c:227-0 Message channel init completed.
Mar 05 16:10:46 four bacula-sd[2263159]: four-sd: status.c:1153-0 cmd=devices
Mar 05 16:10:46 four bacula-sd[2263159]: four-sd: bsock.c:861-0 socket=4 who=client host=[2001:0DB8::4] port=9103
Mar 05 16:10:46 four bacula-sd[2263159]: four-sd: bnet_server.c:235-0 Accept socket=2001:0DB8::4.9103:2001:0DB8::4.37182 s=0x5636929f14b8
Mar 05 16:10:46 four bacula-sd[2263159]: four-sd: dircmd.c:196-0 Got a DIR connection at 05-Mar-2024 16:10:46
Mar 05 16:10:46 four bacula-sd[2263159]: four-sd: authenticatebase.cc:365-0 TLSPSK Remote need 100
Mar 05 16:10:46 four bacula-sd[2263159]: four-sd: authenticatebase.cc:335-0 TLSPSK Local need 100
Mar 05 16:10:46 four bacula-sd[2263159]: four-sd: authenticatebase.cc:563-0 TLSPSK Start PSK
Mar 05 16:10:46 four bacula-sd[2263159]: four-sd: bnet.c:96-0 TLS server negotiation established.
Mar 05 16:10:46 four bacula-sd[2263159]: four-sd: cram-md5.c:68-0 send: auth cram-md5 challenge <934142435.1709673046@four-sd> ssl=0
Mar 05 16:10:46 four bacula-sd[2263159]: four-sd: cram-md5.c:132-0 cram-get received: auth cram-md5 <2101906992.1709673046@four-dir> ssl=0
Mar 05 16:10:46 four bacula-sd[2263159]: four-sd: cram-md5.c:156-0 sending resp to challenge: b6/yPWVPOzU/cm/0C9/+/B
Mar 05 16:10:46 four bacula-sd[2263159]: four-sd: dircmd.c:227-0 Message channel init completed.
Mar 05 16:10:47 four bacula-sd[2263159]: List plugins. Hook count=0
Mar 05 16:10:47 four bacula-sd[2263159]: four-sd: bsock.c:861-0 socket=4 who=client host=[2001:0DB8::4] port=9103
Mar 05 16:10:47 four bacula-sd[2263159]: four-sd: bnet_server.c:235-0 Accept socket=2001:0DB8::4.9103:2001:0DB8::4.37186 s=0x563692a08dd8
Mar 05 16:10:47 four bacula-sd[2263159]: four-sd: dircmd.c:196-0 Got a DIR connection at 05-Mar-2024 16:10:47
Mar 05 16:10:47 four bacula-sd[2263159]: four-sd: authenticatebase.cc:365-0 TLSPSK Remote need 100
Mar 05 16:10:47 four bacula-sd[2263159]: four-sd: authenticatebase.cc:335-0 TLSPSK Local need 100
Mar 05 16:10:47 four bacula-sd[2263159]: four-sd: authenticatebase.cc:563-0 TLSPSK Start PSK
Mar 05 16:10:47 four bacula-sd[2263159]: four-sd: bnet.c:96-0 TLS server negotiation established.
Mar 05 16:10:47 four bacula-sd[2263159]: four-sd: cram-md5.c:68-0 send: auth cram-md5 challenge <1669111798.1709673047@four-sd> ssl=0
Mar 05 16:10:47 four bacula-sd[2263159]: four-sd: cram-md5.c:132-0 cram-get received: auth cram-md5 <1290027090.1709673047@four-dir> ssl=0
Mar 05 16:10:47 four bacula-sd[2263159]: four-sd: cram-md5.c:156-0 sending resp to challenge: F/oa1Ssdqw1iEwcd6j/CKA
Mar 05 16:10:47 four bacula-sd[2263159]: four-sd: dircmd.c:227-0 Message channel init completed.
Mar 05 16:10:47 four bacula-sd[2263159]: four-sd: dircmd.c:1210-0 Found device AWS_S3_Cloud1
Mar 05 16:10:47 four bacula-sd[2263159]: four-sd: dircmd.c:1254-0 Found device AWS_S3_Cloud1
Mar 05 16:10:47 four bacula-sd[2263159]: four-sd: acquire.c:671-0 Attach 0x9401bbf8 to dev "AWS_S3_Cloud1" (/data/Backup/bstor_aws_cache)
Mar 05 16:10:48 four bacula-sd[2263159]: four-sd: bsock.c:861-0 socket=6 who=client host=[2001:0DB8::4] port=9103
Mar 05 16:10:48 four bacula-sd[2263159]: four-sd: bnet_server.c:235-0 Accept socket=2001:0DB8::4.9103:2001:0DB8::4.37198 s=0x5636929cedf8
Mar 05 16:10:48 four bacula-sd[2263159]: four-sd: dircmd.c:196-0 Got a DIR connection at 05-Mar-2024 16:10:48
Mar 05 16:10:48 four bacula-sd[2263159]: four-sd: authenticatebase.cc:365-0 TLSPSK Remote need 100
Mar 05 16:10:48 four bacula-sd[2263159]: four-sd: authenticatebase.cc:335-0 TLSPSK Local need 100
Mar 05 16:10:48 four bacula-sd[2263159]: four-sd: authenticatebase.cc:563-0 TLSPSK Start PSK
Mar 05 16:10:48 four bacula-sd[2263159]: four-sd: bnet.c:96-0 TLS server negotiation established.
Mar 05 16:10:48 four bacula-sd[2263159]: four-sd: cram-md5.c:68-0 send: auth cram-md5 challenge <76485448.1709673048@four-sd> ssl=0
Mar 05 16:10:48 four bacula-sd[2263159]: four-sd: cram-md5.c:132-0 cram-get received: auth cram-md5 <76485448.1709673048@four-dir> ssl=0
Mar 05 16:10:48 four bacula-sd[2263159]: four-sd: cram-md5.c:156-0 sending resp to challenge: HC+BF8+G9mVUIlAV99+5LC
Mar 05 16:10:48 four bacula-sd[2263159]: four-sd: dircmd.c:227-0 Message channel init completed.
Mar 05 16:14:40 four bacula-sd[2263159]: four-sd: bsock.c:861-0 socket=7 who=client host=[2001:0DB8::4] port=9103
Mar 05 16:14:40 four bacula-sd[2263159]: four-sd: bnet_server.c:235-0 Accept socket=2001:0DB8::4.9103:2001:0DB8::4.55784 s=0x5636929f14b8
Mar 05 16:14:40 four bacula-sd[2263159]: four-sd: dircmd.c:196-0 Got a DIR connection at 05-Mar-2024 16:14:40
Mar 05 16:14:40 four bacula-sd[2263159]: four-sd: authenticatebase.cc:365-0 TLSPSK Remote need 100
Mar 05 16:14:40 four bacula-sd[2263159]: four-sd: authenticatebase.cc:335-0 TLSPSK Local need 100
Mar 05 16:14:40 four bacula-sd[2263159]: four-sd: authenticatebase.cc:563-0 TLSPSK Start PSK
Mar 05 16:14:40 four bacula-sd[2263159]: four-sd: bnet.c:96-0 TLS server negotiation established.
Mar 05 16:14:40 four bacula-sd[2263159]: four-sd: cram-md5.c:68-0 send: auth cram-md5 challenge <1104246352.1709673280@four-sd> ssl=0
Mar 05 16:14:40 four bacula-sd[2263159]: four-sd: cram-md5.c:132-0 cram-get received: auth cram-md5 <78526969.1709673280@four-dir> ssl=0
Mar 05 16:14:40 four bacula-sd[2263159]: four-sd: cram-md5.c:156-0 sending resp to challenge: 49UP5RldJ5ZVEg/gS++skD
Mar 05 16:14:40 four bacula-sd[2263159]: four-sd: dircmd.c:227-0 Message channel init completed.
Mar 05 16:32:16 four systemd[1]: bacula-sd.service: Main process exited, code=killed, status=9/KILL
Mar 05 16:32:16 four systemd[1]: bacula-sd.service: Failed with result 'signal'.
Mar 05 16:32:51 four systemd[1]: Started Bacula Storage Daemon service.

In the case above, I killed the daemon after 20+ minutes, but a couple of times it went for days before I realized that it was hung. There is no output from the upload command. The SD does connect to AWS, but times out. Also, the bconsole process can't be killed with ^C.

I'm not sure this is going to help anyone with debugging. I'm posting to ask whether anyone has any ideas on either what might be going wrong here, or where I could gather more information before trying to file a bug.
BTW, this is RHEL 8, AMD, and Bacula 13.0.2, 18Feb23. Thanks for any ideas!
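For reference, here is a minimal sketch of the kind of catch-all wrapper described above, with a hard timeout added so that the boot job itself cannot hang indefinitely. It is only an illustration under assumptions: that bconsole is on the PATH and accepts commands on standard input, and that killing the console after a timeout is acceptable. The function names and the storage/pool names are hypothetical. It does not fix the Storage Daemon hang; it only makes the hang visible to the job.

```python
import subprocess
import sys

# Assumption: bconsole is on PATH and reads commands from standard input.
BCONSOLE = ["bconsole"]


def build_upload_command(storage: str, pool: str) -> str:
    """Return the console command quoted in the post for the given names."""
    return f"cloud upload storage={storage} pool={pool} allfrompool"


def run_console_command(argv, command, timeout_s):
    """Feed one command to a console process and return its stdout.

    Returns None if the process is still running after timeout_s seconds.
    On timeout, subprocess.run kills the child (SIGKILL on POSIX) before
    raising, so a console that ignores ^C is still terminated.
    """
    try:
        result = subprocess.run(
            argv,
            input=command + "\n",
            capture_output=True,
            text=True,
            timeout=timeout_s,
        )
    except subprocess.TimeoutExpired:
        return None
    return result.stdout


def upload_all(storage: str, pool: str, timeout_s: float = 3600) -> int:
    """Run the catch-all upload; return a shell-style exit status."""
    output = run_console_command(
        BCONSOLE, build_upload_command(storage, pool), timeout_s
    )
    if output is None:
        print("cloud upload did not finish within the timeout", file=sys.stderr)
        return 1
    print(output)
    return 0
```

A boot-time job could call something like upload_all("CloudStorage", "CloudPool") with the real Storage and Pool names and alert on a non-zero return. The one-hour default is arbitrary and would need to be longer than a legitimate upload can take.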
_______________________________________________
Bacula-users mailing list
Bacula-users@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/bacula-users