Hi all,

I've been using the S3 driver for some time now, and working around various
idiosyncrasies.  The most persistent one is that occasionally, after
something is shut down, backup parts are left in the cache that were never
uploaded.

I started using a job that runs at boot and, after an interval, issues
cloud upload storage=$STORAGE pool=$POOL allfrompool as a catch-all (POOL
and STORAGE are set to the Cloud pool and storage respectively).  About a
couple of months ago I realized that the Bacula Director was inoperative
(hung), and I tracked this down to the Storage Daemon hanging when this
command is executed.  Restarting the SD clears the problem (though it also
kills the upload).  Otherwise, the Cloud-based storage continues to work
fine.
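
A catch-all job of this kind might look like the sketch below.  The storage
and pool names, the timeout values, and the function names are illustrative
assumptions, not taken from the actual setup; whether bounding bconsole
with timeout(1) is enough to unblock a hung Director is untested.

```shell
#!/bin/sh
# Hypothetical defaults; substitute the real Cloud storage and pool names.
STORAGE="${STORAGE:-CloudStorage}"
POOL="${POOL:-CloudPool}"

# Build the console command quoted above for a given storage and pool.
upload_cmd() {
    printf 'cloud upload storage=%s pool=%s allfrompool\n' "$1" "$2"
}

# Pipe the command into bconsole.  GNU timeout sends TERM after 1800 s
# and KILL 30 s later, so a hung console cannot block the job forever.
# Both durations are assumptions.
run_upload() {
    upload_cmd "$STORAGE" "$POOL" | timeout -k 30 1800 bconsole
}

# Only attempt the upload when bconsole is actually installed.
if command -v bconsole >/dev/null 2>&1; then
    run_upload
fi
```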

I don't see any error messages anywhere, and activity in the Director and
File Daemon seems to return to normal when the Storage Daemon is
killed/restarted.  I did run the SD with debug level 100, but I suspect
the clue is something missing from the output rather than anything in it.
Here's the debug output (somewhat anonymized):

Mar 05 16:10:46 four bacula-sd[2263159]: List plugins. Hook count=0
Mar 05 16:10:46 four bacula-sd[2263159]: four-sd: bsock.c:861-0 socket=4
who=client host=[2001:0DB8::4] port=9103
Mar 05 16:10:46 four bacula-sd[2263159]: four-sd: bnet_server.c:235-0
Accept socket=2001:0DB8::4.9103:2001:0DB8::4.37168 s=0x5636929cedf8
Mar 05 16:10:46 four bacula-sd[2263159]: four-sd: dircmd.c:196-0 Got a DIR
connection at 05-Mar-2024 16:10:46
Mar 05 16:10:46 four bacula-sd[2263159]: four-sd: authenticatebase.cc:365-0
TLSPSK Remote need 100
Mar 05 16:10:46 four bacula-sd[2263159]: four-sd: authenticatebase.cc:335-0
TLSPSK Local need 100
Mar 05 16:10:46 four bacula-sd[2263159]: four-sd: authenticatebase.cc:563-0
TLSPSK Start PSK
Mar 05 16:10:46 four bacula-sd[2263159]: four-sd: bnet.c:96-0 TLS server
negotiation established.
Mar 05 16:10:46 four bacula-sd[2263159]: four-sd: cram-md5.c:68-0 send:
auth cram-md5 challenge <104636656.1709673046@four-sd> ssl=0
Mar 05 16:10:46 four bacula-sd[2263159]: four-sd: cram-md5.c:132-0 cram-get
received: auth cram-md5 <111733638.1709673046@four-dir> ssl=0
Mar 05 16:10:46 four bacula-sd[2263159]: four-sd: cram-md5.c:156-0 sending
resp to challenge: CR+Psy+qh9NHh1+BA++VSA
Mar 05 16:10:46 four bacula-sd[2263159]: four-sd: dircmd.c:227-0 Message
channel init completed.
Mar 05 16:10:46 four bacula-sd[2263159]: four-sd: status.c:1153-0
cmd=devices
Mar 05 16:10:46 four bacula-sd[2263159]: four-sd: bsock.c:861-0 socket=4
who=client host=[2001:0DB8::4] port=9103
Mar 05 16:10:46 four bacula-sd[2263159]: four-sd: bnet_server.c:235-0
Accept socket=2001:0DB8::4.9103:2001:0DB8::4.37182 s=0x5636929f14b8
Mar 05 16:10:46 four bacula-sd[2263159]: four-sd: dircmd.c:196-0 Got a DIR
connection at 05-Mar-2024 16:10:46
Mar 05 16:10:46 four bacula-sd[2263159]: four-sd: authenticatebase.cc:365-0
TLSPSK Remote need 100
Mar 05 16:10:46 four bacula-sd[2263159]: four-sd: authenticatebase.cc:335-0
TLSPSK Local need 100
Mar 05 16:10:46 four bacula-sd[2263159]: four-sd: authenticatebase.cc:563-0
TLSPSK Start PSK
Mar 05 16:10:46 four bacula-sd[2263159]: four-sd: bnet.c:96-0 TLS server
negotiation established.
Mar 05 16:10:46 four bacula-sd[2263159]: four-sd: cram-md5.c:68-0 send:
auth cram-md5 challenge <934142435.1709673046@four-sd> ssl=0
Mar 05 16:10:46 four bacula-sd[2263159]: four-sd: cram-md5.c:132-0 cram-get
received: auth cram-md5 <2101906992.1709673046@four-dir> ssl=0
Mar 05 16:10:46 four bacula-sd[2263159]: four-sd: cram-md5.c:156-0 sending
resp to challenge: b6/yPWVPOzU/cm/0C9/+/B
Mar 05 16:10:46 four bacula-sd[2263159]: four-sd: dircmd.c:227-0 Message
channel init completed.
Mar 05 16:10:47 four bacula-sd[2263159]: List plugins. Hook count=0
Mar 05 16:10:47 four bacula-sd[2263159]: four-sd: bsock.c:861-0 socket=4
who=client host=[2001:0DB8::4] port=9103
Mar 05 16:10:47 four bacula-sd[2263159]: four-sd: bnet_server.c:235-0
Accept socket=2001:0DB8::4.9103:2001:0DB8::4.37186 s=0x563692a08dd8
Mar 05 16:10:47 four bacula-sd[2263159]: four-sd: dircmd.c:196-0 Got a DIR
connection at 05-Mar-2024 16:10:47
Mar 05 16:10:47 four bacula-sd[2263159]: four-sd: authenticatebase.cc:365-0
TLSPSK Remote need 100
Mar 05 16:10:47 four bacula-sd[2263159]: four-sd: authenticatebase.cc:335-0
TLSPSK Local need 100
Mar 05 16:10:47 four bacula-sd[2263159]: four-sd: authenticatebase.cc:563-0
TLSPSK Start PSK
Mar 05 16:10:47 four bacula-sd[2263159]: four-sd: bnet.c:96-0 TLS server
negotiation established.
Mar 05 16:10:47 four bacula-sd[2263159]: four-sd: cram-md5.c:68-0 send:
auth cram-md5 challenge <1669111798.1709673047@four-sd> ssl=0
Mar 05 16:10:47 four bacula-sd[2263159]: four-sd: cram-md5.c:132-0 cram-get
received: auth cram-md5 <1290027090.1709673047@four-dir> ssl=0
Mar 05 16:10:47 four bacula-sd[2263159]: four-sd: cram-md5.c:156-0 sending
resp to challenge: F/oa1Ssdqw1iEwcd6j/CKA
Mar 05 16:10:47 four bacula-sd[2263159]: four-sd: dircmd.c:227-0 Message
channel init completed.
Mar 05 16:10:47 four bacula-sd[2263159]: four-sd: dircmd.c:1210-0 Found
device AWS_S3_Cloud1
Mar 05 16:10:47 four bacula-sd[2263159]: four-sd: dircmd.c:1254-0 Found
device AWS_S3_Cloud1
Mar 05 16:10:47 four bacula-sd[2263159]: four-sd: acquire.c:671-0 Attach
0x9401bbf8 to dev "AWS_S3_Cloud1" (/data/Backup/bstor_aws_cache)
Mar 05 16:10:48 four bacula-sd[2263159]: four-sd: bsock.c:861-0 socket=6
who=client host=[2001:0DB8::4] port=9103
Mar 05 16:10:48 four bacula-sd[2263159]: four-sd: bnet_server.c:235-0
Accept socket=2001:0DB8::4.9103:2001:0DB8::4.37198 s=0x5636929cedf8
Mar 05 16:10:48 four bacula-sd[2263159]: four-sd: dircmd.c:196-0 Got a DIR
connection at 05-Mar-2024 16:10:48
Mar 05 16:10:48 four bacula-sd[2263159]: four-sd: authenticatebase.cc:365-0
TLSPSK Remote need 100
Mar 05 16:10:48 four bacula-sd[2263159]: four-sd: authenticatebase.cc:335-0
TLSPSK Local need 100
Mar 05 16:10:48 four bacula-sd[2263159]: four-sd: authenticatebase.cc:563-0
TLSPSK Start PSK
Mar 05 16:10:48 four bacula-sd[2263159]: four-sd: bnet.c:96-0 TLS server
negotiation established.
Mar 05 16:10:48 four bacula-sd[2263159]: four-sd: cram-md5.c:68-0 send:
auth cram-md5 challenge <76485448.1709673048@four-sd> ssl=0
Mar 05 16:10:48 four bacula-sd[2263159]: four-sd: cram-md5.c:132-0 cram-get
received: auth cram-md5 <76485448.1709673048@four-dir> ssl=0
Mar 05 16:10:48 four bacula-sd[2263159]: four-sd: cram-md5.c:156-0 sending
resp to challenge: HC+BF8+G9mVUIlAV99+5LC
Mar 05 16:10:48 four bacula-sd[2263159]: four-sd: dircmd.c:227-0 Message
channel init completed.
Mar 05 16:14:40 four bacula-sd[2263159]: four-sd: bsock.c:861-0 socket=7
who=client host=[2001:0DB8::4] port=9103
Mar 05 16:14:40 four bacula-sd[2263159]: four-sd: bnet_server.c:235-0
Accept socket=2001:0DB8::4.9103:2001:0DB8::4.55784 s=0x5636929f14b8
Mar 05 16:14:40 four bacula-sd[2263159]: four-sd: dircmd.c:196-0 Got a DIR
connection at 05-Mar-2024 16:14:40
Mar 05 16:14:40 four bacula-sd[2263159]: four-sd: authenticatebase.cc:365-0
TLSPSK Remote need 100
Mar 05 16:14:40 four bacula-sd[2263159]: four-sd: authenticatebase.cc:335-0
TLSPSK Local need 100
Mar 05 16:14:40 four bacula-sd[2263159]: four-sd: authenticatebase.cc:563-0
TLSPSK Start PSK
Mar 05 16:14:40 four bacula-sd[2263159]: four-sd: bnet.c:96-0 TLS server
negotiation established.
Mar 05 16:14:40 four bacula-sd[2263159]: four-sd: cram-md5.c:68-0 send:
auth cram-md5 challenge <1104246352.1709673280@four-sd> ssl=0
Mar 05 16:14:40 four bacula-sd[2263159]: four-sd: cram-md5.c:132-0 cram-get
received: auth cram-md5 <78526969.1709673280@four-dir> ssl=0
Mar 05 16:14:40 four bacula-sd[2263159]: four-sd: cram-md5.c:156-0 sending
resp to challenge: 49UP5RldJ5ZVEg/gS++skD
Mar 05 16:14:40 four bacula-sd[2263159]: four-sd: dircmd.c:227-0 Message
channel init completed.
Mar 05 16:32:16 four systemd[1]: bacula-sd.service: Main process exited,
code=killed, status=9/KILL
Mar 05 16:32:16 four systemd[1]: bacula-sd.service: Failed with result
'signal'.
Mar 05 16:32:51 four systemd[1]: Started Bacula Storage Daemon service.

In the case above, I killed the daemon after 20+ minutes, but a couple of
times it stayed hung for days before I noticed.  The upload command
produces no output.  The SD does connect to AWS, but times out.  Also, the
bconsole process can't be killed with ^C.

I'm not sure this is going to help anyone with debugging - I'm posting to
ask whether anyone has any ideas on either what might be going wrong here,
or where I could develop more information before trying to file a bug.

BTW, this is RHEL 8, AMD, and Bacula 13.0.2 (18Feb23).  Thanks for any
ideas!
_______________________________________________
Bacula-users mailing list
Bacula-users@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/bacula-users
