Hello Phil,

The SQL error looks like a real SQL error to me. I have seen something
similar to this with MariaDB.

The other tracebacks don't look like they are blocked.

For the SQL problems, I have two suggestions:

1. Use the SQL tools to thoroughly check and correct your DB.
2. Upgrade your MySQL.

For the "blocked" daemons, try doing a console status of them.

Best regards,
Kern

Sent from Samsung tablet.

-------- Original message --------
From: Phil Stracchino <ph...@caerllewys.net>
Date: 6/21/20 15:09 (GMT+01:00)
To: bacula-devel@lists.sourceforge.net
Subject: Re: [Bacula-devel] Hung jobs (was Re: Bacula Release 9.6.5)

On 2020-06-20 14:33, Phil Stracchino wrote:
> OK, two days with zero hung jobs. I am proceeding with re-upgrading
> ONLY the Director (well, and that host's FD) to 9.6.5.

That got me three successful jobs, one failed, and two hung. Here's the
failure:

21-Jun 04:30 minbar-dir JobId 25026: Fatal error: sql_create.c:968
Create db File+record INSERT INTO
File(FileIndex,JobId,PathId,FilenameId,LStat,MD5,DeltaSeq)+VALUES
(2,25026,122083,109,'R0AAQAM E EHt H A A -B H IA F Be7wj5 BciQ9i BciQ9i A
A+C','0',0) failed. ERR=Deadlock found when trying to get lock; try
restarting+transaction
21-Jun 04:30 minbar-dir JobId 25026: Fatal error: catreq.c:513
Attribute+create error: ERR=sql_create.c:968 Create db File record
INSERT INTO File+(FileIndex,JobId,PathId,FilenameId,LStat,MD5,DeltaSeq)
VALUES+(2,25026,122083,109,'R0AAQAM E EHt H A A -B H IA F Be7wj5 BciQ9i
BciQ9i A A+C','0',0) failed. ERR=Deadlock found when trying to get lock;
try restarting+transaction
21-Jun 04:30 asgard-fd JobId 25026: Error: bsock.c:383 Write
error+sending 79 bytes to Storage daemon:asgard.caerllewys.net:9103:
ERR=Broken pipe

Once again it is somehow creating a local commit conflict on the
cluster. I CAN configure Bacula to send all transactions to a single
node of the cluster instead of load-balancing them; however, Bacula
SHOULD be detecting reported deadlocks and retrying on its own. There
are now several different MySQL synchronous¹-clustering technologies in
the market, principally Galera and Oracle's MySQL Cluster or whatever
they're calling it this week; they aren't difficult to support, and an
enterprise backup tool really should properly support them.

¹ Well, OK, Codership likes to use the phrase "virtually synchronous",
by which they mean that Galera replication tries to ensure that commits
are synchronous modulo network latency. Oracle with its Paxos-based
algorithm disparages Galera while making vague promises
about eventual consistency.

Here's the state of the Director:

(gdb) attach 5160
Attaching to process 5160
[New LWP 5753]
[New LWP 5754]
[New LWP 17466]
[New LWP 17470]
[New LWP 24466]
[New LWP 26300]
[Thread debugging using libthread_db enabled]
Using host libthread_db library "/lib64/libthread_db.so.1".
0x00007f3e6bc044c5 in nanosleep () from /lib64/libpthread.so.0
(gdb) thread apply all bt

Thread 7 (Thread 0x7f3e437fe700 (LWP 26300)):
#0  0x00007f3e6bc03dfc in read () from /lib64/libpthread.so.0
#1  0x00007f3e6bc57ed3 in BSOCKCORE::socketRead (this=0x7f3e64000d78,
    len=4, buf=0x7f3e437fddf4, fd=<optimized out>)
    at ../lib/bsockcore.h:202
#2  BSOCKCORE::read_nbytes (nbytes=<optimized out>,
    ptr=0x7f3e437fddf4 ">\177", this=<optimized out>)
    at bsockcore.c:1144
#3  BSOCKCORE::read_nbytes (this=0x7f3e64000d78, ptr=<optimized out>,
    nbytes=4) at bsockcore.c:1130
#4  0x00007f3e6bc313bd in BSOCK::recv (this=this@entry=0x7f3e64000d78)
    at bsock.c:441
#5  0x0000556eddfd86d6 in handle_UA_client_request (arg=0x7f3e64000d78)
    at ua_server.c:144
#6  0x00007f3e6bc665b5 in workq_server (arg=0x556ede01b9c0 <ua_workq>)
    at workq.c:372
#7  0x00007f3e6bbf9ea7 in start_thread () from /lib64/libpthread.so.0
#8  0x00007f3e6b8b8c6f in clone () from /lib64/libc.so.6

Thread 6 (Thread 0x7f3e597fa700 (LWP 24466)):
#0  0x00007f3e6bc044c5 in nanosleep () from /lib64/libpthread.so.0
#1  0x00007f3e6bc2d4e6 in bmicrosleep (sec=sec@entry=2,
    usec=usec@entry=0) at bsys.c:192
#2  0x0000556eddfa3d92 in jobq_server (arg=0x556ede01b6a0 <job_queue>)
    at jobq.c:616
#3  0x00007f3e6bbf9ea7 in start_thread () from /lib64/libpthread.so.0
#4  0x00007f3e6b8b8c6f in clone () from /lib64/libc.so.6

Thread 5 (Thread 0x7f3e59ffb700 (LWP 17470)):
#0  0x00007f3e6bc03dfc in read () from /lib64/libpthread.so.0
#1  0x00007f3e6bc57ed3 in BSOCKCORE::socketRead (this=0x7f3e48006778,
    len=4, buf=0x7f3e59ffa9b4, fd=<optimized out>)
    at ../lib/bsockcore.h:202
#2  BSOCKCORE::read_nbytes (nbytes=<optimized out>,
    ptr=0x7f3e59ffa9b4 "\372\377\377\377", this=<optimized out>)
    at bsockcore.c:1144
#3  BSOCKCORE::read_nbytes (this=0x7f3e48006778, ptr=<optimized out>,
    nbytes=4) at bsockcore.c:1130
#4  0x00007f3e6bc313bd in BSOCK::recv (this=this@entry=0x7f3e48006778)
    at bsock.c:441
#5  0x0000556eddf9a6e7 in bget_dirmsg (bs=bs@entry=0x7f3e48006778)
    at getmsg.c:150
#6  0x0000556eddf88d78 in wait_for_job_termination
    (jcr=jcr@entry=0x556edf8d7a88, timeout=timeout@entry=0)
    at backup.c:685
#7  0x0000556eddf8b009 in do_backup (jcr=jcr@entry=0x556edf8d7a88)
    at backup.c:633
#8  0x0000556eddf9d318 in job_thread (arg=0x556edf8d7a88) at job.c:453
#9  0x0000556eddfa37fb in jobq_server (arg=0x556ede01b6a0 <job_queue>)
    at jobq.c:468
#10 0x00007f3e6bbf9ea7 in start_thread () from /lib64/libpthread.so.0
#11 0x00007f3e6b8b8c6f in clone () from /lib64/libc.so.6

Thread 4 (Thread 0x7f3e5affd700 (LWP 17466)):
#0  0x00007f3e6bc03dfc in read () from /lib64/libpthread.so.0
#1  0x00007f3e6bc57ed3 in BSOCKCORE::socketRead (this=0x7f3e50008e58,
    len=4, buf=0x7f3e5affc9b4, fd=<optimized out>)
    at ../lib/bsockcore.h:202
#2  BSOCKCORE::read_nbytes (nbytes=<optimized out>,
    ptr=0x7f3e5affc9b4 "\372\377\377\377", this=<optimized out>)
    at bsockcore.c:1144
#3  BSOCKCORE::read_nbytes (this=0x7f3e50008e58, ptr=<optimized out>,
    nbytes=4) at bsockcore.c:1130
#4  0x00007f3e6bc313bd in BSOCK::recv (this=this@entry=0x7f3e50008e58)
    at bsock.c:441
#5  0x0000556eddf9a6e7 in bget_dirmsg (bs=bs@entry=0x7f3e50008e58)
    at getmsg.c:150
#6  0x0000556eddf88d78 in wait_for_job_termination
    (jcr=jcr@entry=0x556edf8cc618, timeout=timeout@entry=0)
    at backup.c:685
#7  0x0000556eddf8b009 in do_backup (jcr=jcr@entry=0x556edf8cc618)
    at backup.c:633
#8  0x0000556eddf9d318 in job_thread (arg=0x556edf8cc618) at job.c:453
#9  0x0000556eddfa37fb in jobq_server (arg=0x556ede01b6a0 <job_queue>)
    at jobq.c:468
#10 0x00007f3e6bbf9ea7 in start_thread () from /lib64/libpthread.so.0
#11 0x00007f3e6b8b8c6f in clone () from /lib64/libc.so.6

Thread 3 (Thread 0x7f3e6990e700 (LWP 5754)):
#0  0x00007f3e6bc00878 in pthread_cond_timedwait@@GLIBC_2.3.2 ()
    from /lib64/libpthread.so.0
#1  0x00007f3e6bc65be9 in watchdog_thread (arg=<optimized out>)
    at watchdog.c:299
#2  0x00007f3e6bbf9ea7 in start_thread () from /lib64/libpthread.so.0
#3  0x00007f3e6b8b8c6f in clone () from /lib64/libc.so.6

Thread 2 (Thread 0x7f3e6a10f700 (LWP 5753)):
#0  0x00007f3e6b8b09d3 in select () from /lib64/libc.so.6
#1  0x00007f3e6bc30a38 in bnet_thread_server
    (addrs=addrs@entry=0x556edf86e718, max_clients=20,
    client_wq=client_wq@entry=0x556ede01b9c0 <ua_workq>,
    handle_client_request=handle_client_request@entry=0x556eddfd8650
    <handle_UA_client_request(void*)>) at bnet_server.c:166
--Type <RET> for more, q to quit, c to continue without paging--c
#2  0x0000556eddfd8296 in connect_thread (arg=0x556edf86e718)
    at ua_server.c:85
#3  0x00007f3e6bbf9ea7 in start_thread () from /lib64/libpthread.so.0
#4  0x00007f3e6b8b8c6f in clone () from /lib64/libc.so.6

Thread 1 (Thread 0x7f3e6abd20c0 (LWP 5160)):
#0  0x00007f3e6bc044c5 in nanosleep () from /lib64/libpthread.so.0
#1  0x00007f3e6bc2d4e6 in bmicrosleep (sec=sec@entry=60,
    usec=usec@entry=0) at bsys.c:192
#2  0x0000556eddfb00d4 in wait_for_next_job
    (one_shot_job_to_run=<optimized out>) at scheduler.c:121
#3  0x0000556eddf825f5 in main (argc=<optimized out>,
    argv=<optimized out>) at dird.c:387

And here's the state of one of the hung clients (the other is the
Fedora client that I don't have a debug build on):

(gdb) attach 2823
Attaching to process 2823
[New LWP 2830]
[New LWP 7909]
[New LWP 7910]
[Thread debugging using libthread_db enabled]
Using host libthread_db library "/lib64/libthread_db.so.1".
0x00007f3f37d87123 in select () from /lib64/libc.so.6
(gdb) thread apply all bt

Thread 4 (Thread 0x7f3f36510700 (LWP 7910)):
#0  0x00007f3f37d87123 in select () from /lib64/libc.so.6
#1  0x00007f3f38136242 in fd_wait_data (fd=6, mode=<optimized out>,
    mode@entry=WAIT_READ, sec=sec@entry=5, msec=msec@entry=0)
    at bsys.c:1203
#2  0x00007f3f3815fadb in BSOCKCORE::wait_data_intr
    (this=this@entry=0x7f3f30000e18, sec=sec@entry=5,
    msec=msec@entry=0) at bsockcore.c:875
#3  0x000055d4f367de98 in sd_heartbeat_thread (arg=0x7f3f2c000ed8)
    at heartbeat.c:69
#4  0x00007f3f38100057 in start_thread () from /lib64/libpthread.so.0
#5  0x00007f3f37d8f6cf in clone () from /lib64/libc.so.6

Thread 3 (Thread 0x7f3f354c7700 (LWP 7909)):
#0  0x00007f3f3810a4ef in write () from /lib64/libpthread.so.0
#1  0x00007f3f3816021f in BSOCKCORE::socketWrite (this=0x7f3f2c005458,
    len=35938, buf=0x7f3f2c0bb3bc, fd=<optimized out>)
    at ../lib/bsockcore.h:203
#2  BSOCKCORE::write_nbytes (nbytes=<optimized out>,
    ptr=0x7f3f2c0bb3bc "@", this=<optimized out>) at bsockcore.c:1079
#3  BSOCKCORE::write_nbytes (this=this@entry=0x7f3f2c005458,
    ptr=<optimized out>, nbytes=35938) at bsockcore.c:1064
#4  0x00007f3f3813a878 in BSOCK::write_nbytes (this=0x7f3f2c005458,
    ptr=<optimized out>, nbytes=35938) at bsock.c:831
#5  0x00007f3f3813977a in BSOCK::send (aflags=0, this=0x7f3f2c005458)
    at bsock.c:368
#6  BSOCK::send (this=this@entry=0x7f3f2c005458, aflags=aflags@entry=0)
    at bsock.c:249
#7  0x000055d4f36730d1 in BSOCK::send (this=0x7f3f2c005458)
    at ../lib/bsock.h:75
#8  process_and_send_data (bctx=...) at backup.c:845
#9  0x000055d4f36752d0 in send_data (stream=<optimized out>, bctx=...)
    at backup.c:655
#10 save_file (jcr=0x7f3f2c000ed8, ff_pkt=0x7f3f2c001558,
    top_level=<optimized out>) at backup.c:502
#11 0x00007f3f381ae228 in find_one_file (jcr=<optimized out>,
    ff_pkt=0x7f3f2c001558,
    handle_file=0x7f3f381acb40 <our_callback(JCR*, FF_PKT*, bool)>,
    fname=0x7f3f2c0044c8 "/home/alaric/.moonchild productions/pale moon/alaric/adblockplus/patterns-backup3.ini",
    parent_device=2304, top_level=<optimized out>) at find_one.c:542
#12 0x00007f3f381aee33 in find_one_file (jcr=<optimized out>,
    ff_pkt=0x7f3f2c001558, handle_file=<optimized out>,
    fname=0x7f3f2c0049e8 "/home/alaric/.moonchild productions/pale moon/alaric/adblockplus",
    parent_device=<optimized out>, top_level=<optimized out>)
    at find_one.c:768
#13 0x00007f3f381aee33 in find_one_file (jcr=<optimized out>,
    ff_pkt=0x7f3f2c001558, handle_file=<optimized out>,
    fname=0x7f3f2c0a9b68 "/home/alaric/.moonchild productions/pale moon/alaric",
    parent_device=<optimized out>, top_level=<optimized out>)
    at find_one.c:768
#14 0x00007f3f381aee33 in find_one_file (jcr=<optimized out>,
    ff_pkt=0x7f3f2c001558, handle_file=<optimized out>,
    fname=0x7f3f2c003f78 "/home/alaric/.moonchild productions/pale moon",
    parent_device=<optimized out>, top_level=<optimized out>)
    at find_one.c:768
#15 0x00007f3f381aee33 in find_one_file (jcr=<optimized out>,
    ff_pkt=0x7f3f2c001558, handle_file=<optimized out>,
    fname=0x7f3f2c0034b8 "/home/alaric/.moonchild productions",
    parent_device=<optimized out>, top_level=<optimized out>)
    at find_one.c:768
#16 0x00007f3f381aee33 in find_one_file (jcr=<optimized out>,
    ff_pkt=0x7f3f2c001558, handle_file=<optimized out>,
    fname=0x7f3f2c0077f8 "/home/alaric",
    parent_device=<optimized out>, top_level=<optimized out>)
    at find_one.c:768
#17 0x00007f3f381aee33 in find_one_file (jcr=<optimized out>,
    ff_pkt=0x7f3f2c001558, handle_file=<optimized out>,
    fname=0x7f3f2c007298 "/home",
    parent_device=<optimized out>, top_level=<optimized out>)
    at find_one.c:768
#18 0x00007f3f381aee33 in find_one_file (jcr=jcr@entry=0x7f3f2c000ed8,
    ff_pkt=ff_pkt@entry=0x7f3f2c001558,
    handle_file=handle_file@entry=0x7f3f381acb40 <our_callback(JCR*, FF_PKT*, bool)>,
    fname=fname@entry=0x7f3f2c002418 "/",
    parent_device=parent_device@entry=18446744073709551615,
    top_level=top_level@entry=true) at find_one.c:768
#19 0x00007f3f381abdf7 in find_files (jcr=jcr@entry=0x7f3f2c000ed8,
    ff=0x7f3f2c001558,
    file_save=file_save@entry=0x55d4f3674500 <save_file(JCR*, FF_PKT*, bool)>,
    plugin_save=0x55d4f3679440 <plugin_save(JCR*, FF_PKT*, bool)>)
    at find.c:186
#20 0x000055d4f3672c06 in blast_data_to_storage_daemon
    (jcr=jcr@entry=0x7f3f2c000ed8, addr=addr@entry=0x0)
    at backup.c:166
#21 0x000055d4f368343f in backup_cmd (jcr=0x7f3f2c000ed8) at job.c:2522
#22 0x000055d4f3684658 in handle_director_request (dir=0x55d4f52f8298)
    at job.c:343
#23 handle_connection_request (caller=0x55d4f52f8298) at job.c:503
#24 0x00007f3f3816f0ac in workq_server (arg=0x55d4f36aaca0 <dir_workq>)
    at workq.c:372
#25 0x00007f3f38100057 in start_thread () from /lib64/libpthread.so.0
#26 0x00007f3f37d8f6cf in clone () from /lib64/libc.so.6

Thread 2 (Thread 0x7f3f36d11700 (LWP 2830)):
#0  0x00007f3f38106d08 in pthread_cond_timedwait@@GLIBC_2.3.2 ()
    from /lib64/libpthread.so.0
#1  0x00007f3f3816e6d0 in watchdog_thread (arg=<optimized out>)
    at watchdog.c:299
#2  0x00007f3f38100057 in start_thread () from /lib64/libpthread.so.0
#3  0x00007f3f37d8f6cf in clone () from /lib64/libc.so.6
--Type <RET> for more, q to quit, c to continue without paging--c

Thread 1 (Thread 0x7f3f377ba740 (LWP 2823)):
#0  0x00007f3f37d87123 in select () from /lib64/libc.so.6
#1  0x00007f3f38137c60 in bnet_thread_server (addrs=0x55d4f52f65b8,
    max_clients=20, client_wq=0x55d4f36aaca0 <dir_workq>,
    handle_client_request=0x55d4f3683c40 <handle_connection_request(void*)>)
    at bnet_server.c:166
#2  0x000055d4f3671777 in main (argc=<optimized out>,
    argv=<optimized out>) at filed.c:277

--
  Phil Stracchino
  Babylon Communications
  ph...@caerllewys.net
  p...@co.ordinate.org
  Landline: +1.603.293.8485
  Mobile:   +1.603.998.6958
_______________________________________________
Bacula-devel mailing list
Bacula-devel@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/bacula-devel
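
As an illustration of the behaviour Phil asks for in the quoted message
(detecting a reported deadlock and retrying instead of failing the job),
here is a minimal Python sketch. It is not Bacula code, which is C++, and
it says nothing about how sql_create.c actually works. DatabaseError,
run_with_deadlock_retry and make_flaky_insert are hypothetical names
invented for this example. The only fact taken from MySQL/MariaDB is that
the "Deadlock found when trying to get lock" message in the log
corresponds to error 1213 (ER_LOCK_DEADLOCK). The sketch also assumes the
whole transaction can safely be re-run from its first statement.

```python
import random
import time

# MySQL/MariaDB report a deadlock as error 1213 (ER_LOCK_DEADLOCK).
ER_LOCK_DEADLOCK = 1213


class DatabaseError(Exception):
    """Hypothetical stand-in for a driver exception carrying a server error code."""

    def __init__(self, errno, message):
        super().__init__(message)
        self.errno = errno


def run_with_deadlock_retry(transaction, max_attempts=5, base_delay=0.05,
                            sleep=time.sleep):
    """Call transaction(); on a deadlock error, back off and call it again.

    Assumption: transaction() re-issues all of its statements on each call,
    because the failed attempt is treated as fully rolled back.
    """
    for attempt in range(1, max_attempts + 1):
        try:
            return transaction()
        except DatabaseError as exc:
            # Anything that is not a deadlock, or a deadlock on the last
            # allowed attempt, is re-raised to the caller unchanged.
            if exc.errno != ER_LOCK_DEADLOCK or attempt == max_attempts:
                raise
            # Exponential backoff plus jitter so competing writers spread out.
            delay = base_delay * (2 ** (attempt - 1))
            sleep(delay + random.uniform(0, delay))


def make_flaky_insert(failures):
    """Simulated INSERT that deadlocks `failures` times, then succeeds."""
    state = {"calls": 0}

    def insert():
        state["calls"] += 1
        if state["calls"] <= failures:
            raise DatabaseError(
                ER_LOCK_DEADLOCK,
                "Deadlock found when trying to get lock; "
                "try restarting transaction",
            )
        return state["calls"]  # number of calls it took to succeed

    return insert
```

With these definitions, run_with_deadlock_retry(make_flaky_insert(2))
returns 3: two simulated deadlocks, then success on the third call. Any
other error code, or a deadlock that persists through max_attempts, is
still raised to the caller.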