Hello,
As I mentioned, I am having the same problem the original poster
mentioned even though he later recanted.
I run all my jobs using the Accurate mode and on particularly large
(possibly number of files rather than size) jobs, the Status of the
Director will block bconsole until the database activity to start the
Accurate job completes.
Output of bconsole:
----------------------------------------------------------------------------------------
*s
Status available for:
1: Director
2: Storage
3: Client
4: All
Select daemon type for status (1-4): 1
lawson-dir Version: 5.0.1 (24 February 2010) x86_64-unknown-linux-gnu
redhat
Daemon started 23-Apr-10 10:54, 0 Jobs run since started.
Heap: heap=1,200,128 smbytes=954,678 max_bytes=954,934 bufs=6,590
max_bufs=6,611
Scheduled Jobs:
Level Type Pri Scheduled Name Volume
===================================================================================
------------------------------------------------------------------------------------------
(i.e. nothing more, no prompt, until database queries finish for
Accurate job)
Here is a traceback run seconds after the above bconsole state:
[Thread debugging using libthread_db enabled]
[New Thread 0x2ad78108c280 (LWP 18020)]
[New Thread 0x4b1ec940 (LWP 20980)]
[New Thread 0x4a7eb940 (LWP 20977)]
[New Thread 0x493e9940 (LWP 20971)]
[New Thread 0x47fe7940 (LWP 20967)]
[New Thread 0x46be5940 (LWP 20963)]
[New Thread 0x457e3940 (LWP 20961)]
[New Thread 0x44de2940 (LWP 20957)]
[New Thread 0x439e0940 (LWP 20953)]
[New Thread 0x4186f940 (LWP 18137)]
[New Thread 0x42fdf940 (LWP 18026)]
[New Thread 0x425de940 (LWP 18025)]
0x000000374b80dfe1 in nanosleep () from /lib64/libpthread.so.0
$1 = "lawson-dir", '\0' <repeats 19 times>
$2 = 0x1f5e9528 "bacula-dir"
$3 = 0x1f5e9568 "/opt/bacula/bin/"
$4 = 0x1f6e6148 "MySQL"
$5 = 0x2ad780c5f76e "5.0.1 (24 February 2010)"
$6 = 0x2ad780c5f787 "x86_64-unknown-linux-gnu"
$7 = 0x2ad780c5f7a0 "redhat"
$8 = 0x2ad780c5f465 ""
$9 = "lawson.geo.berkeley.edu", '\0' <repeats 26 times>
#0 0x000000374b80dfe1 in nanosleep () from /lib64/libpthread.so.0
#1 0x00002ad780c30b8b in bmicrosleep (sec=60, usec=0) at bsys.c:61
#2 0x000000000042cd7a in wait_for_next_job (
one_shot_job_to_run=<value optimized out>) at scheduler.c:131
#3 0x000000000040d70c in main (argc=0, argv=0x7fff39454778) at dird.c:338
Thread 12 (Thread 0x425de940 (LWP 18025)):
#0 0x000000374b0cced2 in select () from /lib64/libc.so.6
#1 0x00002ad780c32fd7 in bnet_thread_server (addrs=0x1f5eb3a8,
max_clients=20, client_wq=0x66f8e0,
handle_client_request=0x444a90 <handle_UA_client_request>)
at bnet_server.c:161
#2 0x0000000000444a81 in connect_thread (arg=0x1f5eb3a8) at ua_server.c:82
#3 0x000000374b806617 in start_thread () from /lib64/libpthread.so.0
#4 0x000000374b0d3c2d in clone () from /lib64/libc.so.6
Thread 11 (Thread 0x42fdf940 (LWP 18026)):
#0 0x000000374b80af70 in pthread_cond_timedwait@@GLIBC_2.3.2 ()
from /lib64/libpthread.so.0
#1 0x00002ad780c57282 in watchdog_thread (arg=<value optimized out>)
at watchdog.c:308
#2 0x000000374b806617 in start_thread () from /lib64/libpthread.so.0
#3 0x000000374b0d3c2d in clone () from /lib64/libc.so.6
Thread 10 (Thread 0x4186f940 (LWP 18137)):
#0 0x000000374b80ad09 in pthread_cond_wait@@GLIBC_2.3.2 ()
from /lib64/libpthread.so.0
#1 0x00002ad780c4c956 in rwl_writelock (rwl=0x1f6eb3d8) at rwlock.c:242
#2 0x00002ad7805fc837 in _db_lock (file=0x2ad78060e084 "sql_get.c",
line=128,
mdb=0x15) at sql.c:385
#3 0x00002ad780603192 in db_get_pool_record (jcr=0x1f6ffec8, mdb=0x15,
pdbr=0x4186e9c0) at sql_get.c:607
#4 0x0000000000437293 in complete_jcr_for_job (jcr=0x1f6ffec8,
job=0x1f5ff728, pool=0x1f68ad58) at ua_output.c:725
#5 0x0000000000446277 in list_scheduled_jobs (ua=0x1f701a18)
at ua_status.c:464
#6 0x0000000000446451 in do_director_status (ua=0x1f701a18) at
ua_status.c:337
#7 0x00000000004467fd in status_cmd (ua=0x1f701a18, cmd=<value
optimized out>)
at ua_status.c:187
#8 0x000000000042f86d in do_a_command (ua=0x1f701a18) at ua_cmds.c:240
#9 0x0000000000444b5a in handle_UA_client_request (arg=<value optimized
out>)
at ua_server.c:146
#10 0x00002ad780c57ace in workq_server (arg=<value optimized out>)
at workq.c:346
#11 0x000000374b806617 in start_thread () from /lib64/libpthread.so.0
#12 0x000000374b0d3c2d in clone () from /lib64/libc.so.6
Thread 9 (Thread 0x439e0940 (LWP 20953)):
#0 0x000000374b80dfe1 in nanosleep () from /lib64/libpthread.so.0
#1 0x00002ad780c30b8b in bmicrosleep (sec=2, usec=0) at bsys.c:61
#2 0x0000000000422fc9 in jobq_server (arg=<value optimized out>) at
jobq.c:595
#3 0x000000374b806617 in start_thread () from /lib64/libpthread.so.0
#4 0x000000374b0d3c2d in clone () from /lib64/libc.so.6
Thread 8 (Thread 0x44de2940 (LWP 20957)):
#0 0x000000374b80d73b in read () from /lib64/libpthread.so.0
#1 0x0000003c54651bb8 in vio_read ()
from /usr/lib64/mysql/libmysqlclient_r.so.15
#2 0x0000003c54651c13 in vio_read_buff ()
from /usr/lib64/mysql/libmysqlclient_r.so.15
#3 0x0000003c54652d98 in ?? () from /usr/lib64/mysql/libmysqlclient_r.so.15
#4 0x0000003c54653199 in my_net_read ()
from /usr/lib64/mysql/libmysqlclient_r.so.15
#5 0x0000003c5464cbaf in cli_safe_read ()
from /usr/lib64/mysql/libmysqlclient_r.so.15
#6 0x0000003c5464dc19 in ?? () from /usr/lib64/mysql/libmysqlclient_r.so.15
#7 0x0000003c5464c20e in mysql_real_query ()
from /usr/lib64/mysql/libmysqlclient_r.so.15
#8 0x00002ad7805fa33c in db_sql_query (mdb=0x1f71b958,
query=0x1f7232f0 "SELECT Path.Path, Filename.Name, Temp.FileIndex,
Temp.JobI
d, LStat, MD5 FROM ( SELECT FileId, Job.JobId AS JobId, FileIndex,
File.PathId A
S PathId, File.FilenameId AS FilenameId, LStat, MD5 FROM Job, "...,
result_handler=0x40fe10 <accurate_list_handler>, ctx=0x1f7061b8)
at mysql.c:337
#9 0x00002ad780604440 in db_get_file_list (jcr=<value optimized out>,
mdb=0x1f71b958, jobids=0x1f71b1a0 "101430,101929,102786",
result_handler=0x40fe10 <accurate_list_handler>, ctx=0x1f7061b8)
at sql_get.c:1132
#10 0x00000000004104d5 in send_accurate_current_files (jcr=0x1f7061b8)
at backup.c:298
#11 0x00000000004114ed in do_backup (jcr=0x1f7061b8) at backup.c:445
#12 0x000000000042162e in job_thread (arg=<value optimized out>) at
job.c:314
#13 0x00000000004227d4 in jobq_server (arg=<value optimized out>) at
jobq.c:450
#14 0x000000374b806617 in start_thread () from /lib64/libpthread.so.0
#15 0x000000374b0d3c2d in clone () from /lib64/libc.so.6
Thread 7 (Thread 0x457e3940 (LWP 20961)):
#0 0x000000374b80d73b in read () from /lib64/libpthread.so.0
#1 0x00002ad780c321a6 in read_nbytes (bsock=0x1f70ec28, ptr=0x457e2d84
"8",
nbytes=4) at bnet.c:80
#2 0x00002ad780c35c1f in BSOCK::recv (this=0x1f70ec28) at bsock.c:451
#3 0x000000000041c321 in bget_dirmsg (bs=0x1f70ec28) at getmsg.c:137
#4 0x000000000042782e in msg_thread (arg=0x1f7061b8) at msgchan.c:390
#5 0x000000374b806617 in start_thread () from /lib64/libpthread.so.0
#6 0x000000374b0d3c2d in clone () from /lib64/libc.so.6
Thread 6 (Thread 0x46be5940 (LWP 20963)):
#0 0x000000374b80ad09 in pthread_cond_wait@@GLIBC_2.3.2 ()
from /lib64/libpthread.so.0
#1 0x00002ad780c4c956 in rwl_writelock (rwl=0x1f6eb3d8) at rwlock.c:242
#2 0x00002ad7805fc837 in _db_lock (file=0x2ad780611c1f "sql_update.c",
line=128, mdb=0x14) at sql.c:385
#3 0x00002ad780606094 in db_update_job_start_record (jcr=0x1f719328,
mdb=0x1f6eb3c8, jr=0x1f719768) at sql_update.c:112
#4 0x000000000042130c in job_thread (arg=<value optimized out>) at
job.c:287
#5 0x00000000004227d4 in jobq_server (arg=<value optimized out>) at
jobq.c:450
#6 0x000000374b806617 in start_thread () from /lib64/libpthread.so.0
#7 0x000000374b0d3c2d in clone () from /lib64/libc.so.6
Thread 5 (Thread 0x47fe7940 (LWP 20967)):
#0 0x000000374b80dfe1 in nanosleep () from /lib64/libpthread.so.0
#1 0x00002ad780c30b8b in bmicrosleep (sec=2, usec=0) at bsys.c:61
#2 0x0000000000422fc9 in jobq_server (arg=<value optimized out>) at
jobq.c:595
#3 0x000000374b806617 in start_thread () from /lib64/libpthread.so.0
#4 0x000000374b0d3c2d in clone () from /lib64/libc.so.6
Thread 4 (Thread 0x493e9940 (LWP 20971)):
#0 0x000000374b80dfe1 in nanosleep () from /lib64/libpthread.so.0
#1 0x00002ad780c30b8b in bmicrosleep (sec=2, usec=0) at bsys.c:61
#2 0x0000000000422fc9 in jobq_server (arg=<value optimized out>) at
jobq.c:595
#3 0x000000374b806617 in start_thread () from /lib64/libpthread.so.0
#4 0x000000374b0d3c2d in clone () from /lib64/libc.so.6
Thread 3 (Thread 0x4a7eb940 (LWP 20977)):
#0 0x000000374b80d73b in read () from /lib64/libpthread.so.0
#1 0x0000003c54651bb8 in vio_read ()
from /usr/lib64/mysql/libmysqlclient_r.so.15
#2 0x0000003c54651c13 in vio_read_buff ()
from /usr/lib64/mysql/libmysqlclient_r.so.15
#3 0x0000003c54652d98 in ?? () from /usr/lib64/mysql/libmysqlclient_r.so.15
#4 0x0000003c54653199 in my_net_read ()
from /usr/lib64/mysql/libmysqlclient_r.so.15
#5 0x0000003c5464cbaf in cli_safe_read ()
from /usr/lib64/mysql/libmysqlclient_r.so.15
#6 0x0000003c5464dc19 in ?? () from /usr/lib64/mysql/libmysqlclient_r.so.15
#7 0x0000003c5464c20e in mysql_real_query ()
from /usr/lib64/mysql/libmysqlclient_r.so.15
#8 0x00002ad7805fbda0 in UpdateDB (file=0x2ad780611c1f "sql_update.c",
line=123, jcr=0x4000, mdb=0xffffffffffffffff,
cmd=0x3232323330313d64 <Address 0x3232323330313d64 out of bounds>)
at sql.c:298
#9 0x00002ad780606164 in db_update_job_start_record (jcr=0x1f714728,
mdb=0x1f6eb3c8, jr=0x1f714b68) at sql_update.c:123
#10 0x0000000000411211 in do_backup (jcr=0x1f714728) at backup.c:333
#11 0x000000000042162e in job_thread (arg=<value optimized out>) at
job.c:314
#12 0x00000000004227d4 in jobq_server (arg=<value optimized out>) at
jobq.c:450
#13 0x000000374b806617 in start_thread () from /lib64/libpthread.so.0
#14 0x000000374b0d3c2d in clone () from /lib64/libc.so.6
Thread 2 (Thread 0x4b1ec940 (LWP 20980)):
#0 0x000000374b80ad09 in pthread_cond_wait@@GLIBC_2.3.2 ()
from /lib64/libpthread.so.0
#1 0x00002ad780c4c956 in rwl_writelock (rwl=0x1f6eb3d8) at rwlock.c:242
#2 0x00002ad7805fc837 in _db_lock (file=0x2ad78060bfc0 "sql_create.c",
line=128, mdb=0x13) at sql.c:385
#3 0x00002ad7805fd97d in db_create_client_record (jcr=0x1f725068,
mdb=0x13,
cr=0x4b1eba20) at sql_create.c:485
#4 0x000000000041f19f in get_or_create_client_record (jcr=0x1f725068)
at job.c:867
#5 0x00000000004218ce in setup_job (jcr=0x1f725068) at job.c:162
#6 0x0000000000421d1b in run_job (jcr=0x1f6eb434) at job.c:89
#7 0x0000000000441580 in run_cmd (ua=0x1f7237a8, cmd=<value optimized out>)
at ua_run.c:180
#8 0x000000000042f86d in do_a_command (ua=0x1f7237a8) at ua_cmds.c:240
#9 0x0000000000444b5a in handle_UA_client_request (arg=<value optimized
out>)
at ua_server.c:146
#10 0x00002ad780c57ace in workq_server (arg=<value optimized out>)
at workq.c:346
#11 0x000000374b806617 in start_thread () from /lib64/libpthread.so.0
#12 0x000000374b0d3c2d in clone () from /lib64/libc.so.6
Thread 1 (Thread 0x2ad78108c280 (LWP 18020)):
#0 0x000000374b80dfe1 in nanosleep () from /lib64/libpthread.so.0
#1 0x00002ad780c30b8b in bmicrosleep (sec=60, usec=0) at bsys.c:61
#2 0x000000000042cd7a in wait_for_next_job (
one_shot_job_to_run=<value optimized out>) at scheduler.c:131
#3 0x000000000040d70c in main (argc=0, argv=0x7fff39454778) at dird.c:338
#0 0x000000374b80dfe1 in nanosleep () from /lib64/libpthread.so.0
No symbol table info available.
#1 0x00002ad780c30b8b in bmicrosleep (sec=60, usec=0) at bsys.c:61
61 stat = nanosleep(&timeout, NULL);
Current language: auto; currently c++
timeout = {tv_sec = 60, tv_nsec = 0}
tv = {tv_sec = 140734154229024, tv_usec = 1272046151}
tz = {tz_minuteswest = 0, tz_dsttime = 110}
stat = <value optimized out>
#2 0x000000000042cd7a in wait_for_next_job (
one_shot_job_to_run=<value optimized out>) at scheduler.c:131
131 bmicrosleep(next_check_secs, 0); /* recheck once per minute */
jcr = <value optimized out>
job = (JOB *) 0x0
run = <value optimized out>
now = <value optimized out>
next_job = <value optimized out>
first = false
#3 0x000000000040d70c in main (argc=0, argv=0x7fff39454778) at dird.c:338
338 while ( (jcr = wait_for_next_job(runjob)) ) {
ch = <value optimized out>
jcr = (JCR *) 0x4
no_signals = false
test_config = false
uid = 0x0
gid = 0x0
mode = <value optimized out>
#0 0x0000000000000000 in ?? ()
No symbol table info available.
#0 0x0000000000000000 in ?? ()
No symbol table info available.
#0 0x0000000000000000 in ?? ()
No symbol table info available.
#0 0x0000000000000000 in ?? ()
No symbol table info available.
Can anyone shed some light on why my console is blocking at this point?
Maybe I'm misunderstanding the responses to the original post, but it
sounds like the Directory shouldn't be blocking.
Note that additional consoles spawned can connect to the Director at
this point, but they too 'hang' when a Status of Director is requested.
thanks!
Stephen
On 04/23/2010 02:32 AM, Craig Ringer wrote:
> On 23/04/10 16:52, Graham Keeling wrote:
>> On Fri, Apr 23, 2010 at 10:31:49AM +0200, Eric Bollengier wrote:
>>> For normal Accurate backup, it uses dedicated connection since the
>>> beginning,
>>> so it's shouldn't block commands too.
>>
>> I do not think this works because for me, the 'status director' command
>> blocks
>> until the accurate query is complete.
>
> It'd be rather handy to see the output of a:
>
> thread apply all bt
>
> command from gdb attached to the director while it's blocked trying to
> provide status information to a console.
>
> ( If your Bacula was installed from distro packages you may need to
> install additional debug info packages to get any useful output, though. )
>
> --
> Craig Ringer
>
> ------------------------------------------------------------------------------
> _______________________________________________
> Bacula-devel mailing list
> [email protected]
> https://lists.sourceforge.net/lists/listinfo/bacula-devel
--
Stephen Thompson Berkeley Seismological Laboratory
[email protected] 215 McCone Hall # 4760
404.538.7077 (phone) University of California, Berkeley
510.643.5811 (fax) Berkeley, CA 94720-4760
------------------------------------------------------------------------------
_______________________________________________
Bacula-devel mailing list
[email protected]
https://lists.sourceforge.net/lists/listinfo/bacula-devel