Hello,

Bacula 5.0.1 has definitely been less stable than 3.0.3 in our
environment.  I posted last week about two separate but similar
Storage Daemon crashes.  This weekend the Director crashed; luckily I
had already gotten tracebacks working in anticipation of another SD
crash.  It looks like the Director crashed while running some of our
regular nightly full backups.

Does this traceback reveal anything?

[Thread debugging using libthread_db enabled]
[New Thread 0x2b2a07e49280 (LWP 17166)]
[New Thread 0x4e83c940 (LWP 25060)]
[New Thread 0x4662f940 (LWP 25059)]
[New Thread 0x45c2e940 (LWP 25058)]
[New Thread 0x43e2b940 (LWP 25056)]
[New Thread 0x42a29940 (LWP 25055)]
[New Thread 0x4342a940 (LWP 25054)]
[New Thread 0x47a31940 (LWP 24923)]
[New Thread 0x4de3b940 (LWP 18781)]
[New Thread 0x4d43a940 (LWP 18780)]
[New Thread 0x4ca39940 (LWP 18779)]
[New Thread 0x4c038940 (LWP 18778)]
[New Thread 0x4b637940 (LWP 18777)]
[New Thread 0x4ac36940 (LWP 18776)]
[New Thread 0x4a235940 (LWP 18775)]
[New Thread 0x49834940 (LWP 18774)]
[New Thread 0x48e33940 (LWP 18773)]
[New Thread 0x47030940 (LWP 5301)]
[New Thread 0x48432940 (LWP 22437)]
[New Thread 0x4482c940 (LWP 30205)]
[New Thread 0x42028940 (LWP 17186)]
[New Thread 0x41627940 (LWP 17185)]
0x000000374b80dfe1 in nanosleep () from /lib64/libpthread.so.0
$1 = "lawson-dir", '\0' <repeats 19 times>
$2 = 0x17ed7528 "bacula-dir"
$3 = 0x17ed7568 "/opt/bacula/bin/bacula-dir"
$4 = 0x17fd3148 "MySQL"
$5 = 0x2b2a07a1c4fe "5.0.1 (24 February 2010)"
$6 = 0x2b2a07a1c517 "x86_64-unknown-linux-gnu"
$7 = 0x2b2a07a1c530 "redhat"
$8 = 0x2b2a07a1c1f5 ""
$9 = "lawson.geo.berkeley.edu", '\0' <repeats 26 times>
#0  0x000000374b80dfe1 in nanosleep () from /lib64/libpthread.so.0
#1  0x00002b2a079edb4b in bmicrosleep (sec=60, usec=0) at bsys.c:61
#2  0x000000000042cca3 in wait_for_next_job (
     one_shot_job_to_run=<value optimized out>) at scheduler.c:178
#3  0x000000000040d64c in main (argc=0, argv=0x7fff37f5c3b8) at dird.c:338

Thread 22 (Thread 0x41627940 (LWP 17185)):
#0  0x000000374b0cced2 in select () from /lib64/libc.so.6
#1  0x00002b2a079eff97 in bnet_thread_server (addrs=0x17ed93a8,
     max_clients=20, client_wq=0x66f880,
     handle_client_request=0x444a50 <handle_UA_client_request>)
     at bnet_server.c:161
#2  0x0000000000444a4c in connect_thread (arg=0x17ed93a8) at ua_server.c:82
#3  0x000000374b806617 in start_thread () from /lib64/libpthread.so.0
#4  0x000000374b0d3c2d in clone () from /lib64/libc.so.6

Thread 21 (Thread 0x42028940 (LWP 17186)):
#0  0x000000374b80af70 in pthread_cond_timedwait@@GLIBC_2.3.2 ()
    from /lib64/libpthread.so.0
#1  0x00002b2a07a1409d in watchdog_thread (arg=<value optimized out>)
     at watchdog.c:308
#2  0x000000374b806617 in start_thread () from /lib64/libpthread.so.0
#3  0x000000374b0d3c2d in clone () from /lib64/libc.so.6

Thread 20 (Thread 0x4482c940 (LWP 30205)):
#0  0x000000374b80d73b in read () from /lib64/libpthread.so.0
#1  0x00002b2a079ef166 in read_nbytes (bsock=0x180077a8, ptr=0x4482c014 "",
     nbytes=4) at bnet.c:80
#2  0x00002b2a079f2bdf in BSOCK::recv (this=0x180077a8) at bsock.c:451
#3  0x0000000000444ad7 in handle_UA_client_request (arg=<value optimized out>)
     at ua_server.c:139
#4  0x00002b2a07a148de in workq_server (arg=<value optimized out>)
     at workq.c:346
#5  0x000000374b806617 in start_thread () from /lib64/libpthread.so.0
#6  0x000000374b0d3c2d in clone () from /lib64/libc.so.6

Thread 19 (Thread 0x48432940 (LWP 22437)):
#0  0x000000374b80d73b in read () from /lib64/libpthread.so.0
#1  0x00002b2a079ef166 in read_nbytes (bsock=0x17fd8088, ptr=0x48432014 "",
     nbytes=4) at bnet.c:80
#2  0x00002b2a079f2bdf in BSOCK::recv (this=0x17fd8088) at bsock.c:451
#3  0x0000000000444ad7 in handle_UA_client_request (arg=<value optimized out>)
     at ua_server.c:139
#4  0x00002b2a07a148de in workq_server (arg=<value optimized out>)
     at workq.c:346
#5  0x000000374b806617 in start_thread () from /lib64/libpthread.so.0
#6  0x000000374b0d3c2d in clone () from /lib64/libc.so.6

Thread 18 (Thread 0x47030940 (LWP 5301)):
#0  0x000000374b80dfe1 in nanosleep () from /lib64/libpthread.so.0
#1  0x00002b2a079edb4b in bmicrosleep (sec=2, usec=0) at bsys.c:61
#2  0x0000000000422f69 in jobq_server (arg=<value optimized out>) at jobq.c:595
#3  0x000000374b806617 in start_thread () from /lib64/libpthread.so.0
#4  0x000000374b0d3c2d in clone () from /lib64/libc.so.6

Thread 17 (Thread 0x48e33940 (LWP 18773)):
#0  0x000000374b80d73b in read () from /lib64/libpthread.so.0
#1  0x00002b2a079ef166 in read_nbytes (bsock=0x180cf188, ptr=0x48e32c84 "",
     nbytes=4) at bnet.c:80
#2  0x00002b2a079f2bdf in BSOCK::recv (this=0x180cf188) at bsock.c:451
#3  0x000000000041c251 in bget_dirmsg (bs=0x180cf188) at getmsg.c:137
#4  0x000000000040faa4 in wait_for_job_termination (jcr=0x17fdb858, timeout=0)
     at backup.c:508
#5  0x000000000041145f in do_backup (jcr=0x17fdb858) at backup.c:456
#6  0x000000000042143e in job_thread (arg=<value optimized out>) at job.c:314
#7  0x0000000000422774 in jobq_server (arg=<value optimized out>) at jobq.c:450
#8  0x000000374b806617 in start_thread () from /lib64/libpthread.so.0
#9  0x000000374b0d3c2d in clone () from /lib64/libc.so.6

Thread 16 (Thread 0x49834940 (LWP 18774)):
#0  0x000000374b80d73b in read () from /lib64/libpthread.so.0
#1  0x00002b2a079ef166 in read_nbytes (bsock=0x180cf2b8, ptr=0x49833c84 "",
     nbytes=4) at bnet.c:80
#2  0x00002b2a079f2bdf in BSOCK::recv (this=0x180cf2b8) at bsock.c:451
#3  0x000000000041c251 in bget_dirmsg (bs=0x180cf2b8) at getmsg.c:137
#4  0x000000000040faa4 in wait_for_job_termination (jcr=0x1817a788, timeout=0)
     at backup.c:508
#5  0x000000000041145f in do_backup (jcr=0x1817a788) at backup.c:456
#6  0x000000000042143e in job_thread (arg=<value optimized out>) at job.c:314
#7  0x0000000000422774 in jobq_server (arg=<value optimized out>) at jobq.c:450
#8  0x000000374b806617 in start_thread () from /lib64/libpthread.so.0
#9  0x000000374b0d3c2d in clone () from /lib64/libc.so.6

Thread 15 (Thread 0x4a235940 (LWP 18775)):
#0  0x000000374b80d73b in read () from /lib64/libpthread.so.0
#1  0x00002b2a079ef166 in read_nbytes (bsock=0x18163808,
     ptr=0x4a234c84 "\233", nbytes=4) at bnet.c:80
#2  0x00002b2a079f2bdf in BSOCK::recv (this=0x18163808) at bsock.c:451
#3  0x000000000041c251 in bget_dirmsg (bs=0x18163808) at getmsg.c:137
#4  0x000000000040faa4 in wait_for_job_termination (jcr=0x18025e18, timeout=0)
     at backup.c:508
#5  0x000000000041145f in do_backup (jcr=0x18025e18) at backup.c:456
#6  0x000000000042143e in job_thread (arg=<value optimized out>) at job.c:314
#7  0x0000000000422774 in jobq_server (arg=<value optimized out>) at jobq.c:450
#8  0x000000374b806617 in start_thread () from /lib64/libpthread.so.0
#9  0x000000374b0d3c2d in clone () from /lib64/libc.so.6

Thread 14 (Thread 0x4ac36940 (LWP 18776)):
#0  0x000000374b80d73b in read () from /lib64/libpthread.so.0
#1  0x00002b2a079ef166 in read_nbytes (bsock=0x2aaab4018398,
     ptr=0x4ac35c84 "", nbytes=4) at bnet.c:80
#2  0x00002b2a079f2bdf in BSOCK::recv (this=0x2aaab4018398) at bsock.c:451
#3  0x000000000041c251 in bget_dirmsg (bs=0x2aaab4018398) at getmsg.c:137
#4  0x000000000040faa4 in wait_for_job_termination (jcr=0x181725c8, timeout=0)
     at backup.c:508
#5  0x000000000041145f in do_backup (jcr=0x181725c8) at backup.c:456
#6  0x000000000042143e in job_thread (arg=<value optimized out>) at job.c:314
#7  0x0000000000422774 in jobq_server (arg=<value optimized out>) at jobq.c:450
#8  0x000000374b806617 in start_thread () from /lib64/libpthread.so.0
#9  0x000000374b0d3c2d in clone () from /lib64/libc.so.6

Thread 13 (Thread 0x4b637940 (LWP 18777)):
#0  0x000000374b80d73b in read () from /lib64/libpthread.so.0
#1  0x00002b2a079ef166 in read_nbytes (bsock=0x2aaab4008808,
     ptr=0x4b636c84 "", nbytes=4) at bnet.c:80
#2  0x00002b2a079f2bdf in BSOCK::recv (this=0x2aaab4008808) at bsock.c:451
#3  0x000000000041c251 in bget_dirmsg (bs=0x2aaab4008808) at getmsg.c:137
#4  0x000000000040faa4 in wait_for_job_termination (jcr=0x17ff30f8, timeout=0)
     at backup.c:508
#5  0x000000000041145f in do_backup (jcr=0x17ff30f8) at backup.c:456
#6  0x000000000042143e in job_thread (arg=<value optimized out>) at job.c:314
#7  0x0000000000422774 in jobq_server (arg=<value optimized out>) at jobq.c:450
#8  0x000000374b806617 in start_thread () from /lib64/libpthread.so.0
#9  0x000000374b0d3c2d in clone () from /lib64/libc.so.6

Thread 12 (Thread 0x4c038940 (LWP 18778)):
#0  0x000000374b80e4af in waitpid () from /lib64/libpthread.so.0
#1  0x00002b2a07a0b5a6 in signal_handler (sig=11) at signal.c:229
#2  <signal handler called>
#3  0x000000374b80baf0 in pthread_kill () from /lib64/libpthread.so.0
#4  0x000000000041fda0 in cancel_storage_daemon_job (jcr=0x17fd8b28)
     at job.c:515
#5  0x000000000040fc30 in wait_for_job_termination (jcr=0x17fd8b28,
     timeout=180) at backup.c:538
#6  0x000000000041124c in do_backup (jcr=0x17fd8b28) at backup.c:476
#7  0x000000000042143e in job_thread (arg=<value optimized out>) at job.c:314
#8  0x0000000000422774 in jobq_server (arg=<value optimized out>) at jobq.c:450
#9  0x000000374b806617 in start_thread () from /lib64/libpthread.so.0
#10 0x000000374b0d3c2d in clone () from /lib64/libc.so.6

Thread 11 (Thread 0x4ca39940 (LWP 18779)):
#0  0x000000374b80d73b in read () from /lib64/libpthread.so.0
#1  0x00002b2a079ef166 in read_nbytes (bsock=0x180c7b48, ptr=0x4ca38c84 "",
     nbytes=4) at bnet.c:80
#2  0x00002b2a079f2bdf in BSOCK::recv (this=0x180c7b48) at bsock.c:451
#3  0x000000000041c251 in bget_dirmsg (bs=0x180c7b48) at getmsg.c:137
#4  0x000000000040faa4 in wait_for_job_termination (jcr=0x1800c828, timeout=0)
     at backup.c:508
#5  0x000000000041145f in do_backup (jcr=0x1800c828) at backup.c:456
#6  0x000000000042143e in job_thread (arg=<value optimized out>) at job.c:314
#7  0x0000000000422774 in jobq_server (arg=<value optimized out>) at jobq.c:450
#8  0x000000374b806617 in start_thread () from /lib64/libpthread.so.0
#9  0x000000374b0d3c2d in clone () from /lib64/libc.so.6

Thread 10 (Thread 0x4d43a940 (LWP 18780)):
#0  0x000000374b80dfe1 in nanosleep () from /lib64/libpthread.so.0
#1  0x00002b2a079edb4b in bmicrosleep (sec=2, usec=0) at bsys.c:61
#2  0x0000000000422f69 in jobq_server (arg=<value optimized out>) at jobq.c:595
#3  0x000000374b806617 in start_thread () from /lib64/libpthread.so.0
#4  0x000000374b0d3c2d in clone () from /lib64/libc.so.6

Thread 9 (Thread 0x4de3b940 (LWP 18781)):
#0  0x000000374b80d73b in read () from /lib64/libpthread.so.0
#1  0x00002b2a079ef166 in read_nbytes (bsock=0x1806ab68, ptr=0x4de3ac84 "¿",
     nbytes=4) at bnet.c:80
#2  0x00002b2a079f2bdf in BSOCK::recv (this=0x1806ab68) at bsock.c:451
#3  0x000000000041c251 in bget_dirmsg (bs=0x1806ab68) at getmsg.c:137
#4  0x000000000040faa4 in wait_for_job_termination (jcr=0x18177f68, timeout=0)
     at backup.c:508
#5  0x000000000041145f in do_backup (jcr=0x18177f68) at backup.c:456
#6  0x000000000042143e in job_thread (arg=<value optimized out>) at job.c:314
#7  0x0000000000422774 in jobq_server (arg=<value optimized out>) at jobq.c:450
#8  0x000000374b806617 in start_thread () from /lib64/libpthread.so.0
#9  0x000000374b0d3c2d in clone () from /lib64/libc.so.6

Thread 8 (Thread 0x47a31940 (LWP 24923)):
#0  0x000000374b80d73b in read () from /lib64/libpthread.so.0
#1  0x00002b2a079ef166 in read_nbytes (bsock=0x18017e38, ptr=0x47a30d84 "k",
     nbytes=4) at bnet.c:80
#2  0x00002b2a079f2bdf in BSOCK::recv (this=0x18017e38) at bsock.c:451
#3  0x000000000041c251 in bget_dirmsg (bs=0x18017e38) at getmsg.c:137
#4  0x000000000042766c in msg_thread (arg=0x18025e18) at msgchan.c:388
#5  0x000000374b806617 in start_thread () from /lib64/libpthread.so.0
#6  0x000000374b0d3c2d in clone () from /lib64/libc.so.6

Thread 7 (Thread 0x4342a940 (LWP 25054)):
#0  0x000000374b80d73b in read () from /lib64/libpthread.so.0
#1  0x00002b2a079ef166 in read_nbytes (bsock=0x180f8978, ptr=0x43429d84 "p",
     nbytes=4) at bnet.c:80
#2  0x00002b2a079f2bdf in BSOCK::recv (this=0x180f8978) at bsock.c:451
#3  0x000000000041c251 in bget_dirmsg (bs=0x180f8978) at getmsg.c:137
#4  0x000000000042766c in msg_thread (arg=0x17fdb858) at msgchan.c:388
#5  0x000000374b806617 in start_thread () from /lib64/libpthread.so.0
#6  0x000000374b0d3c2d in clone () from /lib64/libc.so.6

Thread 6 (Thread 0x42a29940 (LWP 25055)):
#0  0x000000374b80d73b in read () from /lib64/libpthread.so.0
#1  0x00002b2a079ef166 in read_nbytes (bsock=0x180295c8, ptr=0x42a28d84 "m",
     nbytes=4) at bnet.c:80
#2  0x00002b2a079f2bdf in BSOCK::recv (this=0x180295c8) at bsock.c:451
#3  0x000000000041c251 in bget_dirmsg (bs=0x180295c8) at getmsg.c:137
#4  0x000000000042766c in msg_thread (arg=0x1817a788) at msgchan.c:388
#5  0x000000374b806617 in start_thread () from /lib64/libpthread.so.0
#6  0x000000374b0d3c2d in clone () from /lib64/libc.so.6

Thread 5 (Thread 0x43e2b940 (LWP 25056)):
#0  0x000000374b80d73b in read () from /lib64/libpthread.so.0
#1  0x00002b2a079ef166 in read_nbytes (bsock=0x180a48f8, ptr=0x43e2ad84 "t",
     nbytes=4) at bnet.c:80
#2  0x00002b2a079f2bdf in BSOCK::recv (this=0x180a48f8) at bsock.c:451
#3  0x000000000041c251 in bget_dirmsg (bs=0x180a48f8) at getmsg.c:137
#4  0x000000000042766c in msg_thread (arg=0x17ff30f8) at msgchan.c:388
#5  0x000000374b806617 in start_thread () from /lib64/libpthread.so.0
#6  0x000000374b0d3c2d in clone () from /lib64/libc.so.6

Thread 4 (Thread 0x45c2e940 (LWP 25058)):
#0  0x000000374b80d73b in read () from /lib64/libpthread.so.0
#1  0x00002b2a079ef166 in read_nbytes (bsock=0x1804bba8, ptr=0x45c2dd84 "k",
     nbytes=4) at bnet.c:80
#2  0x00002b2a079f2bdf in BSOCK::recv (this=0x1804bba8) at bsock.c:451
#3  0x000000000041c251 in bget_dirmsg (bs=0x1804bba8) at getmsg.c:137
#4  0x000000000042766c in msg_thread (arg=0x18177f68) at msgchan.c:388
#5  0x000000374b806617 in start_thread () from /lib64/libpthread.so.0
#6  0x000000374b0d3c2d in clone () from /lib64/libc.so.6

Thread 3 (Thread 0x4662f940 (LWP 25059)):
#0  0x000000374b80d73b in read () from /lib64/libpthread.so.0
#1  0x00002b2a079ef166 in read_nbytes (bsock=0x181636d8, ptr=0x4662ed84 "f",
     nbytes=4) at bnet.c:80
#2  0x00002b2a079f2bdf in BSOCK::recv (this=0x181636d8) at bsock.c:451
#3  0x000000000041c251 in bget_dirmsg (bs=0x181636d8) at getmsg.c:137
#4  0x000000000042766c in msg_thread (arg=0x1800c828) at msgchan.c:388
#5  0x000000374b806617 in start_thread () from /lib64/libpthread.so.0
#6  0x000000374b0d3c2d in clone () from /lib64/libc.so.6

Thread 2 (Thread 0x4e83c940 (LWP 25060)):
#0  0x000000374b80d73b in read () from /lib64/libpthread.so.0
#1  0x00002b2a079ef166 in read_nbytes (bsock=0x18136928, ptr=0x4e83bd84 "i",
     nbytes=4) at bnet.c:80
#2  0x00002b2a079f2bdf in BSOCK::recv (this=0x18136928) at bsock.c:451
#3  0x000000000041c251 in bget_dirmsg (bs=0x18136928) at getmsg.c:137
#4  0x000000000042766c in msg_thread (arg=0x181725c8) at msgchan.c:388
#5  0x000000374b806617 in start_thread () from /lib64/libpthread.so.0
#6  0x000000374b0d3c2d in clone () from /lib64/libc.so.6

Thread 1 (Thread 0x2b2a07e49280 (LWP 17166)):
#0  0x000000374b80dfe1 in nanosleep () from /lib64/libpthread.so.0
#1  0x00002b2a079edb4b in bmicrosleep (sec=60, usec=0) at bsys.c:61
#2  0x000000000042cca3 in wait_for_next_job (
     one_shot_job_to_run=<value optimized out>) at scheduler.c:178
#3  0x000000000040d64c in main (argc=0, argv=0x7fff37f5c3b8) at dird.c:338
#0  0x000000374b80dfe1 in nanosleep () from /lib64/libpthread.so.0
No symbol table info available.
#1  0x00002b2a079edb4b in bmicrosleep (sec=60, usec=0) at bsys.c:61
61         stat = nanosleep(&timeout, NULL);
Current language:  auto; currently c++
timeout = {tv_sec = 60, tv_nsec = 0}
tv = {tv_sec = 4559264, tv_usec = 16}
tz = {tz_minuteswest = 6, tz_dsttime = 0}
stat = <value optimized out>
#2  0x000000000042cca3 in wait_for_next_job (
     one_shot_job_to_run=<value optimized out>) at scheduler.c:178
178           bmicrosleep((next_check_secs < twait)?next_check_secs:twait, 0);
twait = -516
jcr = (JCR *) 0x0
job = (JOB *) 0x0
run = (RUN *) 0x10
now = <value optimized out>
next_job = (job_item *) 0x18173058
first = false
#3  0x000000000040d64c in main (argc=0, argv=0x7fff37f5c3b8) at dird.c:338
338        while ( (jcr = wait_for_next_job(runjob)) ) {
ch = <value optimized out>
jcr = (JCR *) 0x17ffedb8
no_signals = false
test_config = false
uid = 0x0
gid = 0x0
mode = <value optimized out>
#0  0x0000000000000000 in ?? ()
No symbol table info available.
#0  0x0000000000000000 in ?? ()
No symbol table info available.
#0  0x0000000000000000 in ?? ()
No symbol table info available.
#0  0x0000000000000000 in ?? ()
No symbol table info available.
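
In case it helps anyone skimming a dump this long: below is a minimal
sketch, in Python, of how the thread that took the signal can be picked
out of output laid out like the traceback above.  It is not part of
Bacula.  The function name find_signalled_threads and the regular
expressions are my own assumptions about this layout, and SAMPLE is just
the top of the Thread 12 stack copied from this traceback.

```python
import re

# Assumed layout, taken from the traceback above:
#   "Thread 12 (Thread 0x4c038940 (LWP 18778)):" starts a per-thread stack
#   "#3  0x... in pthread_kill () from ..." is one frame
THREAD_HEADER = re.compile(r"^Thread (\d+) \(Thread 0x[0-9a-f]+ \(LWP (\d+)\)\):")
FRAME = re.compile(r"^#(\d+)\s+(?:0x[0-9a-f]+ in )?(\S+)")

SAMPLE = """\
Thread 12 (Thread 0x4c038940 (LWP 18778)):
#0  0x000000374b80e4af in waitpid () from /lib64/libpthread.so.0
#1  0x00002b2a07a0b5a6 in signal_handler (sig=11) at signal.c:229
#2  <signal handler called>
#3  0x000000374b80baf0 in pthread_kill () from /lib64/libpthread.so.0
#4  0x000000000041fda0 in cancel_storage_daemon_job (jcr=0x17fd8b28)
     at job.c:515
"""


def find_signalled_threads(traceback_text):
    """Return (thread_no, lwp, function) for every thread whose stack
    contains a '<signal handler called>' marker.  'function' is the first
    frame listed after the marker, i.e. what was running when the signal
    arrived."""
    results = []
    thread = None          # (thread_no, lwp) of the stack being read
    expect_fault = False   # True right after the signal-handler marker
    for line in traceback_text.splitlines():
        header = THREAD_HEADER.match(line)
        if header:
            thread = (int(header.group(1)), int(header.group(2)))
            expect_fault = False
            continue
        if thread is None:
            continue       # still in the preamble before the first stack
        if "<signal handler called>" in line:
            expect_fault = True
            continue
        frame = FRAME.match(line)
        if frame and expect_fault:
            results.append((thread[0], thread[1], frame.group(2)))
            expect_fault = False
    return results


if __name__ == "__main__":
    for thread_no, lwp, function in find_signalled_threads(SAMPLE):
        print(f"Thread {thread_no} (LWP {lwp}) was in {function}")
```

On the excerpt it reports Thread 12 (LWP 18778) in pthread_kill, which
the stack shows being called from cancel_storage_daemon_job at job.c:515
with signal_handler invoked for sig=11.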


Thanks for any help!
Stephen

-------------------------------------------------------------------------------------
Further info:

My catalog...

      MySQL 5.0.77 (64-bit), MyISAM
      210 GB in size
      1,412,297,215 records in File table
      note: database built with Bacula 2.x scripts,
      upgraded with 3.x scripts, then again with 5.x scripts
      (i.e. nothing customized along the way)

My OS & hardware for bacula DIR+SD server...

      CentOS 5.4 (fully patched)
      8 GB RAM
      2 GB swap
      1 TB ext3 filesystem on external fiber RAID5 array
      (dedicated to database, incl. temp files)
      2 dual-core [AMD Opteron(tm) Processor 2220] CPUs
      StorageTek SL500 Library with 2 LTO3 Drives



-- 
Stephen Thompson               Berkeley Seismological Laboratory
[email protected]    215 McCone Hall # 4760
404.538.7077 (phone)           University of California, Berkeley
510.643.5811 (fax)             Berkeley, CA 94720-4760

_______________________________________________
Bacula-devel mailing list
[email protected]
https://lists.sourceforge.net/lists/listinfo/bacula-devel