Hello,
Yes, I see what is going wrong. Please submit a bug report (see
www.bacula.org -> Bug Reports) and I will fix it.
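
For anyone following the quoted trace: the thread that took the signal appears to be Thread 12, where signal_handler (sig=11) sits directly above pthread_kill, called from cancel_storage_daemon_job at job.c:515. Below is a minimal sketch of how one might pick that thread out of a long, email-quoted gdb dump automatically. The helper names (strip_quote, faulting_threads) are illustrative assumptions and are not part of Bacula or gdb; the sample text is an excerpt of the trace quoted below.

```python
import re

# Matches gdb thread headers such as "Thread 12 (Thread 0x4c038940 (LWP 18778)):"
THREAD_RE = re.compile(r"^Thread (\d+) \(Thread 0x[0-9a-f]+ \(LWP \d+\)\):")
# Matches frame lines such as "#3 0x000000374b80baf0 in pthread_kill () ..."
FRAME_RE = re.compile(r"^#\d+\s+(?:0x[0-9a-f]+ in )?(\S+)")
SIGNAL_MARK = "<signal handler called>"


def strip_quote(line):
    """Remove a leading email quote marker ('>' plus one optional space)."""
    return re.sub(r"^>\s?", "", line.rstrip("\n"))


def faulting_threads(text):
    """Map thread number -> function names of the frames below the signal marker."""
    result = {}
    current = None        # thread number currently being read
    after_signal = False  # True once the signal marker was seen in this thread
    for raw in text.splitlines():
        line = strip_quote(raw)
        if not line.strip():
            current = None  # a blank line ends the current thread block
            continue
        header = THREAD_RE.match(line)
        if header:
            current = int(header.group(1))
            after_signal = False
            continue
        if current is None:
            continue
        if SIGNAL_MARK in line:
            after_signal = True
            result[current] = []
            continue
        frame = FRAME_RE.match(line)
        if frame and after_signal:
            result[current].append(frame.group(1))
    return result


SAMPLE = """\
> Thread 12 (Thread 0x4c038940 (LWP 18778)):
> #0 0x000000374b80e4af in waitpid () from /lib64/libpthread.so.0
> #1 0x00002b2a07a0b5a6 in signal_handler (sig=11) at signal.c:229
> #2 <signal handler called>
> #3 0x000000374b80baf0 in pthread_kill () from /lib64/libpthread.so.0
> #4 0x000000000041fda0 in cancel_storage_daemon_job (jcr=0x17fd8b28)
> at job.c:515
>
> Thread 10 (Thread 0x4d43a940 (LWP 18780)):
> #0 0x000000374b80dfe1 in nanosleep () from /lib64/libpthread.so.0
> #1 0x00002b2a079edb4b in bmicrosleep (sec=2, usec=0) at bsys.c:61
"""

if __name__ == "__main__":
    for number, frames in faulting_threads(SAMPLE).items():
        # Frames are listed innermost first, i.e. where the signal was raised.
        print(f"Thread {number}: " + " <- ".join(frames))
```

The sketch only reports which threads contain a signal-handler frame and what was on the stack beneath it; it says nothing about why the call failed.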
Best regards,
Kern
On Monday 19 April 2010 18:29:14 Stephen Thompson wrote:
> Hello,
>
> We've definitely been having less stability with 5.0.1 than with 3.0.3
> in our environment. I posted last week about two separate but similar
> StorageDaemon crashes. This weekend we had the Director crash; luckily
> I got tracebacks working in anticipation of another SD crash. Looks
> like the Director crashed while running some of our regular nightly
> fulls...
>
> Does this traceback reveal anything?
>
> [Thread debugging using libthread_db enabled]
> [New Thread 0x2b2a07e49280 (LWP 17166)]
> [New Thread 0x4e83c940 (LWP 25060)]
> [New Thread 0x4662f940 (LWP 25059)]
> [New Thread 0x45c2e940 (LWP 25058)]
> [New Thread 0x43e2b940 (LWP 25056)]
> [New Thread 0x42a29940 (LWP 25055)]
> [New Thread 0x4342a940 (LWP 25054)]
> [New Thread 0x47a31940 (LWP 24923)]
> [New Thread 0x4de3b940 (LWP 18781)]
> [New Thread 0x4d43a940 (LWP 18780)]
> [New Thread 0x4ca39940 (LWP 18779)]
> [New Thread 0x4c038940 (LWP 18778)]
> [New Thread 0x4b637940 (LWP 18777)]
> [New Thread 0x4ac36940 (LWP 18776)]
> [New Thread 0x4a235940 (LWP 18775)]
> [New Thread 0x49834940 (LWP 18774)]
> [New Thread 0x48e33940 (LWP 18773)]
> [New Thread 0x47030940 (LWP 5301)]
> [New Thread 0x48432940 (LWP 22437)]
> [New Thread 0x4482c940 (LWP 30205)]
> [New Thread 0x42028940 (LWP 17186)]
> [New Thread 0x41627940 (LWP 17185)]
> 0x000000374b80dfe1 in nanosleep () from /lib64/libpthread.so.0
> $1 = "lawson-dir", '\0' <repeats 19 times>
> $2 = 0x17ed7528 "bacula-dir"
> $3 = 0x17ed7568 "/opt/bacula/bin/bacula-dir"
> $4 = 0x17fd3148 "MySQL"
> $5 = 0x2b2a07a1c4fe "5.0.1 (24 February 2010)"
> $6 = 0x2b2a07a1c517 "x86_64-unknown-linux-gnu"
> $7 = 0x2b2a07a1c530 "redhat"
> $8 = 0x2b2a07a1c1f5 ""
> $9 = "lawson.geo.berkeley.edu", '\0' <repeats 26 times>
> #0 0x000000374b80dfe1 in nanosleep () from /lib64/libpthread.so.0
> #1 0x00002b2a079edb4b in bmicrosleep (sec=60, usec=0) at bsys.c:61
> #2 0x000000000042cca3 in wait_for_next_job (
> one_shot_job_to_run=<value optimized out>) at scheduler.c:178
> #3 0x000000000040d64c in main (argc=0, argv=0x7fff37f5c3b8) at dird.c:338
>
> Thread 22 (Thread 0x41627940 (LWP 17185)):
> #0 0x000000374b0cced2 in select () from /lib64/libc.so.6
> #1 0x00002b2a079eff97 in bnet_thread_server (addrs=0x17ed93a8,
> max_clients=20, client_wq=0x66f880,
> handle_client_request=0x444a50 <handle_UA_client_request>)
> at bnet_server.c:161
> #2 0x0000000000444a4c in connect_thread (arg=0x17ed93a8) at ua_server.c:82
> #3 0x000000374b806617 in start_thread () from /lib64/libpthread.so.0
> #4 0x000000374b0d3c2d in clone () from /lib64/libc.so.6
>
> Thread 21 (Thread 0x42028940 (LWP 17186)):
> #0 0x000000374b80af70 in pthread_cond_timedwait@@GLIBC_2.3.2 ()
> from /lib64/libpthread.so.0
> #1 0x00002b2a07a1409d in watchdog_thread (arg=<value optimized out>)
> at watchdog.c:308
> #2 0x000000374b806617 in start_thread () from /lib64/libpthread.so.0
> #3 0x000000374b0d3c2d in clone () from /lib64/libc.so.6
>
> Thread 20 (Thread 0x4482c940 (LWP 30205)):
> #0 0x000000374b80d73b in read () from /lib64/libpthread.so.0
> #1 0x00002b2a079ef166 in read_nbytes (bsock=0x180077a8, ptr=0x4482c014 "",
> nbytes=4) at bnet.c:80
> #2 0x00002b2a079f2bdf in BSOCK::recv (this=0x180077a8) at bsock.c:451
> #3 0x0000000000444ad7 in handle_UA_client_request (arg=<value optimized
> out>)
> at ua_server.c:139
> #4 0x00002b2a07a148de in workq_server (arg=<value optimized out>)
> at workq.c:346
> #5 0x000000374b806617 in start_thread () from /lib64/libpthread.so.0
> #6 0x000000374b0d3c2d in clone () from /lib64/libc.so.6
>
> Thread 19 (Thread 0x48432940 (LWP 22437)):
> #0 0x000000374b80d73b in read () from /lib64/libpthread.so.0
> #1 0x00002b2a079ef166 in read_nbytes (bsock=0x17fd8088, ptr=0x48432014 "",
> nbytes=4) at bnet.c:80
> #2 0x00002b2a079f2bdf in BSOCK::recv (this=0x17fd8088) at bsock.c:451
> #3 0x0000000000444ad7 in handle_UA_client_request (arg=<value optimized
> out>)
> at ua_server.c:139
> #4 0x00002b2a07a148de in workq_server (arg=<value optimized out>)
> at workq.c:346
> #5 0x000000374b806617 in start_thread () from /lib64/libpthread.so.0
> #6 0x000000374b0d3c2d in clone () from /lib64/libc.so.6
>
> Thread 18 (Thread 0x47030940 (LWP 5301)):
> #0 0x000000374b80dfe1 in nanosleep () from /lib64/libpthread.so.0
> #1 0x00002b2a079edb4b in bmicrosleep (sec=2, usec=0) at bsys.c:61
> #2 0x0000000000422f69 in jobq_server (arg=<value optimized out>) at
> jobq.c:595
> #3 0x000000374b806617 in start_thread () from /lib64/libpthread.so.0
> #4 0x000000374b0d3c2d in clone () from /lib64/libc.so.6
>
> Thread 17 (Thread 0x48e33940 (LWP 18773)):
> #0 0x000000374b80d73b in read () from /lib64/libpthread.so.0
> #1 0x00002b2a079ef166 in read_nbytes (bsock=0x180cf188, ptr=0x48e32c84 "",
> nbytes=4) at bnet.c:80
> #2 0x00002b2a079f2bdf in BSOCK::recv (this=0x180cf188) at bsock.c:451
> #3 0x000000000041c251 in bget_dirmsg (bs=0x180cf188) at getmsg.c:137
> #4 0x000000000040faa4 in wait_for_job_termination (jcr=0x17fdb858,
> timeout=0)
> at backup.c:508
> #5 0x000000000041145f in do_backup (jcr=0x17fdb858) at backup.c:456
> #6 0x000000000042143e in job_thread (arg=<value optimized out>) at
> job.c:314
> #7 0x0000000000422774 in jobq_server (arg=<value optimized out>) at
> jobq.c:450
> #8 0x000000374b806617 in start_thread () from /lib64/libpthread.so.0
> #9 0x000000374b0d3c2d in clone () from /lib64/libc.so.6
>
> Thread 16 (Thread 0x49834940 (LWP 18774)):
> #0 0x000000374b80d73b in read () from /lib64/libpthread.so.0
> #1 0x00002b2a079ef166 in read_nbytes (bsock=0x180cf2b8, ptr=0x49833c84 "",
> nbytes=4) at bnet.c:80
> #2 0x00002b2a079f2bdf in BSOCK::recv (this=0x180cf2b8) at bsock.c:451
> #3 0x000000000041c251 in bget_dirmsg (bs=0x180cf2b8) at getmsg.c:137
> #4 0x000000000040faa4 in wait_for_job_termination (jcr=0x1817a788,
> timeout=0)
> at backup.c:508
> #5 0x000000000041145f in do_backup (jcr=0x1817a788) at backup.c:456
> #6 0x000000000042143e in job_thread (arg=<value optimized out>) at
> job.c:314
> #7 0x0000000000422774 in jobq_server (arg=<value optimized out>) at
> jobq.c:450
> #8 0x000000374b806617 in start_thread () from /lib64/libpthread.so.0
> #9 0x000000374b0d3c2d in clone () from /lib64/libc.so.6
>
> Thread 15 (Thread 0x4a235940 (LWP 18775)):
> #0 0x000000374b80d73b in read () from /lib64/libpthread.so.0
> #1 0x00002b2a079ef166 in read_nbytes (bsock=0x18163808,
> ptr=0x4a234c84 "\233", nbytes=4) at bnet.c:80
> #2 0x00002b2a079f2bdf in BSOCK::recv (this=0x18163808) at bsock.c:451
> #3 0x000000000041c251 in bget_dirmsg (bs=0x18163808) at getmsg.c:137
> #4 0x000000000040faa4 in wait_for_job_termination (jcr=0x18025e18,
> timeout=0)
> at backup.c:508
> #5 0x000000000041145f in do_backup (jcr=0x18025e18) at backup.c:456
> #6 0x000000000042143e in job_thread (arg=<value optimized out>) at
> job.c:314
> #7 0x0000000000422774 in jobq_server (arg=<value optimized out>) at
> jobq.c:450
> #8 0x000000374b806617 in start_thread () from /lib64/libpthread.so.0
> #9 0x000000374b0d3c2d in clone () from /lib64/libc.so.6
>
> Thread 14 (Thread 0x4ac36940 (LWP 18776)):
> #0 0x000000374b80d73b in read () from /lib64/libpthread.so.0
> #1 0x00002b2a079ef166 in read_nbytes (bsock=0x2aaab4018398,
> ptr=0x4ac35c84 "", nbytes=4) at bnet.c:80
> #2 0x00002b2a079f2bdf in BSOCK::recv (this=0x2aaab4018398) at bsock.c:451
> #3 0x000000000041c251 in bget_dirmsg (bs=0x2aaab4018398) at getmsg.c:137
> #4 0x000000000040faa4 in wait_for_job_termination (jcr=0x181725c8,
> timeout=0)
> at backup.c:508
> #5 0x000000000041145f in do_backup (jcr=0x181725c8) at backup.c:456
> #6 0x000000000042143e in job_thread (arg=<value optimized out>) at
> job.c:314
> #7 0x0000000000422774 in jobq_server (arg=<value optimized out>) at
> jobq.c:450
> #8 0x000000374b806617 in start_thread () from /lib64/libpthread.so.0
> #9 0x000000374b0d3c2d in clone () from /lib64/libc.so.6
>
> Thread 13 (Thread 0x4b637940 (LWP 18777)):
> #0 0x000000374b80d73b in read () from /lib64/libpthread.so.0
> #1 0x00002b2a079ef166 in read_nbytes (bsock=0x2aaab4008808,
> ptr=0x4b636c84 "", nbytes=4) at bnet.c:80
> #2 0x00002b2a079f2bdf in BSOCK::recv (this=0x2aaab4008808) at bsock.c:451
> #3 0x000000000041c251 in bget_dirmsg (bs=0x2aaab4008808) at getmsg.c:137
> #4 0x000000000040faa4 in wait_for_job_termination (jcr=0x17ff30f8,
> timeout=0)
> at backup.c:508
> #5 0x000000000041145f in do_backup (jcr=0x17ff30f8) at backup.c:456
> #6 0x000000000042143e in job_thread (arg=<value optimized out>) at
> job.c:314
> #7 0x0000000000422774 in jobq_server (arg=<value optimized out>) at
> jobq.c:450
> #8 0x000000374b806617 in start_thread () from /lib64/libpthread.so.0
> #9 0x000000374b0d3c2d in clone () from /lib64/libc.so.6
>
> Thread 12 (Thread 0x4c038940 (LWP 18778)):
> #0 0x000000374b80e4af in waitpid () from /lib64/libpthread.so.0
> #1 0x00002b2a07a0b5a6 in signal_handler (sig=11) at signal.c:229
> #2 <signal handler called>
> #3 0x000000374b80baf0 in pthread_kill () from /lib64/libpthread.so.0
> #4 0x000000000041fda0 in cancel_storage_daemon_job (jcr=0x17fd8b28)
> at job.c:515
> #5 0x000000000040fc30 in wait_for_job_termination (jcr=0x17fd8b28,
> timeout=180) at backup.c:538
> #6 0x000000000041124c in do_backup (jcr=0x17fd8b28) at backup.c:476
> #7 0x000000000042143e in job_thread (arg=<value optimized out>) at
> job.c:314
> #8 0x0000000000422774 in jobq_server (arg=<value optimized out>) at
> jobq.c:450
> #9 0x000000374b806617 in start_thread () from /lib64/libpthread.so.0
> #10 0x000000374b0d3c2d in clone () from /lib64/libc.so.6
>
> Thread 11 (Thread 0x4ca39940 (LWP 18779)):
> #0 0x000000374b80d73b in read () from /lib64/libpthread.so.0
> #1 0x00002b2a079ef166 in read_nbytes (bsock=0x180c7b48, ptr=0x4ca38c84 "",
> nbytes=4) at bnet.c:80
> #2 0x00002b2a079f2bdf in BSOCK::recv (this=0x180c7b48) at bsock.c:451
> #3 0x000000000041c251 in bget_dirmsg (bs=0x180c7b48) at getmsg.c:137
> #4 0x000000000040faa4 in wait_for_job_termination (jcr=0x1800c828,
> timeout=0)
> at backup.c:508
> #5 0x000000000041145f in do_backup (jcr=0x1800c828) at backup.c:456
> #6 0x000000000042143e in job_thread (arg=<value optimized out>) at
> job.c:314
> #7 0x0000000000422774 in jobq_server (arg=<value optimized out>) at
> jobq.c:450
> #8 0x000000374b806617 in start_thread () from /lib64/libpthread.so.0
> #9 0x000000374b0d3c2d in clone () from /lib64/libc.so.6
>
> Thread 10 (Thread 0x4d43a940 (LWP 18780)):
> #0 0x000000374b80dfe1 in nanosleep () from /lib64/libpthread.so.0
> #1 0x00002b2a079edb4b in bmicrosleep (sec=2, usec=0) at bsys.c:61
> #2 0x0000000000422f69 in jobq_server (arg=<value optimized out>) at
> jobq.c:595
> #3 0x000000374b806617 in start_thread () from /lib64/libpthread.so.0
> #4 0x000000374b0d3c2d in clone () from /lib64/libc.so.6
>
> Thread 9 (Thread 0x4de3b940 (LWP 18781)):
> #0 0x000000374b80d73b in read () from /lib64/libpthread.so.0
> #1 0x00002b2a079ef166 in read_nbytes (bsock=0x1806ab68, ptr=0x4de3ac84
> "¿",
> nbytes=4) at bnet.c:80
> #2 0x00002b2a079f2bdf in BSOCK::recv (this=0x1806ab68) at bsock.c:451
> #3 0x000000000041c251 in bget_dirmsg (bs=0x1806ab68) at getmsg.c:137
> #4 0x000000000040faa4 in wait_for_job_termination (jcr=0x18177f68,
> timeout=0)
> at backup.c:508
> #5 0x000000000041145f in do_backup (jcr=0x18177f68) at backup.c:456
> #6 0x000000000042143e in job_thread (arg=<value optimized out>) at
> job.c:314
> #7 0x0000000000422774 in jobq_server (arg=<value optimized out>) at
> jobq.c:450
> #8 0x000000374b806617 in start_thread () from /lib64/libpthread.so.0
> #9 0x000000374b0d3c2d in clone () from /lib64/libc.so.6
>
> Thread 8 (Thread 0x47a31940 (LWP 24923)):
> #0 0x000000374b80d73b in read () from /lib64/libpthread.so.0
> #1 0x00002b2a079ef166 in read_nbytes (bsock=0x18017e38, ptr=0x47a30d84
> "k",
> nbytes=4) at bnet.c:80
> #2 0x00002b2a079f2bdf in BSOCK::recv (this=0x18017e38) at bsock.c:451
> #3 0x000000000041c251 in bget_dirmsg (bs=0x18017e38) at getmsg.c:137
> #4 0x000000000042766c in msg_thread (arg=0x18025e18) at msgchan.c:388
> #5 0x000000374b806617 in start_thread () from /lib64/libpthread.so.0
> #6 0x000000374b0d3c2d in clone () from /lib64/libc.so.6
>
> Thread 7 (Thread 0x4342a940 (LWP 25054)):
> #0 0x000000374b80d73b in read () from /lib64/libpthread.so.0
> #1 0x00002b2a079ef166 in read_nbytes (bsock=0x180f8978, ptr=0x43429d84
> "p",
> nbytes=4) at bnet.c:80
> #2 0x00002b2a079f2bdf in BSOCK::recv (this=0x180f8978) at bsock.c:451
> #3 0x000000000041c251 in bget_dirmsg (bs=0x180f8978) at getmsg.c:137
> #4 0x000000000042766c in msg_thread (arg=0x17fdb858) at msgchan.c:388
> #5 0x000000374b806617 in start_thread () from /lib64/libpthread.so.0
> #6 0x000000374b0d3c2d in clone () from /lib64/libc.so.6
>
> Thread 6 (Thread 0x42a29940 (LWP 25055)):
> #0 0x000000374b80d73b in read () from /lib64/libpthread.so.0
> #1 0x00002b2a079ef166 in read_nbytes (bsock=0x180295c8, ptr=0x42a28d84
> "m",
> nbytes=4) at bnet.c:80
> #2 0x00002b2a079f2bdf in BSOCK::recv (this=0x180295c8) at bsock.c:451
> #3 0x000000000041c251 in bget_dirmsg (bs=0x180295c8) at getmsg.c:137
> #4 0x000000000042766c in msg_thread (arg=0x1817a788) at msgchan.c:388
> #5 0x000000374b806617 in start_thread () from /lib64/libpthread.so.0
> #6 0x000000374b0d3c2d in clone () from /lib64/libc.so.6
>
> Thread 5 (Thread 0x43e2b940 (LWP 25056)):
> #0 0x000000374b80d73b in read () from /lib64/libpthread.so.0
> #1 0x00002b2a079ef166 in read_nbytes (bsock=0x180a48f8, ptr=0x43e2ad84
> "t",
> nbytes=4) at bnet.c:80
> #2 0x00002b2a079f2bdf in BSOCK::recv (this=0x180a48f8) at bsock.c:451
> #3 0x000000000041c251 in bget_dirmsg (bs=0x180a48f8) at getmsg.c:137
> #4 0x000000000042766c in msg_thread (arg=0x17ff30f8) at msgchan.c:388
> #5 0x000000374b806617 in start_thread () from /lib64/libpthread.so.0
> #6 0x000000374b0d3c2d in clone () from /lib64/libc.so.6
>
> Thread 4 (Thread 0x45c2e940 (LWP 25058)):
> #0 0x000000374b80d73b in read () from /lib64/libpthread.so.0
> #1 0x00002b2a079ef166 in read_nbytes (bsock=0x1804bba8, ptr=0x45c2dd84
> "k",
> nbytes=4) at bnet.c:80
> #2 0x00002b2a079f2bdf in BSOCK::recv (this=0x1804bba8) at bsock.c:451
> #3 0x000000000041c251 in bget_dirmsg (bs=0x1804bba8) at getmsg.c:137
> #4 0x000000000042766c in msg_thread (arg=0x18177f68) at msgchan.c:388
> #5 0x000000374b806617 in start_thread () from /lib64/libpthread.so.0
> #6 0x000000374b0d3c2d in clone () from /lib64/libc.so.6
>
> Thread 3 (Thread 0x4662f940 (LWP 25059)):
> #0 0x000000374b80d73b in read () from /lib64/libpthread.so.0
> #1 0x00002b2a079ef166 in read_nbytes (bsock=0x181636d8, ptr=0x4662ed84
> "f",
> nbytes=4) at bnet.c:80
> #2 0x00002b2a079f2bdf in BSOCK::recv (this=0x181636d8) at bsock.c:451
> #3 0x000000000041c251 in bget_dirmsg (bs=0x181636d8) at getmsg.c:137
> #4 0x000000000042766c in msg_thread (arg=0x1800c828) at msgchan.c:388
> #5 0x000000374b806617 in start_thread () from /lib64/libpthread.so.0
> #6 0x000000374b0d3c2d in clone () from /lib64/libc.so.6
>
> Thread 2 (Thread 0x4e83c940 (LWP 25060)):
> #0 0x000000374b80d73b in read () from /lib64/libpthread.so.0
> #1 0x00002b2a079ef166 in read_nbytes (bsock=0x18136928, ptr=0x4e83bd84
> "i",
> nbytes=4) at bnet.c:80
> #2 0x00002b2a079f2bdf in BSOCK::recv (this=0x18136928) at bsock.c:451
> #3 0x000000000041c251 in bget_dirmsg (bs=0x18136928) at getmsg.c:137
> #4 0x000000000042766c in msg_thread (arg=0x181725c8) at msgchan.c:388
> #5 0x000000374b806617 in start_thread () from /lib64/libpthread.so.0
> #6 0x000000374b0d3c2d in clone () from /lib64/libc.so.6
>
> Thread 1 (Thread 0x2b2a07e49280 (LWP 17166)):
> #0 0x000000374b80dfe1 in nanosleep () from /lib64/libpthread.so.0
> #1 0x00002b2a079edb4b in bmicrosleep (sec=60, usec=0) at bsys.c:61
> #2 0x000000000042cca3 in wait_for_next_job (
> one_shot_job_to_run=<value optimized out>) at scheduler.c:178
> #3 0x000000000040d64c in main (argc=0, argv=0x7fff37f5c3b8) at dird.c:338
> #0 0x000000374b80dfe1 in nanosleep () from /lib64/libpthread.so.0
> No symbol table info available.
> #1 0x00002b2a079edb4b in bmicrosleep (sec=60, usec=0) at bsys.c:61
> 61 stat = nanosleep(&timeout, NULL);
> Current language: auto; currently c++
> timeout = {tv_sec = 60, tv_nsec = 0}
> tv = {tv_sec = 4559264, tv_usec = 16}
> tz = {tz_minuteswest = 6, tz_dsttime = 0}
> stat = <value optimized out>
> #2 0x000000000042cca3 in wait_for_next_job (
> one_shot_job_to_run=<value optimized out>) at scheduler.c:178
> 178 bmicrosleep((next_check_secs < twait)?next_check_secs:twait, 0);
> twait = -516
> jcr = (JCR *) 0x0
> job = (JOB *) 0x0
> run = (RUN *) 0x10
> now = <value optimized out>
> next_job = (job_item *) 0x18173058
> first = false
> #3 0x000000000040d64c in main (argc=0, argv=0x7fff37f5c3b8) at dird.c:338
> 338 while ( (jcr = wait_for_next_job(runjob)) ) {
> ch = <value optimized out>
> jcr = (JCR *) 0x17ffedb8
> no_signals = false
> test_config = false
> uid = 0x0
> gid = 0x0
> mode = <value optimized out>
> #0 0x0000000000000000 in ?? ()
> No symbol table info available.
> #0 0x0000000000000000 in ?? ()
> No symbol table info available.
> #0 0x0000000000000000 in ?? ()
> No symbol table info available.
> #0 0x0000000000000000 in ?? ()
> No symbol table info available.
>
>
> thanks for any help!
> Stephen
>
> ---------------------------------------------------------------------------
> Further info:
>
> My catalog...
>
> mysql-5.0.77 (64bit) MyISAM
> 210 GB in size
> 1,412,297,215 records in File table
> note: database built with bacula 2x scripts,
> upgraded with 3x scripts, then again with 5x scripts
> (i.e. nothing customized along the way)
>
> My OS & hardware for bacula DIR+SD server...
>
> CentOS 5.4 (fully patched)
> 8 GB RAM
> 2 GB swap
> 1 TB ext3 filesystem on external fiber RAID5 array
> (dedicated to database, incl. temp files)
> 2 dual-core [AMD Opteron(tm) Processor 2220] CPUs
> StorageTek SL500 Library with 2 LTO3 Drives
_______________________________________________
Bacula-devel mailing list
[email protected]
https://lists.sourceforge.net/lists/listinfo/bacula-devel