(gdb) run  -s -f -c ../etc/bacula-dir.conf
Starting program: /usr/local/bacula/sbin/bacula-dir -s -f 
-c ../etc/bacula-dir.conf
[Please pardon the top post]

OK, I compiled 1.36.3, and ran the director under gdb.  After some normal 
execution, including some volume purging, I tried to start a bunch of jobs 
like so:

for ii in BackupCatalog fmpserver distance locus community elive admin1 \
  communications1 communications2 communications3 grades1 libro otter records1 \
  registrar1 ruth sheri1 textbook1 textbook3 textbook4 curt fiscalpro bob5 \
  idbigblue odonnell webmaster webdev1
do echo -e "run $ii\nmod\n1\n2\nyes\nq\n"| ./bconsole
done

It ran OK for the first few jobs, and then the director died with the message 
shown below.  BTW, I was able to kill it this way twice.
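
To narrow down which submission precedes the crash, the loop above could be wrapped so that it checks for the director after each job. This is only a minimal sketch: the BCONSOLE and DIRECTOR_NAME variables and both function names are assumptions made for illustration, and it relies on pgrep being installed.

```shell
#!/bin/sh
# Sketch: submit jobs one at a time and stop at the first one after which
# the director process can no longer be found.
# BCONSOLE and DIRECTOR_NAME are assumptions; adjust for the local install.
BCONSOLE=${BCONSOLE:-./bconsole}
DIRECTOR_NAME=${DIRECTOR_NAME:-bacula-dir}

director_alive() {
    # pgrep -x matches the exact process name
    pgrep -x "$DIRECTOR_NAME" >/dev/null 2>&1
}

submit_jobs() {
    count=0
    for job in "$@"; do
        # Same bconsole input as in the original loop
        printf 'run %s\nmod\n1\n2\nyes\nq\n' "$job" | "$BCONSOLE" >/dev/null 2>&1
        count=$((count + 1))
        if ! director_alive; then
            echo "director gone after job $count: $job"
            return 1
        fi
    done
    echo "submitted $count jobs, director still running"
}
```

It would be called with the same job names as the loop, for example: submit_jobs BackupCatalog fmpserver distance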

When bacula-dir crashed, it also left a few rows in the db with a client ID of 
0.

The following is from running bacula-dir inside gdb (the thread dump follows 
further below).

[Thread debugging using libthread_db enabled]
[New Thread 16384 (LWP 19065)]
[New Thread 32769 (LWP 19067)]
[New Thread 16386 (LWP 19068)]
[New Thread 32771 (LWP 19069)]
[New Thread 49156 (LWP 19072)]
[Thread 49156 (LWP 19072) exited]
[New Thread 65540 (LWP 19216)]
[New Thread 81925 (LWP 19219)]
[New Thread 98310 (LWP 19221)]
[Thread 65540 (LWP 19216) exited]
[New Thread 114692 (LWP 19234)]
[Thread 114692 (LWP 19234) exited]
[New Thread 131076 (LWP 19295)]
herodotus-dir: dird.c:438 Director's configuration file reread.
[Thread 131076 (LWP 19295) exited]
[New Thread 147460 (LWP 19300)]
[New Thread 163847 (LWP 19302)]
[New Thread 180232 (LWP 19305)]
[Thread 147460 (LWP 19300) exited]
[New Thread 196612 (LWP 19308)]
[New Thread 213001 (LWP 19312)]
[Thread 180232 (LWP 19305) exited]
[New Thread 229384 (LWP 19314)]
[New Thread 245770 (LWP 19319)]
Detaching after fork from child process 19321.
[New Thread 262155 (LWP 19323)]
[Thread 213001 (LWP 19312) exited]
[New Thread 278537 (LWP 19325)]
[New Thread 294924 (LWP 19330)]
[New Thread 311309 (LWP 19333)]
[Thread 245770 (LWP 19319) exited]
Cannot find thread 245770: invalid thread handle
(gdb) 

Here is the output from "thread apply all bt"

(gdb) thread apply all bt

Thread 30 (Thread 458763 (LWP 19883)):
#0  0xb7e87db6 in nanosleep () from /lib/i686/libpthread.so.0
#1  0x00000000 in ?? ()
#2  0x0807de3c in bmicrosleep (sec=0, usec=-1251095380) at bsys.c:59
#3  0x08057125 in create_unique_job_name (jcr=0x80f5d20, base_name=0x80a59b5 "*Console*") at job.c:658
#4  0x08070ae0 in new_control_jcr (base_name=0x80a59b5 "*Console*", job_type=-516) at ua_server.c:101
#5  0x08070c97 in handle_UA_client_request (arg=0x80d7518) at ua_server.c:122
#6  0x08098426 in workq_server (arg=0x80bdda0) at workq.c:347
#7  0xb7e81421 in pthread_start_thread () from /lib/i686/libpthread.so.0
#8  0xb7e81591 in pthread_start_thread_event () from /lib/i686/libpthread.so.0
#9  0xb7d1e36a in clone () from /lib/i686/libc.so.6

Thread 29 (Thread 442376 (LWP 19878)):
#0  0xb7e87db6 in nanosleep () from /lib/i686/libpthread.so.0
#1  0x00000000 in ?? ()
#2  0xb7e84188 in __pthread_timedsuspend_new () from /lib/i686/libpthread.so.0
#3  0xb7e803e9 in pthread_cond_timedwait_relative () from /lib/i686/libpthread.so.0
#4  0x080982e7 in workq_server (arg=0x80bdda0) at workq.c:322
#5  0xb7e81421 in pthread_start_thread () from /lib/i686/libpthread.so.0
#6  0xb7e81591 in pthread_start_thread_event () from /lib/i686/libpthread.so.0
#7  0xb7d1e36a in clone () from /lib/i686/libc.so.6

Thread 16 (Thread 229383 (LWP 19815)):
#0  0xb7e87db6 in nanosleep () from /lib/i686/libpthread.so.0
#1  0x00000000 in ?? ()
#2  0x0807de3c in bmicrosleep (sec=2, usec=-1236410932) at bsys.c:59
#3  0x0805955a in jobq_server (arg=0x80bdc20) at jobq.c:674
#4  0xb7e81421 in pthread_start_thread () from /lib/i686/libpthread.so.0
#5  0xb7e81591 in pthread_start_thread_event () from /lib/i686/libpthread.so.0
#6  0xb7d1e36a in clone () from /lib/i686/libc.so.6

Thread 14 (Thread 196618 (LWP 19807)):
#0  0xb7e87db6 in nanosleep () from /lib/i686/libpthread.so.0
#1  0x00000000 in ?? ()
#2  0x0807de3c in bmicrosleep (sec=2, usec=-1248997940) at bsys.c:59
#3  0x0805955a in jobq_server (arg=0x80bdc20) at jobq.c:674
#4  0xb7e81421 in pthread_start_thread () from /lib/i686/libpthread.so.0
#5  0xb7e81591 in pthread_start_thread_event () from /lib/i686/libpthread.so.0
#6  0xb7d1e36a in clone () from /lib/i686/libc.so.6
---Type <return> to continue, or q <return> to quit---

Thread 11 (Thread 147462 (LWP 19798)):
#0  0xb7e87db6 in nanosleep () from /lib/i686/libpthread.so.0
#1  0x00000000 in ?? ()
#2  0x0807de3c in bmicrosleep (sec=2, usec=-1234309684) at bsys.c:59
#3  0x0805955a in jobq_server (arg=0x80bdc20) at jobq.c:674
#4  0xb7e81421 in pthread_start_thread () from /lib/i686/libpthread.so.0
#5  0xb7e81591 in pthread_start_thread_event () from /lib/i686/libpthread.so.0
#6  0xb7d1e36a in clone () from /lib/i686/libc.so.6

Thread 9 (Thread 114692 (LWP 19778)):
#0  0xb7e8289b in __pthread_fork () from /lib/i686/libpthread.so.0
#1  0xb7ce81a8 in fork () from /lib/i686/libc.so.6
#2  0xb7e82954 in fork () from /lib/i686/libpthread.so.0
#3  0x080818b1 in open_bpipe (prog=0x80ccdf8 "/usr/local/bacula/scripts/delete_catalog_backup", wait=0, 
    mode=0x1 <Address 0x1 out of bounds>) at bpipe.c:90
#4  0x08056c99 in job_thread (arg=0x80de298) at job.c:262
#5  0x080592bb in jobq_server (arg=0x80bdc20) at jobq.c:444
#6  0xb7e81421 in pthread_start_thread () from /lib/i686/libpthread.so.0
#7  0xb7e81591 in pthread_start_thread_event () from /lib/i686/libpthread.so.0
#8  0xb7d1e36a in clone () from /lib/i686/libc.so.6

Thread 7 (Thread 81925 (LWP 19772)):
#0  0xb7e87db6 in nanosleep () from /lib/i686/libpthread.so.0
#1  0x00000000 in ?? ()
#2  0x0807de3c in bmicrosleep (sec=2, usec=-1232212532) at bsys.c:59
#3  0x0805955a in jobq_server (arg=0x80bdc20) at jobq.c:674
#4  0xb7e81421 in pthread_start_thread () from /lib/i686/libpthread.so.0
#5  0xb7e81591 in pthread_start_thread_event () from /lib/i686/libpthread.so.0
#6  0xb7d1e36a in clone () from /lib/i686/libc.so.6

Thread 4 (Thread 32771 (LWP 19615)):
#0  0xb7e87db6 in nanosleep () from /lib/i686/libpthread.so.0
#1  0x00000001 in ?? ()
#2  0xb7e84188 in __pthread_timedsuspend_new () from /lib/i686/libpthread.so.0
#3  0xb7e803e9 in pthread_cond_timedwait_relative () from /lib/i686/libpthread.so.0
#4  0x0809777a in watchdog_thread (arg=0x0) at watchdog.c:289
#5  0xb7e81421 in pthread_start_thread () from /lib/i686/libpthread.so.0
#6  0xb7e81591 in pthread_start_thread_event () from /lib/i686/libpthread.so.0
#7  0xb7d1e36a in clone () from /lib/i686/libc.so.6
---Type <return> to continue, or q <return> to quit---

Thread 3 (Thread 16386 (LWP 19614)):
#0  0xb7d174a1 in select () from /lib/i686/libc.so.6
#1  0x0000000b in ?? ()
#2  0x080d66bc in ?? ()
#3  0xb7adf234 in ?? ()
#4  0x00000000 in ?? ()
#5  0x08081167 in bnet_thread_server (addrs=0x0, max_clients=10, client_wq=0x80bdda0, 
    handle_client_request=0x8070c70 <handle_UA_client_request>) at bnet_server.c:154
#6  0x08070a58 in connect_thread (arg=0x80bff38) at ua_server.c:79
#7  0xb7e81421 in pthread_start_thread () from /lib/i686/libpthread.so.0
#8  0xb7e81591 in pthread_start_thread_event () from /lib/i686/libpthread.so.0
#9  0xb7d1e36a in clone () from /lib/i686/libc.so.6

Thread 2 (Thread 32769 (LWP 19613)):
#0  0xb7d1529a in poll () from /lib/i686/libc.so.6
#1  0xb7e80f00 in __pthread_manager () from /lib/i686/libpthread.so.0
#2  0xb7e811d5 in __pthread_manager_event () from /lib/i686/libpthread.so.0
#3  0xb7d1e36a in clone () from /lib/i686/libc.so.6

Thread 1 (Thread 16384 (LWP 19609)):
#0  0xb7e87db6 in nanosleep () from /lib/i686/libpthread.so.0
#1  0x00000000 in ?? ()
#2  0x0807de3c in bmicrosleep (sec=60, usec=-1073744136) at bsys.c:59
#3  0x0805eac0 in wait_for_next_job (one_shot_job_to_run=0x80cbac0 "") at scheduler.c:101
#4  0x0804c057 in main (argc=135003960, argv=0x80c0010) at dird.c:244
Segmentation fault
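
For anyone trying to reproduce this, the same dump could probably be captured non-interactively, so that gdb's pager does not interrupt it. The sketch below assumes a gdb recent enough to support -batch, -ex and --args; the GDB, DIR_BIN and DIR_CONF variables and the function name are assumptions, with the paths taken from this post.

```shell
#!/bin/sh
# Sketch: run the director under gdb and dump all thread backtraces once
# it stops (for example on SIGSEGV), without interactive pager prompts.
# GDB, DIR_BIN and DIR_CONF are assumptions; adjust for the local install.
GDB=${GDB:-gdb}
DIR_BIN=${DIR_BIN:-/usr/local/bacula/sbin/bacula-dir}
DIR_CONF=${DIR_CONF:-../etc/bacula-dir.conf}

run_under_gdb() {
    # -batch makes gdb exit after the -ex commands have run;
    # --args passes everything after the program name to the program.
    "$GDB" -batch \
        -ex 'set pagination off' \
        -ex run \
        -ex 'thread apply all bt' \
        --args "$DIR_BIN" -s -f -c "$DIR_CONF"
}
```

Redirecting the output to a file keeps the whole dump in one place, for example: run_under_gdb > bacula-dir-gdb.log 2>&1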


On Thursday 06 April 2006 17:10, Joshua Kugler wrote:
> [Disclaimer: I've searched the archives best I know how.  If you can point
> me to docs and/or messages I missed, that'd be great!]
>
> We've been using Bacula for over a year, and it has run great.  Recently,
> we got a nice disk-based 5.1TB array (Coraid AoE if you care) and are working
> on implementing it with Bacula.  All the configuration has gone great, and
> we're doing test runs.
>
> This is where we run into problems.
>
> If I fire off Full backups of all the clients, it will run OK for a while.
> Then at one point, I tried a command on bconsole, and it said
>
> 06-Apr 15:25 bconsole:  Error: bnet.c:403 Write error sending to Director
> daemon:herodotus.cde.uaf.edu:9101: ERR=Broken pipe
> [EMAIL PROTECTED] /usr/local/bacula/sbin]# ./bconsole
> Connecting to Director herodotus.cde.uaf.edu:9101
> 06-Apr 15:25 bconsole:  Fatal error: bnet.c:773 Unable to connect to
> Director daemon on herodotus.cde.uaf.edu:9101.
>
> A ps -Af shows *no* bacula-dir processes left running.  Top shows bacula-sd
> still grinding away, as well as some of the SSH tunnels.  I can still get
> to the network drive and do things like ls and du, so it's not lost
> communication.  Restarting bacula and doing status from bconsole shows no
> jobs running, but the database shows a bunch of jobs in JobStatus "R".
>
> The bacula log (/var/bacula/working/log) shows nothing out of the ordinary.
>
> This is on Linux, with kernel 2.6.11-12mdksmp, Bacula 1.36.1, 1GB of
> memory. There is no dump, stack trace, or e-mail about the crash.
>
> I know there are more recent versions.  I don't have time right now to
> upgrade all my clients.  Should I try 1.36.3 before I throw in the towel? 
> Any other ideas?  Am I hitting the race condition noted here:
> http://article.gmane.org/gmane.comp.sysutils.backup.bacula.general/16842
>
> j----- k-----

-- 
Joshua Kugler                 PGP Key: http://pgp.mit.edu/
CDE System Administrator             ID 0xDB26D7CE
http://distance.uaf.edu/


_______________________________________________
Bacula-users mailing list
Bacula-users@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/bacula-users