-----BEGIN PGP SIGNED MESSAGE-----
Hello everybody,

I am dealing with a segmentation fault on one of my bacula-fd clients. It is running 5.2.13 on SPARC Solaris 10 Generic_147440-01. According to the debug output, it is caused by a failed assertion:

abudhabi-rad1-sys-fd: lockmgr.c:360-0 ASSERT failed at bnet_server.c:209: current >= 0
Bacula interrupted by signal 11: Segmentation Fault

I am able to reproduce this by querying the client status from bconsole a SECOND time after restarting bacula-fd. The first query works fine, but the second one crashes. The same happens when I run backup jobs: the first one succeeds, and the second one crashes with the same assert failure. It also crashes if I query the status of the client while a backup is running, since that is the second connection after the restart.

Here is the backtrace from the time of the crash:

- ----- bacula.1203.traceback ------
[New process 1203]
Retry #1:
Retry #2:
Retry #3:
Retry #4:
[Thread debugging using libthread_db enabled]
[New LWP 3 ]
[New LWP 2 ]
[New Thread 1 ]
[New Thread 2 (LWP 2)]
[New Thread 3 ]
[Switching to Thread 1 ]
0xfebca710 in __lwp_park () from /lib/libc.so.1
$1 = '\000' <repeats 29 times>
$2 = 0x47530 "bacula-fd"
$3 = 0x0
$4 = 0x0
$5 = 0xff2985d0 "5.2.13 (19 February 2013)"
$6 = 0xff2985a8 "sparc-sun-solaris2.10"
$7 = 0xff2985a0 "solaris"
$8 = 0xff298598 "5.10"
$9 = "abudhabi-rad1-sys", '\000' <repeats 20 times>
$10 = 0xff2985c0 "solaris 5.10"
$11 = 0
Environment variable "TestName" not defined.
#0 0xfebca710 in __lwp_park () from /lib/libc.so.1
#1 0xfebc2a78 in mutex_lock_queue () from /lib/libc.so.1
#2 0xfebace94 in _flockget () from /lib/libc.so.1
#3 0xfebadbf8 in fclose () from /lib/libc.so.1
Backtrace stopped: previous frame inner to this frame (corrupt stack?)

Thread 6 (Thread 3 ):
#0 0xfebca710 in __lwp_park () from /lib/libc.so.1
#1 0xfebc459c in cond_sleep_queue () from /lib/libc.so.1
#2 0xfebc4760 in cond_wait_queue () from /lib/libc.so.1
#3 0xfebc4ba4 in cond_wait_common () from /lib/libc.so.1
#4 0xfebc4d38 in _cond_timedwait () from /lib/libc.so.1
#5 0xfebc4e2c in cond_timedwait () from /lib/libc.so.1
#6 0xfebc4e6c in pthread_cond_timedwait () from /lib/libc.so.1
#7 0xff2919ec in bthread_cond_timedwait_p (cond=0xff2ae3d0, m=0xff2ae3b8, abstime=0xfe8fbf18, file=0xff29ab68 "watchdog.c", line=321) at lockmgr.c:824
#8 0xff28af50 in watchdog_thread (arg=<optimized out>) at watchdog.c:321
#9 0xff2912d4 in lmgr_thread_launcher (x=0x493f0) at lockmgr.c:939
#10 0xfebca678 in _lwp_start () from /lib/libc.so.1
#11 0xfebca678 in _lwp_start () from /lib/libc.so.1
Backtrace stopped: previous frame identical to this frame (corrupt stack?)

Thread 5 (Thread 2 (LWP 2)):
#0 0xfebce658 in _waitid () from /lib/libc.so.1
#1 0xfeb6f81c in _waitpid () from /lib/libc.so.1
#2 0xfebbe000 in waitpid () from /lib/libc.so.1
#3 0xff281dc0 in signal_handler (sig=11) at signal.c:237
#4 <signal handler called>
#5 0xff2924cc in lmgr_thread_t::do_V (this=0x5dcd8, m=0xff2ae158, f=0xff293a20 "bnet_server.c", l=209) at lockmgr.c:360
#6 0xff2918b0 in bthread_mutex_unlock_p (m=0xff2ae158, file=0xff293a20 "bnet_server.c", line=209) at lockmgr.c:793
#7 0xff262bf4 in bnet_thread_server (addr_list=<optimized out>, max_clients=20, client_wq=0x47180, handle_client_request=0x242dc <handle_client_request(void*)>) at bnet_server.c:209
#8 0x0002e4ec in main (argc=<optimized out>, argv=<optimized out>) at filed.c:278

Thread 4 (Thread 1 ):
#0 0xfebca710 in __lwp_park () from /lib/libc.so.1
#1 0xfebc2a78 in mutex_lock_queue () from /lib/libc.so.1
#2 0xfebace94 in _flockget () from /lib/libc.so.1
#3 0xfebadbf8 in fclose () from /lib/libc.so.1
Backtrace stopped: previous frame inner to this frame (corrupt stack?)

Thread 3 (LWP 2 ):
#0 0xfebce658 in _waitid () from /lib/libc.so.1
#1 0xfeb6f81c in _waitpid () from /lib/libc.so.1
#2 0xfebbe000 in waitpid () from /lib/libc.so.1
#3 0xff281dc0 in signal_handler (sig=11) at signal.c:237
#4 <signal handler called>
#5 0xff2924cc in lmgr_thread_t::do_V (this=0x5dcd8, m=0xff2ae158, f=0xff293a20 "bnet_server.c", l=209) at lockmgr.c:360
#6 0xff2918b0 in bthread_mutex_unlock_p (m=0xff2ae158, file=0xff293a20 "bnet_server.c", line=209) at lockmgr.c:793
#7 0xff262bf4 in bnet_thread_server (addr_list=<optimized out>, max_clients=20, client_wq=0x47180, handle_client_request=0x242dc <handle_client_request(void*)>) at bnet_server.c:209
#8 0x0002e4ec in main (argc=<optimized out>, argv=<optimized out>) at filed.c:278

Thread 2 (LWP 3 ):
#0 0xfebca710 in __lwp_park () from /lib/libc.so.1
#1 0xfebc459c in cond_sleep_queue () from /lib/libc.so.1
#2 0xfebc4760 in cond_wait_queue () from /lib/libc.so.1
#3 0xfebc4ba4 in cond_wait_common () from /lib/libc.so.1
#4 0xfebc4d38 in _cond_timedwait () from /lib/libc.so.1
#5 0xfebc4e2c in cond_timedwait () from /lib/libc.so.1
#6 0xfebc4e6c in pthread_cond_timedwait () from /lib/libc.so.1
#7 0xff2919ec in bthread_cond_timedwait_p (cond=0xff2ae3d0, m=0xff2ae3b8, abstime=0xfe8fbf18, file=0xff29ab68 "watchdog.c", line=321) at lockmgr.c:824
#8 0xff28af50 in watchdog_thread (arg=<optimized out>) at watchdog.c:321
#9 0xff2912d4 in lmgr_thread_launcher (x=0x493f0) at lockmgr.c:939
#10 0xfebca678 in _lwp_start () from /lib/libc.so.1
#11 0xfebca678 in _lwp_start () from /lib/libc.so.1
Backtrace stopped: previous frame identical to this frame (corrupt stack?)

Thread 1 (LWP 1 ):
#0 0xfebca710 in __lwp_park () from /lib/libc.so.1
#1 0xfebc2a78 in mutex_lock_queue () from /lib/libc.so.1
#2 0xfebace94 in _flockget () from /lib/libc.so.1
#3 0xfebadbf8 in fclose () from /lib/libc.so.1
Backtrace stopped: previous frame inner to this frame (corrupt stack?)

#0 0xfebca710 in __lwp_park () from /lib/libc.so.1
No symbol table info available.
#1 0xfebc2a78 in mutex_lock_queue () from /lib/libc.so.1
No symbol table info available.
#2 0xfebace94 in _flockget () from /lib/libc.so.1
No symbol table info available.
#3 0xfebadbf8 in fclose () from /lib/libc.so.1
No symbol table info available.
#0 0x00000000 in ?? ()
No symbol table info available.
#0 0x00000000 in ?? ()
No symbol table info available.
#0 0x00000000 in ?? ()
No symbol table info available.
#0 0x00000000 in ?? ()
No symbol table info available.
- ----- SNIP -----

And here is the debug output from bacula-fd -c ../etc/bacula-fd.conf -v -f -d799

- ------ SNIP -----
# ./bacula-fd -c ../etc/bacula-fd.conf -v -f -d799
bacula-fd: lex.c:185-0 Open config file: ../etc/bacula-fd.conf
bacula-fd: filed_conf.c:452-0 Inserting director res: bacula-mon
bacula-fd: lex.c:185-0 Open config file: ../etc/bacula-fd.conf
abudhabi-rad1-sys-fd: message.c:504-0 Close_msg jcr=0
abudhabi-rad1-sys-fd: message.c:347-0 Copy message resource 4a040 to 48300
abudhabi-rad1-sys-fd: bsys.c:556-0 Could not open state file. sfd=-1 size=192: ERR=No such file or directory
abudhabi-rad1-sys-fd: fd_plugins.c:1100-0 plugin dir is NULL
abudhabi-rad1-sys-fd: filed.c:276-0 filed: listening on port 9102
abudhabi-rad1-sys-fd: bnet_server.c:112-0 Addresses host[ipv4:0.0.0.0:9102]
abudhabi-rad1-sys-fd: bnet.c:766-0 who=client host=128.122.128.60 port=9102
abudhabi-rad1-sys-fd: find.c:81-0 init_find_files ff=629d0
abudhabi-rad1-sys-fd: job.c:270-0 <dird: Hello Director bacula-dir calling
abudhabi-rad1-sys-fd: job.c:286-0 Executing Hello command.
abudhabi-rad1-sys-fd: job.c:436-0 Calling Authenticate
abudhabi-rad1-sys-fd: cram-md5.c:72-0 send: auth cram-md5 <146880531.1365609328@abudhabi-rad1-sys-fd> ssl=0
abudhabi-rad1-sys-fd: cram-md5.c:131-0 cram-get received: auth cram-md5 <1124129763.1365609328@bacula-dir> ssl=0
abudhabi-rad1-sys-fd: cram-md5.c:150-0 sending resp to challenge: vSJPvl/W69/Ag6Zs83+6+C
abudhabi-rad1-sys-fd: job.c:440-0 OK Authenticate
abudhabi-rad1-sys-fd: job.c:270-0 <dird: JobId=0 Job=-Console-.2013-04-10_11.40.56_40 SDid=0 SDtime=0 Authorization=dummy
abudhabi-rad1-sys-fd: job.c:286-0 Executing JobId= command.
abudhabi-rad1-sys-fd: job.c:1737-0 set sd auth key
abudhabi-rad1-sys-fd: job.c:544-0 JobId=0 Auth=dummy
abudhabi-rad1-sys-fd: fd_plugins.c:1197-0 plugin list is NULL
abudhabi-rad1-sys-fd: job.c:270-0 <dird: statusabudhabi-rad1-sys-fd: job.c:286-0 Executing status command.
abudhabi-rad1-sys-fd: runscript.c:108-0 runscript: running all RUNSCRIPT object (ClientAfterJob) JobStatus=C
abudhabi-rad1-sys-fd: job.c:399-0 Calling term_find_files
abudhabi-rad1-sys-fd: job.c:404-0 Done with term_find_files
abudhabi-rad1-sys-fd: runscript.c:286-0 runscript: freeing all RUNSCRIPTS object
abudhabi-rad1-sys-fd: message.c:504-0 Close_msg jcr=62580
abudhabi-rad1-sys-fd: message.c:504-0 Close_msg jcr=0
abudhabi-rad1-sys-fd: job.c:406-0 Done with free_jcr
abudhabi-rad1-sys-fd: mem_pool.c:375-0 garbage collect memory pool

That was the first client status query. This is the debug output of the second query:

abudhabi-rad1-sys-fd: lockmgr.c:360-0 ASSERT failed at bnet_server.c:209: current >= 0
Bacula interrupted by signal 11: Segmentation Fault
Kaboom! bacula-fd, abudhabi-rad1-sys-fd got signal 11 - Segmentation Fault. Attempting traceback.
Kaboom! exepath=/usr/local/bacula/sbin
abudhabi-rad1-sys-fd: signal.c:205-0 Working=/usr/local/bacula/var
abudhabi-rad1-sys-fd: signal.c:206-0 btpath=/usr/local/bacula/sbin/btraceback
abudhabi-rad1-sys-fd: signal.c:207-0 exepath=/usr/local/bacula/sbin/bacula-fd
abudhabi-rad1-sys-fd: signal.c:236-0 Doing waitpid
Calling: /usr/local/bacula/sbin/btraceback /usr/local/bacula/sbin/bacula-fd 1203 /usr/local/bacula/var
gcore: /usr/local/bacula/var/bacula-fd.1203 dumped
/usr/local/bacula/sbin/btraceback: /usr/local/bacula/sbin/bsmtp: not found
abudhabi-rad1-sys-fd: signalThe btraceback call returned 1
Dumping: /usr/local/bacula/var/abudhabi-rad1-sys-fd.1203.bactrace
cat: write error: Broken pipe
- ----- SNIP -----

I'll be happy to provide more information if needed.

Thanks!

- - Michael

- -- 
Michael Hocke
New York University
Sr UNIX Systems Administrator
Information Technology Services
C&CS COS
-----BEGIN PGP SIGNATURE-----
Version: PGP Desktop 10.0.3 (Build 1)
Charset: us-ascii

wsBVAwUBUWWw/5bfnpCg64TVAQGAWggAp+gq0qVwciCCYarrO/3fSshpl7svySeK
wtvxEcGx90c86Hb8KMb33F7XmB2uiwM/e2roMeHh7Q8qrD2RxmFVkUmrZvp5usq6
ttL2NC72nVWkqtg6axeOjcQkcFQc6m6bsObDJv11p3LIcD78aHXUYellhU8RNXSZ
Zjh/zE2iIJ5MRJk9gcoaOOmicfMIaGLXScQAw2EJsD3TF/QsxoiXUbc3pwu9b5eI
yZZ4C5P1Z1RdZROp/AU3i417znTPCXaObaulEnnt96uGqaKU79lNq/g5eb/58qpd
lFy8/goB1Fd94J4/KG0zfoWMSf9POmeaBosBkrza9UkAU5TZ00yPVA==
=PVlP
-----END PGP SIGNATURE-----

_______________________________________________
Bacula-devel mailing list
Bacula-devel@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/bacula-devel