-----BEGIN PGP SIGNED MESSAGE-----

Hello everybody,

I am dealing with a segmentation fault on one of my bacula-fd clients. It is
running 5.2.13 on SPARC Solaris 10 (Generic_147440-01). According to the debug
output, it is caused by:

abudhabi-rad1-sys-fd: lockmgr.c:360-0 ASSERT failed at bnet_server.c:209: current >= 0
Bacula interrupted by signal 11: Segmentation Fault

I can reproduce this by querying the client status from bconsole a SECOND time
after restarting bacula-fd: the first query works fine, but the second one
crashes. The same happens with backup jobs: the first one succeeds, and the
second one crashes with the same assertion failure. It also crashes if I query
the client status while a backup is running. In every case, it is the second
connection after the restart that triggers the crash.

Here is the backtrace from the crash:

- ----- bacula.1203.traceback ------
[New process 1203]
Retry #1:
Retry #2:
Retry #3:
Retry #4:
[Thread debugging using libthread_db enabled]
[New LWP    3        ]
[New LWP    2        ]
[New Thread 1        ]
[New Thread 2 (LWP 2)]
[New Thread 3        ]
[Switching to Thread 1        ]
0xfebca710 in __lwp_park () from /lib/libc.so.1
$1 = '\000' <repeats 29 times>
$2 = 0x47530 "bacula-fd"
$3 = 0x0
$4 = 0x0
$5 = 0xff2985d0 "5.2.13 (19 February 2013)"
$6 = 0xff2985a8 "sparc-sun-solaris2.10"
$7 = 0xff2985a0 "solaris"
$8 = 0xff298598 "5.10"
$9 = "abudhabi-rad1-sys", '\000' <repeats 20 times>
$10 = 0xff2985c0 "solaris 5.10"
$11 = 0
Environment variable "TestName" not defined.
#0  0xfebca710 in __lwp_park () from /lib/libc.so.1
#1  0xfebc2a78 in mutex_lock_queue () from /lib/libc.so.1
#2  0xfebace94 in _flockget () from /lib/libc.so.1
#3  0xfebadbf8 in fclose () from /lib/libc.so.1
Backtrace stopped: previous frame inner to this frame (corrupt stack?)

Thread 6 (Thread 3        ):
#0  0xfebca710 in __lwp_park () from /lib/libc.so.1
#1  0xfebc459c in cond_sleep_queue () from /lib/libc.so.1
#2  0xfebc4760 in cond_wait_queue () from /lib/libc.so.1
#3  0xfebc4ba4 in cond_wait_common () from /lib/libc.so.1
#4  0xfebc4d38 in _cond_timedwait () from /lib/libc.so.1
#5  0xfebc4e2c in cond_timedwait () from /lib/libc.so.1
#6  0xfebc4e6c in pthread_cond_timedwait () from /lib/libc.so.1
#7  0xff2919ec in bthread_cond_timedwait_p (cond=0xff2ae3d0, m=0xff2ae3b8, abstime=0xfe8fbf18, file=0xff29ab68 "watchdog.c", line=321) at lockmgr.c:824
#8  0xff28af50 in watchdog_thread (arg=<optimized out>) at watchdog.c:321
#9  0xff2912d4 in lmgr_thread_launcher (x=0x493f0) at lockmgr.c:939
#10 0xfebca678 in _lwp_start () from /lib/libc.so.1
#11 0xfebca678 in _lwp_start () from /lib/libc.so.1
Backtrace stopped: previous frame identical to this frame (corrupt stack?)

Thread 5 (Thread 2 (LWP 2)):
#0  0xfebce658 in _waitid () from /lib/libc.so.1
#1  0xfeb6f81c in _waitpid () from /lib/libc.so.1
#2  0xfebbe000 in waitpid () from /lib/libc.so.1
#3  0xff281dc0 in signal_handler (sig=11) at signal.c:237
#4  <signal handler called>
#5  0xff2924cc in lmgr_thread_t::do_V (this=0x5dcd8, m=0xff2ae158, f=0xff293a20 "bnet_server.c", l=209) at lockmgr.c:360
#6  0xff2918b0 in bthread_mutex_unlock_p (m=0xff2ae158, file=0xff293a20 "bnet_server.c", line=209) at lockmgr.c:793
#7  0xff262bf4 in bnet_thread_server (addr_list=<optimized out>, max_clients=20, client_wq=0x47180, handle_client_request=0x242dc <handle_client_request(void*)>) at bnet_server.c:209
#8  0x0002e4ec in main (argc=<optimized out>, argv=<optimized out>) at filed.c:278

Thread 4 (Thread 1        ):
#0  0xfebca710 in __lwp_park () from /lib/libc.so.1
#1  0xfebc2a78 in mutex_lock_queue () from /lib/libc.so.1
#2  0xfebace94 in _flockget () from /lib/libc.so.1
#3  0xfebadbf8 in fclose () from /lib/libc.so.1
Backtrace stopped: previous frame inner to this frame (corrupt stack?)

Thread 3 (LWP    2        ):
#0  0xfebce658 in _waitid () from /lib/libc.so.1
#1  0xfeb6f81c in _waitpid () from /lib/libc.so.1
#2  0xfebbe000 in waitpid () from /lib/libc.so.1
#3  0xff281dc0 in signal_handler (sig=11) at signal.c:237
#4  <signal handler called>
#5  0xff2924cc in lmgr_thread_t::do_V (this=0x5dcd8, m=0xff2ae158, f=0xff293a20 "bnet_server.c", l=209) at lockmgr.c:360
#6  0xff2918b0 in bthread_mutex_unlock_p (m=0xff2ae158, file=0xff293a20 "bnet_server.c", line=209) at lockmgr.c:793
#7  0xff262bf4 in bnet_thread_server (addr_list=<optimized out>, max_clients=20, client_wq=0x47180, handle_client_request=0x242dc <handle_client_request(void*)>) at bnet_server.c:209
#8  0x0002e4ec in main (argc=<optimized out>, argv=<optimized out>) at filed.c:278

Thread 2 (LWP    3        ):
#0  0xfebca710 in __lwp_park () from /lib/libc.so.1
#1  0xfebc459c in cond_sleep_queue () from /lib/libc.so.1
#2  0xfebc4760 in cond_wait_queue () from /lib/libc.so.1
#3  0xfebc4ba4 in cond_wait_common () from /lib/libc.so.1
#4  0xfebc4d38 in _cond_timedwait () from /lib/libc.so.1
#5  0xfebc4e2c in cond_timedwait () from /lib/libc.so.1
#6  0xfebc4e6c in pthread_cond_timedwait () from /lib/libc.so.1
#7  0xff2919ec in bthread_cond_timedwait_p (cond=0xff2ae3d0, m=0xff2ae3b8, abstime=0xfe8fbf18, file=0xff29ab68 "watchdog.c", line=321) at lockmgr.c:824
#8  0xff28af50 in watchdog_thread (arg=<optimized out>) at watchdog.c:321
#9  0xff2912d4 in lmgr_thread_launcher (x=0x493f0) at lockmgr.c:939
#10 0xfebca678 in _lwp_start () from /lib/libc.so.1
#11 0xfebca678 in _lwp_start () from /lib/libc.so.1
Backtrace stopped: previous frame identical to this frame (corrupt stack?)

Thread 1 (LWP    1        ):
#0  0xfebca710 in __lwp_park () from /lib/libc.so.1
#1  0xfebc2a78 in mutex_lock_queue () from /lib/libc.so.1
#2  0xfebace94 in _flockget () from /lib/libc.so.1
#3  0xfebadbf8 in fclose () from /lib/libc.so.1
Backtrace stopped: previous frame inner to this frame (corrupt stack?)
#0  0xfebca710 in __lwp_park () from /lib/libc.so.1
No symbol table info available.
#1  0xfebc2a78 in mutex_lock_queue () from /lib/libc.so.1
No symbol table info available.
#2  0xfebace94 in _flockget () from /lib/libc.so.1
No symbol table info available.
#3  0xfebadbf8 in fclose () from /lib/libc.so.1
No symbol table info available.
#0  0x00000000 in ?? ()
No symbol table info available.
#0  0x00000000 in ?? ()
No symbol table info available.
#0  0x00000000 in ?? ()
No symbol table info available.
#0  0x00000000 in ?? ()
No symbol table info available.
- ----- SNIP -----

And here is the debug output from running bacula-fd -c ../etc/bacula-fd.conf -v -f -d799:

- ------ SNIP -----

# ./bacula-fd -c ../etc/bacula-fd.conf -v -f -d799
bacula-fd: lex.c:185-0 Open config file: ../etc/bacula-fd.conf
bacula-fd: filed_conf.c:452-0 Inserting director res: bacula-mon
bacula-fd: lex.c:185-0 Open config file: ../etc/bacula-fd.conf
abudhabi-rad1-sys-fd: message.c:504-0 Close_msg jcr=0
abudhabi-rad1-sys-fd: message.c:347-0 Copy message resource 4a040 to 48300
abudhabi-rad1-sys-fd: bsys.c:556-0 Could not open state file. sfd=-1 size=192: ERR=No such file or directory
abudhabi-rad1-sys-fd: fd_plugins.c:1100-0 plugin dir is NULL
abudhabi-rad1-sys-fd: filed.c:276-0 filed: listening on port 9102
abudhabi-rad1-sys-fd: bnet_server.c:112-0 Addresses host[ipv4:0.0.0.0:9102] 
abudhabi-rad1-sys-fd: bnet.c:766-0 who=client host=128.122.128.60 port=9102
abudhabi-rad1-sys-fd: find.c:81-0 init_find_files ff=629d0
abudhabi-rad1-sys-fd: job.c:270-0 <dird: Hello Director bacula-dir calling
abudhabi-rad1-sys-fd: job.c:286-0 Executing Hello command.
abudhabi-rad1-sys-fd: job.c:436-0 Calling Authenticate
abudhabi-rad1-sys-fd: cram-md5.c:72-0 send: auth cram-md5 <146880531.1365609328@abudhabi-rad1-sys-fd> ssl=0
abudhabi-rad1-sys-fd: cram-md5.c:131-0 cram-get received: auth cram-md5 <1124129763.1365609328@bacula-dir> ssl=0
abudhabi-rad1-sys-fd: cram-md5.c:150-0 sending resp to challenge: vSJPvl/W69/Ag6Zs83+6+C
abudhabi-rad1-sys-fd: job.c:440-0 OK Authenticate
abudhabi-rad1-sys-fd: job.c:270-0 <dird: JobId=0 Job=-Console-.2013-04-10_11.40.56_40 SDid=0 SDtime=0 Authorization=dummy
abudhabi-rad1-sys-fd: job.c:286-0 Executing JobId= command.
abudhabi-rad1-sys-fd: job.c:1737-0 set sd auth key
abudhabi-rad1-sys-fd: job.c:544-0 JobId=0 Auth=dummy
abudhabi-rad1-sys-fd: fd_plugins.c:1197-0 plugin list is NULL
abudhabi-rad1-sys-fd: job.c:270-0 <dird: status
abudhabi-rad1-sys-fd: job.c:286-0 Executing status command.
abudhabi-rad1-sys-fd: runscript.c:108-0 runscript: running all RUNSCRIPT object (ClientAfterJob) JobStatus=C
abudhabi-rad1-sys-fd: job.c:399-0 Calling term_find_files
abudhabi-rad1-sys-fd: job.c:404-0 Done with term_find_files
abudhabi-rad1-sys-fd: runscript.c:286-0 runscript: freeing all RUNSCRIPTS object
abudhabi-rad1-sys-fd: message.c:504-0 Close_msg jcr=62580
abudhabi-rad1-sys-fd: message.c:504-0 Close_msg jcr=0
abudhabi-rad1-sys-fd: job.c:406-0 Done with free_jcr
abudhabi-rad1-sys-fd: mem_pool.c:375-0 garbage collect memory pool

That was the first client status query. This is the debug output of the second 
query:

abudhabi-rad1-sys-fd: lockmgr.c:360-0 ASSERT failed at bnet_server.c:209: current >= 0
Bacula interrupted by signal 11: Segmentation Fault
Kaboom! bacula-fd, abudhabi-rad1-sys-fd got signal 11 - Segmentation Fault. Attempting traceback.
Kaboom! exepath=/usr/local/bacula/sbin
abudhabi-rad1-sys-fd: signal.c:205-0 Working=/usr/local/bacula/var
abudhabi-rad1-sys-fd: signal.c:206-0 btpath=/usr/local/bacula/sbin/btraceback
abudhabi-rad1-sys-fd: signal.c:207-0 exepath=/usr/local/bacula/sbin/bacula-fd
abudhabi-rad1-sys-fd: signal.c:236-0 Doing waitpid
Calling: /usr/local/bacula/sbin/btraceback /usr/local/bacula/sbin/bacula-fd 1203 /usr/local/bacula/var
gcore: /usr/local/bacula/var/bacula-fd.1203 dumped
/usr/local/bacula/sbin/btraceback: /usr/local/bacula/sbin/bsmtp: not found
abudhabi-rad1-sys-fd: signalThe btraceback call returned 1
Dumping: /usr/local/bacula/var/abudhabi-rad1-sys-fd.1203.bactrace
cat: write error: Broken pipe
- ----- SNIP -----

I'll be happy to provide more information if needed.

Thanks!

- - Michael
- -- 
Michael Hocke                                       New York University
Sr UNIX Systems Administrator           Information Technology Services
                                                               C&CS COS


-----BEGIN PGP SIGNATURE-----
Version: PGP Desktop 10.0.3 (Build 1)
Charset: us-ascii

wsBVAwUBUWWw/5bfnpCg64TVAQGAWggAp+gq0qVwciCCYarrO/3fSshpl7svySeK
wtvxEcGx90c86Hb8KMb33F7XmB2uiwM/e2roMeHh7Q8qrD2RxmFVkUmrZvp5usq6
ttL2NC72nVWkqtg6axeOjcQkcFQc6m6bsObDJv11p3LIcD78aHXUYellhU8RNXSZ
Zjh/zE2iIJ5MRJk9gcoaOOmicfMIaGLXScQAw2EJsD3TF/QsxoiXUbc3pwu9b5eI
yZZ4C5P1Z1RdZROp/AU3i417znTPCXaObaulEnnt96uGqaKU79lNq/g5eb/58qpd
lFy8/goB1Fd94J4/KG0zfoWMSf9POmeaBosBkrza9UkAU5TZ00yPVA==
=PVlP
-----END PGP SIGNATURE-----

_______________________________________________
Bacula-devel mailing list
Bacula-devel@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/bacula-devel
