(Sorry for the previous unfinished post; I hit the send key by mistake :-/)

Hello,
my colleague reported this problem on the -users list a few days ago; I'd like to report it here on the -devel list with a few additions.
A few days ago, the bacula-director daemon started getting stuck during backups, reporting the following error:
25-Aug 00:23 databox-dir JobId 1422: Start Backup JobId 1422, Job=Dev1Daily.2009-08-25_00.05.01_04
25-Aug 00:23 databox-dir JobId 1422: Using Device "FileStorage"
25-Aug 00:23 databox-dir: ABORTING due to ERROR in smartall.c:196
double free from smartall.c:330
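
The abort message means Bacula's smartall.c checker caught a block being freed twice. As a rough illustration of how a debugging allocator can catch this (this is not Bacula's actual smartall.c code), the sketch below puts a tagged header in front of each block; chk_malloc(), chk_free() and the tag values are hypothetical. Freed blocks are quarantined rather than handed back to malloc(), so the second check reads memory that is still allocated.

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical header tags for this sketch only (not smartall.c values). */
#define TAG_LIVE  0x4C495645u  /* block is in use */
#define TAG_FREED 0x46524545u  /* block was already released */

typedef struct {
    uint32_t tag;  /* TAG_LIVE or TAG_FREED */
    size_t size;   /* size the caller asked for */
} chk_header;

/* Allocate size bytes preceded by a tagged header. */
void *chk_malloc(size_t size)
{
    chk_header *h = malloc(sizeof *h + size);
    if (h == NULL)
        return NULL;
    h->tag = TAG_LIVE;
    h->size = size;
    return h + 1;  /* caller's memory starts right after the header */
}

/* Release a block: 0 on success, -1 if it was already freed.
 * Blocks are quarantined (never passed to free()) so the header stays
 * readable and a repeated call can be detected without undefined behaviour. */
int chk_free(void *p)
{
    assert(p != NULL);
    chk_header *h = (chk_header *)p - 1;
    if (h->tag == TAG_FREED) {
        fprintf(stderr, "ERROR: double free of %zu-byte block\n", h->size);
        return -1;
    }
    h->tag = TAG_FREED;
    return 0;
}
```

A second chk_free() on the same pointer prints the error and returns -1; the real allocator aborts instead, which appears to be what leads into signal_handler() in the child backtrace.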

Backtrace from parent bacula-dir process:

#0  0x00007f96b84d6e74 in __lll_lock_wait () from /lib64/libpthread.so.0
#1  0x00007f96b84d2874 in _L_lock_106 () from /lib64/libpthread.so.0
#2  0x00007f96b84d22e0 in pthread_mutex_lock () from /lib64/libpthread.so.0
#3  0x00007f96b92caf32 in lmgr_p () from /usr/lib64/libbac.so.1
#4  0x00007f96b92af792 in sm_get_pool_memory () from /usr/lib64/libbac.so.1
#5  0x00007f96b92a4f83 in new_jcr () from /usr/lib64/libbac.so.1
#6  0x000000000043786d in wait_for_next_job ()
#7  0x000000000040e9d6 in main ()

Backtrace from child bacula-dir process:

#0  0x00007f96b84d6e74 in __lll_lock_wait () from /lib64/libpthread.so.0
#1  0x00007f96b84d2874 in _L_lock_106 () from /lib64/libpthread.so.0
#2  0x00007f96b84d22e0 in pthread_mutex_lock () from /lib64/libpthread.so.0
#3  0x00007f96b92caf32 in lmgr_p () from /usr/lib64/libbac.so.1
#4  0x00007f96b92af792 in sm_get_pool_memory () from /usr/lib64/libbac.so.1
#5  0x000000000041303c in berrno::berrno ()
#6  0x00007f96b92b9a68 in signal_handler () from /usr/lib64/libbac.so.1
#7  <signal handler called>
#8  0x00007f96b92acbe7 in e_msg () from /usr/lib64/libbac.so.1
#9  0x00007f96b92ba79e in sm_free () from /usr/lib64/libbac.so.1
#10 0x00007f96b92baa7c in sm_realloc () from /usr/lib64/libbac.so.1
#11 0x00007f96b92ae39d in sm_realloc_pool_memory () from /usr/lib64/libbac.so.1
#12 0x00007f96b92ae563 in sm_check_pool_memory_size () from /usr/lib64/libbac.so.1
#13 0x00007f96b92aec7e in pm_strcat () from /usr/lib64/libbac.so.1
#14 0x00007f96b9a20c4c in db_get_int_handler () from /usr/lib64/libbacsql.so.1
#15 0x00007f96b9a28f2d in db_sql_query () from /usr/lib64/libbacsql.so.1
#16 0x00007f96b9a20f02 in db_accurate_get_jobids () from /usr/lib64/libbacsql.so.1
#17 0x0000000000412635 in send_accurate_current_files ()
#18 0x0000000000412cc7 in do_backup ()
#19 0x000000000042835c in job_thread ()
#20 0x0000000000429eac in jobq_server ()
#21 0x00007f96b84d0367 in start_thread () from /lib64/libpthread.so.0
#22 0x00007f96b710e09d in clone () from /lib64/libc.so.6
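
Read bottom-up, the child trace shows sm_realloc_pool_memory() growing a buffer, sm_free() detecting the double free, e_msg() reporting it, and the signal handler then calling sm_get_pool_memory() (via berrno::berrno()), which waits in lmgr_p()/pthread_mutex_lock(). If sm_realloc_pool_memory() already holds that same mutex when the abort happens (an assumption; the traces alone don't prove it), the thread is waiting for itself, and the parent trace, stuck on the lock in new_jcr(), waits behind it. The minimal sketch below reproduces that pattern with a POSIX error-checking mutex, which reports EDEADLK instead of hanging; pool_lock, init_pool_lock(), realloc_then_abort() and alloc_from_handler() are hypothetical names, not Bacula code, and a plain function call stands in for the signal handler.

```c
#define _XOPEN_SOURCE 700
#include <assert.h>
#include <errno.h>
#include <pthread.h>

/* Hypothetical stand-in for the pool-memory mutex taken by lmgr_p(). */
static pthread_mutex_t pool_lock;

/* Make pool_lock error-checking: relocking by the owner returns EDEADLK
 * instead of blocking forever. Returns 0 on success. */
int init_pool_lock(void)
{
    pthread_mutexattr_t attr;
    int rc = pthread_mutexattr_init(&attr);
    if (rc != 0)
        return rc;
    rc = pthread_mutexattr_settype(&attr, PTHREAD_MUTEX_ERRORCHECK);
    if (rc == 0)
        rc = pthread_mutex_init(&pool_lock, &attr);
    pthread_mutexattr_destroy(&attr);
    return rc;
}

/* Stand-in for sm_get_pool_memory() called from the signal handler:
 * it needs the same lock. */
int alloc_from_handler(void)
{
    int rc = pthread_mutex_lock(&pool_lock);
    if (rc == 0)
        pthread_mutex_unlock(&pool_lock);
    return rc;
}

/* Stand-in for sm_realloc_pool_memory(): takes the lock, then re-enters
 * the allocator on the same thread (as the abort path does) before unlocking. */
int realloc_then_abort(void)
{
    int rc = pthread_mutex_lock(&pool_lock);
    if (rc != 0)
        return rc;
    rc = alloc_from_handler();   /* same thread, lock still held */
    pthread_mutex_unlock(&pool_lock);
    return rc;                   /* EDEADLK here; a default glibc mutex would hang */
}
```

After init_pool_lock(), calling realloc_then_abort() returns EDEADLK; on glibc a default (non-error-checking) mutex would simply block, much like the traces above. pthread_mutex_lock() is also not async-signal-safe, so taking any lock inside a signal handler is risky in general.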

I noticed that a few days before the problems started, glibc was updated to fix the following issue:
http://rhn.redhat.com/errata/RHBA-2009-1202.html
Since that update is thread-related, and both backtraces end up waiting in pthread_mutex_lock() (via sm_get_pool_memory() and lmgr_p()), which looks like some kind of thread deadlock, is it possible that this update is causing the director failures?
I'll try rebuilding Bacula against this glibc and, if that doesn't help, also downgrading glibc, and then I'll report the results. In the meantime, if somebody could have a look at this, I'd greatly appreciate it.

With best regards,

nik



-- 
-------------------------------------
Nikola CIPRICH
LinuxBox.cz, s.r.o.
28. rijna 168, 709 01 Ostrava

tel.:   +420 596 603 142
fax:    +420 596 621 273
mobil:  +420 777 093 799
www.linuxbox.cz

mobil servis: +420 737 238 656
email servis: ser...@linuxbox.cz
-------------------------------------

_______________________________________________
Bacula-devel mailing list
Bacula-devel@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/bacula-devel
