(Sorry for the previous unfinished post; I accidentally hit the send key :-/)

Hello,

my colleague reported this problem on the -users list a few days ago; I'd like to report it here on the -devel list with a few additions.

A few days ago, the bacula-director daemon started getting stuck during backups, reporting the following error:

25-Aug 00:23 databox-dir JobId 1422: Start Backup JobId 1422, Job=Dev1Daily.2009-08-25_00.05.01_04
25-Aug 00:23 databox-dir JobId 1422: Using Device "FileStorage"
25-Aug 00:23 databox-dir: ABORTING due to ERROR in smartall.c:196 double free from smartall.c:330
Backtrace from the parent bacula-dir process:

#0  0x00007f96b84d6e74 in __lll_lock_wait () from /lib64/libpthread.so.0
#1  0x00007f96b84d2874 in _L_lock_106 () from /lib64/libpthread.so.0
#2  0x00007f96b84d22e0 in pthread_mutex_lock () from /lib64/libpthread.so.0
#3  0x00007f96b92caf32 in lmgr_p () from /usr/lib64/libbac.so.1
#4  0x00007f96b92af792 in sm_get_pool_memory () from /usr/lib64/libbac.so.1
#5  0x00007f96b92a4f83 in new_jcr () from /usr/lib64/libbac.so.1
#6  0x000000000043786d in wait_for_next_job ()
#7  0x000000000040e9d6 in main ()

Backtrace from the child bacula-dir process:

#0  0x00007f96b84d6e74 in __lll_lock_wait () from /lib64/libpthread.so.0
#1  0x00007f96b84d2874 in _L_lock_106 () from /lib64/libpthread.so.0
#2  0x00007f96b84d22e0 in pthread_mutex_lock () from /lib64/libpthread.so.0
#3  0x00007f96b92caf32 in lmgr_p () from /usr/lib64/libbac.so.1
#4  0x00007f96b92af792 in sm_get_pool_memory () from /usr/lib64/libbac.so.1
#5  0x000000000041303c in berrno::berrno ()
#6  0x00007f96b92b9a68 in signal_handler () from /usr/lib64/libbac.so.1
#7  <signal handler called>
#8  0x00007f96b92acbe7 in e_msg () from /usr/lib64/libbac.so.1
#9  0x00007f96b92ba79e in sm_free () from /usr/lib64/libbac.so.1
#10 0x00007f96b92baa7c in sm_realloc () from /usr/lib64/libbac.so.1
#11 0x00007f96b92ae39d in sm_realloc_pool_memory () from /usr/lib64/libbac.so.1
#12 0x00007f96b92ae563 in sm_check_pool_memory_size () from /usr/lib64/libbac.so.1
#13 0x00007f96b92aec7e in pm_strcat () from /usr/lib64/libbac.so.1
#14 0x00007f96b9a20c4c in db_get_int_handler () from /usr/lib64/libbacsql.so.1
#15 0x00007f96b9a28f2d in db_sql_query () from /usr/lib64/libbacsql.so.1
#16 0x00007f96b9a20f02 in db_accurate_get_jobids () from /usr/lib64/libbacsql.so.1
#17 0x0000000000412635 in send_accurate_current_files ()
#18 0x0000000000412cc7 in do_backup ()
#19 0x000000000042835c in job_thread ()
#20 0x0000000000429eac in jobq_server ()
#21 0x00007f96b84d0367 in start_thread () from /lib64/libpthread.so.0
#22 0x00007f96b710e09d in clone () from /lib64/libc.so.6

I noticed that a few days before the problems started, glibc was updated to fix the following issue:
http://rhn.redhat.com/errata/RHBA-2009-1202.html

Since that update is related to threads, and the backtraces suggest some kind of thread deadlock, is it possible that it is causing the director failures?

I'll try rebuilding Bacula against this glibc and, if that doesn't help, downgrading glibc as well; then I'll report the results. In the meantime, if somebody could have a look at this, I'd greatly appreciate it.

With best regards
nik

-- 
-------------------------------------
Nikola CIPRICH
LinuxBox.cz, s.r.o.
28. rijna 168, 709 01 Ostrava

tel.:   +420 596 603 142
fax:    +420 596 621 273
mobile: +420 777 093 799
www.linuxbox.cz

service mobile: +420 737 238 656
service email:  ser...@linuxbox.cz
-------------------------------------

_______________________________________________
Bacula-devel mailing list
Bacula-devel@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/bacula-devel