Hi,

We're running Bacula 1.38.11 on Debian 3.1 (Sarge) on i686 with MySQL.
It has been running stably for some time now. Last night, we encountered
two bacula-dir segfaults, apparently during volume pruning:

<snip>
03-Feb 00:15 adm01-dir: Pruning oldest volume "full-volume-1"
03-Feb 00:15 adm01-dir: Pruning oldest volume "full-volume-1"
03-Feb 00:15 adm01-dir: Pruning oldest volume "full-volume-1"
03-Feb 00:15 adm01-dir: Fatal Error because: Bacula interrupted by signal 11: Segmentation violation
<snip>

<snip>
03-Feb 01:45 adm01-dir: Pruning oldest volume "full-volume-1"
03-Feb 01:45 adm01-dir: Pruning oldest volume "full-volume-1"
03-Feb 01:45 adm01-dir: Pruning oldest volume "full-volume-1"
03-Feb 01:45 adm01-dir: Fatal Error because: Bacula interrupted by signal 11: Segmentation violation
<snip>

We received backtraces on both occasions; they are practically identical
(see attachments).

Now I'm no programmer, but this part looks suspicious to me:

#10 0x0807e908 in status_cmd (ua=0x81114b8,
    cmd=0x31 <Address 0x31 out of bounds>) at ua_status.c:138
#11 0x080682ac in do_a_command (ua=0x81114b8,
    cmd=0x31 <Address 0x31 out of bounds>) at ua_cmds.c:162

Furthermore, after the second crash we restarted bacula-dir and started the
remaining jobs manually; they ran without problems. bacula-dir is still
running now, several hours later.

Has anybody seen this before, or does anyone have advice on what to do next?
I'd be happy to provide more info, if needed.


Leander

[Backtrace 1: crash at 03-Feb 00:15]
Using host libthread_db library "/lib/tls/libthread_db.so.1".
`system-supplied DSO at 0xffffe000' has disappeared; keeping its symbols.
[Thread debugging using libthread_db enabled]
[New Thread -1212448096 (LWP 10572)]
[New Thread -1274020944 (LWP 25568)]
[New Thread -1221911632 (LWP 10574)]
[New Thread -1213523024 (LWP 10573)]
0xb7e1edfc in __nanosleep_nocancel () from /lib/tls/libpthread.so.0
$1 = "adm01-dir", '\0' <repeats 20 times>
$2 = 0x80de898 "bacula-dir"
$3 = 0x80de8c0 "/usr/local/bacula/sbin/bacula-dir"
$4 = "MySQL"
$5 = 0x80cc870 "1.38.11 (28 June 2006)"
$6 = 0x80c315c "i686-pc-linux-gnu"
$7 = 0x80c3155 "debian"
$8 = 0x80c316e "3.1"
#0  0xb7e1edfc in __nanosleep_nocancel () from /lib/tls/libpthread.so.0
#1  0x08093874 in bmicrosleep (sec=60, usec=0) at bsys.c:54
#2  0x08067836 in wait_for_next_job (one_shot_job_to_run=0x0)
    at scheduler.c:117
#3  0x0804b3c8 in main (argc=224405328, argv=0x80b4d34) at dird.c:249

Thread 4 (Thread -1213523024 (LWP 10573)):
#0  0xb7ccca27 in select () from /lib/tls/libc.so.6
#1  0x080981b4 in bnet_thread_server (addrs=0x0, max_clients=-514, 
    client_wq=0x80dd280, handle_client_request=0xfffffdfe) at bnet_server.c:148
#2  0x0807e129 in connect_thread (arg=0xfffffdfe) at ua_server.c:73
#3  0xb7e19b63 in start_thread () from /lib/tls/libpthread.so.0
#4  0xb7cd318a in clone () from /lib/tls/libc.so.6

Thread 3 (Thread -1221911632 (LWP 10574)):
#0  0xb7e1c440 in pthread_cond_timedwait@@GLIBC_2.3.2 ()
   from /lib/tls/libpthread.so.0
#1  0x080b139d in watchdog_thread (arg=0x0) at watchdog.c:292
#2  0xb7e19b63 in start_thread () from /lib/tls/libpthread.so.0
#3  0xb7cd318a in clone () from /lib/tls/libc.so.6

Thread 2 (Thread -1274020944 (LWP 25568)):
#0  0xb7e1f561 in __waitpid_nocancel () from /lib/tls/libpthread.so.0
#1  0x080a97f8 in signal_handler (sig=11) at signal.c:159
#2  <signal handler called>
#3  0xb7c5c572 in fputs () from /lib/tls/libc.so.6
#4  0x080a2698 in dispatch_message (jcr=0xd602750, type=6, mtime=1170458105, 
    msg=0xb40fe100 "adm01-dir: Pruning oldest volume \"full-volume-1\"\n")
    at message.c:696
#5  0x080a34c7 in Jmsg (jcr=0xd602750, type=6, mtime=49, 
    fmt=0x80ba5fb "Pruning oldest volume \"%s\"\n") at message.c:1064
#6  0x080638f6 in find_next_volume_for_append (jcr=0xd602750, mr=0xb40ff520, 
    index=1, create=false) at next_vol.c:134
#7  0x0807f623 in prt_runtime (ua=0xd601220, sp=0xd600fb0) at ua_status.c:385
#8  0x0807f8d2 in list_scheduled_jobs (ua=0xd601220) at ua_status.c:497
#9  0x0807efae in do_director_status (ua=0xd601220) at ua_status.c:266
#10 0x0807e908 in status_cmd (ua=0xd601220, 
    cmd=0x31 <Address 0x31 out of bounds>) at ua_status.c:138
#11 0x080682ac in do_a_command (ua=0xd601220, 
    cmd=0x31 <Address 0x31 out of bounds>) at ua_cmds.c:162
#12 0x0807e303 in handle_UA_client_request (arg=0xd601238) at ua_server.c:134
#13 0x080b1cb9 in workq_server (arg=0x80dd280) at workq.c:347
#14 0xb7e19b63 in start_thread () from /lib/tls/libpthread.so.0
#15 0xb7cd318a in clone () from /lib/tls/libc.so.6

Thread 1 (Thread -1212448096 (LWP 10572)):
#0  0xb7e1edfc in __nanosleep_nocancel () from /lib/tls/libpthread.so.0
#1  0x08093874 in bmicrosleep (sec=60, usec=0) at bsys.c:54
#2  0x08067836 in wait_for_next_job (one_shot_job_to_run=0x0)
    at scheduler.c:117
#3  0x0804b3c8 in main (argc=224405328, argv=0x80b4d34) at dird.c:249
#0  0xb7e1edfc in __nanosleep_nocancel () from /lib/tls/libpthread.so.0
#0  0xb7e1edfc in __nanosleep_nocancel () from /lib/tls/libpthread.so.0
No symbol table info available.
#1  0x08093874 in bmicrosleep (sec=60, usec=0) at bsys.c:54
54         stat = nanosleep(&timeout, NULL);
Current language:  auto; currently c++
timeout = {tv_sec = 60, tv_nsec = 0}
tv = {tv_sec = 0, tv_usec = 3}
tz = {tz_minuteswest = 1, tz_dsttime = 107}
stat = 0
#2  0x08067836 in wait_for_next_job (one_shot_job_to_run=0x0)
    at scheduler.c:117
117           bmicrosleep(next_check_secs, 0); /* recheck once per minute */
jcr = (JCR *) 0xd602750
job = (JOB *) 0x0
run = (RUN *) 0x80f5ad0
now = 0
first = false
next_job = (job_item *) 0x0
#3  0x0804b3c8 in main (argc=224405328, argv=0x80b4d34) at dird.c:249
249              break;                       /* yes, terminate */
ch = -516
jcr = (JCR *) 0xd602750
no_signals = 134527216
test_config = 0
uid = 0x0
gid = 0x0
#0  0x00000000 in ?? ()
No symbol table info available.
#0  0x00000000 in ?? ()
No symbol table info available.
#0  0x00000000 in ?? ()
No symbol table info available.
#0  0x00000000 in ?? ()
No symbol table info available.

[Backtrace 2: crash at 03-Feb 01:45]
Using host libthread_db library "/lib/tls/libthread_db.so.1".
`system-supplied DSO at 0xffffe000' has disappeared; keeping its symbols.
[Thread debugging using libthread_db enabled]
[New Thread -1212595552 (LWP 31202)]
[New Thread -1230447696 (LWP 24082)]
[New Thread -1222059088 (LWP 31204)]
[New Thread -1213670480 (LWP 31203)]
0xb7dfadfc in __nanosleep_nocancel () from /lib/tls/libpthread.so.0
$1 = "adm01-dir", '\0' <repeats 20 times>
$2 = 0x80de898 "bacula-dir"
$3 = 0x80de8c0 "/usr/local/bacula/sbin/bacula-dir"
$4 = "MySQL"
$5 = 0x80cc870 "1.38.11 (28 June 2006)"
$6 = 0x80c315c "i686-pc-linux-gnu"
$7 = 0x80c3155 "debian"
$8 = 0x80c316e "3.1"
#0  0xb7dfadfc in __nanosleep_nocancel () from /lib/tls/libpthread.so.0
#1  0x08093874 in bmicrosleep (sec=60, usec=0) at bsys.c:54
#2  0x08067836 in wait_for_next_job (one_shot_job_to_run=0x0)
    at scheduler.c:117
#3  0x0804b3c8 in main (argc=134958418, argv=0x80b4d34) at dird.c:249

Thread 4 (Thread -1213670480 (LWP 31203)):
#0  0xb7ca8a27 in select () from /lib/tls/libc.so.6
#1  0x080981b4 in bnet_thread_server (addrs=0x0, max_clients=-514, 
    client_wq=0x80dd280, handle_client_request=0xfffffdfe) at bnet_server.c:148
#2  0x0807e129 in connect_thread (arg=0xfffffdfe) at ua_server.c:73
#3  0xb7df5b63 in start_thread () from /lib/tls/libpthread.so.0
#4  0xb7caf18a in clone () from /lib/tls/libc.so.6

Thread 3 (Thread -1222059088 (LWP 31204)):
#0  0xb7df8440 in pthread_cond_timedwait@@GLIBC_2.3.2 ()
   from /lib/tls/libpthread.so.0
#1  0x080b139d in watchdog_thread (arg=0x0) at watchdog.c:292
#2  0xb7df5b63 in start_thread () from /lib/tls/libpthread.so.0
#3  0xb7caf18a in clone () from /lib/tls/libc.so.6

Thread 2 (Thread -1230447696 (LWP 24082)):
#0  0xb7dfb561 in __waitpid_nocancel () from /lib/tls/libpthread.so.0
#1  0x080a97f8 in signal_handler (sig=11) at signal.c:159
#2  <signal handler called>
#3  0xb7c38572 in fputs () from /lib/tls/libc.so.6
#4  0x080a2698 in dispatch_message (jcr=0x82671a0, type=6, mtime=1170463504, 
    msg=0xb6a8c100 "adm01-dir: Pruning oldest volume \"full-volume-1\"\n")
    at message.c:696
#5  0x080a34c7 in Jmsg (jcr=0x82671a0, type=6, mtime=49, 
    fmt=0x80ba5fb "Pruning oldest volume \"%s\"\n") at message.c:1064
#6  0x080638f6 in find_next_volume_for_append (jcr=0x82671a0, mr=0xb6a8d520, 
    index=1, create=false) at next_vol.c:134
#7  0x0807f623 in prt_runtime (ua=0x81114b8, sp=0x81f64b0) at ua_status.c:385
#8  0x0807f8d2 in list_scheduled_jobs (ua=0x81114b8) at ua_status.c:497
#9  0x0807efae in do_director_status (ua=0x81114b8) at ua_status.c:266
#10 0x0807e908 in status_cmd (ua=0x81114b8, 
    cmd=0x31 <Address 0x31 out of bounds>) at ua_status.c:138
#11 0x080682ac in do_a_command (ua=0x81114b8, 
    cmd=0x31 <Address 0x31 out of bounds>) at ua_cmds.c:162
#12 0x0807e303 in handle_UA_client_request (arg=0x81114d0) at ua_server.c:134
#13 0x080b1cb9 in workq_server (arg=0x80dd280) at workq.c:347
#14 0xb7df5b63 in start_thread () from /lib/tls/libpthread.so.0
#15 0xb7caf18a in clone () from /lib/tls/libc.so.6

Thread 1 (Thread -1212595552 (LWP 31202)):
#0  0xb7dfadfc in __nanosleep_nocancel () from /lib/tls/libpthread.so.0
#1  0x08093874 in bmicrosleep (sec=60, usec=0) at bsys.c:54
#2  0x08067836 in wait_for_next_job (one_shot_job_to_run=0x0)
    at scheduler.c:117
#3  0x0804b3c8 in main (argc=134958418, argv=0x80b4d34) at dird.c:249
#0  0xb7dfadfc in __nanosleep_nocancel () from /lib/tls/libpthread.so.0
#0  0xb7dfadfc in __nanosleep_nocancel () from /lib/tls/libpthread.so.0
No symbol table info available.
#1  0x08093874 in bmicrosleep (sec=60, usec=0) at bsys.c:54
54         stat = nanosleep(&timeout, NULL);
Current language:  auto; currently c++
timeout = {tv_sec = 60, tv_nsec = 0}
tv = {tv_sec = 1, tv_usec = 3}
tz = {tz_minuteswest = 1, tz_dsttime = 107}
stat = 0
#2  0x08067836 in wait_for_next_job (one_shot_job_to_run=0x0)
    at scheduler.c:117
117           bmicrosleep(next_check_secs, 0); /* recheck once per minute */
jcr = (JCR *) 0xbf935618
job = (JOB *) 0x0
run = (RUN *) 0xb4d52
now = 0
first = false
next_job = (job_item *) 0x0
#3  0x0804b3c8 in main (argc=134958418, argv=0x80b4d34) at dird.c:249
249              break;                       /* yes, terminate */
ch = -516
jcr = (JCR *) 0x80b4d52
no_signals = 134527216
test_config = 0
uid = 0x0
gid = 0x0
#0  0x00000000 in ?? ()
No symbol table info available.
#0  0x00000000 in ?? ()
No symbol table info available.
#0  0x00000000 in ?? ()
No symbol table info available.
#0  0x00000000 in ?? ()
No symbol table info available.
_______________________________________________
Bacula-users mailing list
Bacula-users@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/bacula-users
