Hello,

I have a blocking issue with the bacula-sd daemon. Environment:
- Debian Lenny AMD64
- Kernel: 2.6.32-bpo.4-amd64
- Bacula versions: 3.0.3 and 5.0.3
- We use TLS for authentication and transfers.

Every few days, bacula-sd dies with a segfault. I've set up the debugging
tools, so I finally have a backtrace. When I read it, though, I can't see
anything that clearly shows what caused the segfault.

Can anyone read it better than I can?

It happens both when only a couple of low-I/O jobs are running and when
several high-I/O jobs are running. By I/O, I mean both disk and network:
we back up to disk only. I could describe our setup at length, but most of
it would be noise, so let me know what is actually relevant.
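I am not sure it matters, but one detail stands out in the backtrace below:
Thread 2 (the one that received signal 11, via cancel_cmd -> Jmsg ->
BSOCK::fsend) and Thread 3 (handle_connection_request -> BSOCK::signal) are
both inside ssl3_write_bytes -> deflate on the same socket, bsock=0x107b568,
even though Thread 2 is serving a different Director connection
(arg=0x134a028). OpenSSL does not allow one SSL connection to be used by two
threads at the same time, and a zlib deflate stream is not thread-safe
either. If concurrent writes are the cause, the usual remedy is to serialize
writes on the connection. The C code below is only a generic sketch of that
pattern: shared_conn, conn_send, worker and run_demo are hypothetical names,
the buffer merely stands in for the TLS/compression stream, and this is not
Bacula's BSOCK code.

```c
#include <assert.h>
#include <pthread.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for a connection whose write path keeps internal
 * state (as an SSL object with compression does). NOT Bacula's BSOCK. */
#define CONN_BUF_SIZE 65536

struct shared_conn {
    pthread_mutex_t lock;    /* serializes every write on this connection */
    char buf[CONN_BUF_SIZE]; /* stands in for the TLS/zlib stream state   */
    size_t len;              /* shared write state, unsafe without lock   */
};

static void conn_init(struct shared_conn *c)
{
    pthread_mutex_init(&c->lock, NULL);
    c->len = 0;
}

/* Send one message. The whole write happens under the lock, so two
 * threads can never interleave inside the shared stream state. */
static void conn_send(struct shared_conn *c, const char *msg)
{
    size_t n = strlen(msg);

    pthread_mutex_lock(&c->lock);
    assert(c->len + n <= CONN_BUF_SIZE);
    memcpy(c->buf + c->len, msg, n);
    c->len += n;
    pthread_mutex_unlock(&c->lock);
}

struct worker_arg {
    struct shared_conn *conn;
    const char *msg;
    int count;
};

static void *worker(void *p)
{
    struct worker_arg *a = p;

    for (int i = 0; i < a->count; i++)
        conn_send(a->conn, a->msg);
    return NULL;
}

/* Two threads write to one connection, loosely like Thread 2 (Jmsg) and
 * Thread 3 (signal) above. Returns the total number of bytes written. */
static size_t run_demo(int count)
{
    struct shared_conn conn;
    struct worker_arg a = { &conn, "Jmsg", count };
    struct worker_arg b = { &conn, "SIG!", count };
    pthread_t ta, tb;

    conn_init(&conn);
    pthread_create(&ta, NULL, worker, &a);
    pthread_create(&tb, NULL, worker, &b);
    pthread_join(ta, NULL);
    pthread_join(tb, NULL);
    pthread_mutex_destroy(&conn.lock);
    return conn.len;
}
```

With the lock in place, run_demo(1000) always accounts for every byte
(2 threads x 1000 messages x 4 bytes). Removing the lock would let the two
threads race on buf and len, which is a much simpler analogue of two threads
driving the same SSL/zlib stream at once.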

Regards,

[Thread debugging using libthread_db enabled]
[New Thread 0x7f8ef38f36f0 (LWP 18535)]
[New Thread 0x42900950 (LWP 12104)]
[New Thread 0x44904950 (LWP 26990)]
[New Thread 0x418fe950 (LWP 18539)]
0x00007f8ef0a89d52 in select () from /lib/libc.so.6
$1 = '\0' <repeats 29 times>
$2 = 0xa6a088 "bacula-sd"
$3 = 0xa6a0c8 "/usr/sbin/bacula-sd"
$4 = 0x0
$5 = 0x7f8ef2cc0c28 "5.0.3 (04 August 2010)"
$6 = 0x7f8ef2cc0c4c "x86_64-pc-linux-gnu"
$7 = 0x7f8ef2cc0c60 "debian"
$8 = 0x7f8ef2cc0c67 "5.0.5"
$9 = "backup2", '\0' <repeats 42 times>
$10 = 0x7f8ef2cc0c3f "debian 5.0.5"
$11 = 0
Environment variable "TestName" not defined.
#0  0x00007f8ef0a89d52 in select () from /lib/libc.so.6
#1  0x00007f8ef2c85264 in bnet_thread_server (addrs=0xa6ab98, max_clients=33,
    client_wq=0x6642c0,
    handle_client_request=0x4278c6 <handle_connection_request(void*)>)
    at bnet_server.c:161
#2  0x0000000000409945 in main (argc=0, argv=0x7fffbd0a8ac0) at stored.c:313

Thread 4 (Thread 0x418fe950 (LWP 18539)):
#0  0x00007f8ef20c1fad in pthread_cond_timedwait@@GLIBC_2.3.2 ()
   from /lib/libpthread.so.0
#1  0x00007f8ef2cb6a3d in watchdog_thread (arg=0x0) at watchdog.c:321
#2  0x00007f8ef20bdfc7 in start_thread () from /lib/libpthread.so.0
#3  0x00007f8ef0a9064d in clone () from /lib/libc.so.6
#4  0x0000000000000000 in ?? ()

Thread 3 (Thread 0x44904950 (LWP 26990)):
#0  0x00007f8ef2a5b44e in ?? () from /usr/lib/libz.so.1
#1  0x00007f8ef2a5a28d in deflate () from /usr/lib/libz.so.1
#2  0x00007f8ef17e294e in ?? () from /usr/lib/libcrypto.so.0.9.8
#3  0x00007f8ef17e25b2 in COMP_compress_block ()
   from /usr/lib/libcrypto.so.0.9.8
#4  0x00007f8ef1a7e35e in ssl3_do_compress () from /usr/lib/libssl.so.0.9.8
#5  0x00007f8ef1a7e4ac in ?? () from /usr/lib/libssl.so.0.9.8
#6  0x00007f8ef1a7e9a0 in ssl3_write_bytes () from /usr/lib/libssl.so.0.9.8
#7  0x00007f8ef2cad1b0 in openssl_bsock_readwrite (bsock=0x107b568,
    ptr=0xc8efdc "", nbytes=4, write=true) at tls.c:626
#8  0x00007f8ef2cad483 in tls_bsock_writen (bsock=0x107b568, ptr=0xc8efdc "",
    nbytes=4) at tls.c:704
#9  0x00007f8ef2c84670 in write_nbytes (bsock=0x107b568, ptr=0xc8efdc "",
    nbytes=4) at bnet.c:128
#10 0x00007f8ef2c880a4 in BSOCK::send (this=0x107b568) at bsock.c:379
#11 0x00007f8ef2c885e8 in BSOCK::signal (this=0x107b568, signal=-4)
    at bsock.c:574
#12 0x0000000000428070 in handle_connection_request (arg=0x107b568)
    at dircmd.c:251
#13 0x00007f8ef2cb7587 in workq_server (arg=0x6642c0) at workq.c:346
#14 0x00007f8ef20bdfc7 in start_thread () from /lib/libpthread.so.0
#15 0x00007f8ef0a9064d in clone () from /lib/libc.so.6
#16 0x0000000000000000 in ?? ()

Thread 2 (Thread 0x42900950 (LWP 12104)):
#0  0x00007f8ef20c55ef in waitpid () from /lib/libpthread.so.0
#1  0x00007f8ef2cab0b7 in signal_handler (sig=11) at signal.c:229
#2  <signal handler called>
#3  0x00007f8ef2a5b0bf in ?? () from /usr/lib/libz.so.1
#4  0x00007f8ef2a5a28d in deflate () from /usr/lib/libz.so.1
#5  0x00007f8ef17e294e in ?? () from /usr/lib/libcrypto.so.0.9.8
#6  0x00007f8ef17e25b2 in COMP_compress_block ()
   from /usr/lib/libcrypto.so.0.9.8
#7  0x00007f8ef1a7e35e in ssl3_do_compress () from /usr/lib/libssl.so.0.9.8
#8  0x00007f8ef1a7e4ac in ?? () from /usr/lib/libssl.so.0.9.8
#9  0x00007f8ef1a7e9a0 in ssl3_write_bytes () from /usr/lib/libssl.so.0.9.8
#10 0x00007f8ef2cad1b0 in openssl_bsock_readwrite (bsock=0x107b568,
    ptr=0xc8efdc "", nbytes=182, write=true) at tls.c:626
#11 0x00007f8ef2cad483 in tls_bsock_writen (bsock=0x107b568, ptr=0xc8efdc "",
    nbytes=182) at tls.c:704
#12 0x00007f8ef2c84670 in write_nbytes (bsock=0x107b568, ptr=0xc8efdc "",
    nbytes=182) at bnet.c:128
#13 0x00007f8ef2c880a4 in BSOCK::send (this=0x107b568) at bsock.c:379
#14 0x00007f8ef2c887c7 in BSOCK::fsend (this=0x107b568,
    fmt=0x7f8ef2cc05d0 "Jmsg Job=%s type=%d level=%lld %s") at bsock.c:434
#15 0x00007f8ef2c9c80f in dispatch_message (jcr=0xe6ba38, type=6,
    mtime=1283388921,
    msg=0x428fe870 "backup2-sd JobId 83624: JobId=83624 Job=\"ivr-db1-1-System.2010-09-02_00.00.01_26\" marked to be canceled.\n") at message.c:888
#16 0x00007f8ef2c9cf8e in Jmsg (jcr=0xe6ba38, type=6, mtime=0,
    fmt=0x451860 "JobId=%d Job=\"%s\" marked to be canceled.\n")
    at message.c:1292
#17 0x0000000000427763 in cancel_cmd (cjcr=0xc12248) at dircmd.c:335
#18 0x0000000000427f24 in handle_connection_request (arg=0x134a028)
    at dircmd.c:233
#19 0x00007f8ef2cb7587 in workq_server (arg=0x6642c0) at workq.c:346
#20 0x00007f8ef20bdfc7 in start_thread () from /lib/libpthread.so.0
#21 0x00007f8ef0a9064d in clone () from /lib/libc.so.6
#22 0x0000000000000000 in ?? ()

Thread 1 (Thread 0x7f8ef38f36f0 (LWP 18535)):
#0  0x00007f8ef0a89d52 in select () from /lib/libc.so.6
#1  0x00007f8ef2c85264 in bnet_thread_server (addrs=0xa6ab98, max_clients=33,
    client_wq=0x6642c0,
    handle_client_request=0x4278c6 <handle_connection_request(void*)>)
    at bnet_server.c:161
#2  0x0000000000409945 in main (argc=0, argv=0x7fffbd0a8ac0) at stored.c:313
#0  0x00007f8ef0a89d52 in select () from /lib/libc.so.6
#0  0x00007f8ef0a89d52 in select () from /lib/libc.so.6
No symbol table info available.
#1  0x00007f8ef2c85264 in bnet_thread_server (addrs=0xa6ab98, max_clients=33,
    client_wq=0x6642c0,
    handle_client_request=0x4278c6 <handle_connection_request(void*)>)
    at bnet_server.c:161
161     bnet_server.c: No such file or directory.
        in bnet_server.c
Current language:  auto; currently c++
maxfd = 6
sockset = {fds_bits = {112, 0 <repeats 15 times>}}
newsockfd = 7
stat = 0
clilen = 16
cli_addr = {sa_family = 2,
  sa_data = "®(\177\000\001\001\000\000\000\000\000\000\000"}
tlog = 0
turnon = 1
request = {fd = 7, user = '\0' <repeats 127 times>,
  daemon = "backup2-sd", '\0' <repeats 117 times>,
  pid = "18535\000\000\000\000", client = {{name = '\0' <repeats 127 times>,
      addr = '\0' <repeats 127 times>, sin = 0x7f8ef1eb2a40, unit = 0x0,
      request = 0x7fffbd0a8310}}, server = {{name = '\0' <repeats 127 times>,
      addr = '\0' <repeats 127 times>, sin = 0x7f8ef1eb29c0, unit = 0x0,
      request = 0x7fffbd0a8310}}, sink = 0,
  hostname = 0x7f8ef1cafdc0 <sock_hostname>,
  hostaddr = 0x7f8ef1cafd70 <sock_hostaddr>, cleanup = 0, config = 0x0}
p = (IPADDR *) 0x0
fd_ptr = (s_sockfd *) 0x0
buf = 
"127.0.1.1\00032\000\000\000\000\0000\220ó\216\177\000\000...@\000\000\000\000\000à\200@\000\000\000\000\000\200\212\n½ÿ\177\000\000\220\215\vò\216\177\000\000y\216\n½ÿ\177\000\000
\210\n½ÿ\177\000\000RÀoó\216\177\000\000 \206\001", '\0' <repeats 13
times>, 
"H}\fò\216\177\000\000\000\000\000\000\000\000\000\000p\210\n½ÿ\177\000\000ïP\fò\216\177\000"
sockfds = {<SMARTALLOC> = {<No data fields>}, head = 0x7fffbd0a7880,
  tail = 0x7fffbd0a7820, loffset = 0, num_items = 3}
allbuf = "\001\000\000\000ÿ\177\000\000ܲ¦\000\000\000\000\000\200\212\n½ÿ\177",
'\0' <repeats 14 times>,
"\001\000\000\000`C\217ó\216\177\000\000à\214\217ó\216\177\000\000`z\n½ÿ\177\000\000\210\211\217ó\216\177\000\000Ö\tnñ\216\177\000\000\220\200\n½ÿ\177\000\000Ü\036oó\216\177\000\000\030§¦\000\000\000\000\000ÐW\217ó\216\177\000\000\016\000\000\000\000\000\000\000\026\000\000\000\000\000\000\000$ù\000\002\000\000\000\000\224#oó\216\177\000\000Änlñ\216\177\000\000$\000\000\000\216\177\000\000ä\003\b\000\000\000\000\000P\000\000\000\000\000\000\000(\000\000\000\000\000\000\...@\000\000\000\000\000\000\000\t\000\000\000\000\000\000\000p\000\000\000"...
#2  0x0000000000409945 in main (argc=0, argv=0x7fffbd0a8ac0) at stored.c:313
313     stored.c: No such file or directory.
        in stored.c
ch = -1
no_signals = false
test_config = false
thid = 1087273296
uid = 0x7fffbd0a8e97 "bacula"
gid = 0x7fffbd0a8ea1 "tape"
python_args = {progname = 0xa6abe8 "backup2-sd", scriptdir = 0x0,
  modulename = 0x44adbd "SDStartUp",
  configfile = 0xa6a2f8 "/etc/bacula/bacula-sd.conf",
  workingdir = 0xa6ac28 "/var/lib/bacula",
  job_getattr = 0x4396bc <job_getattr(_object*, char*)>,
  job_setattr = 0x439495 <job_setattr(_object*, char*, _object*)>}
#0  0x0000000000000000 in ?? ()
No symbol table info available.
#0  0x0000000000000000 in ?? ()
No symbol table info available.
#0  0x0000000000000000 in ?? ()
No symbol table info available.
#0  0x0000000000000000 in ?? ()
No symbol table info available.
#0  0x0000000000000000 in ?? ()
No symbol table info available.




-- 
Baptiste MALGUY
PGP fingerprint: 49B0 4F6E 4AA8 B149 B2DF  9267 0F65 6C1C C473 6EC2