* Kern Sibbald wrote on 15.09.07 at 22:28:
> On Saturday 15 September 2007 21:09, Marc Schiffbauer wrote:
> >
> > What can I do/try/test now?
>
I played with "Minimum Block Size" a bit. When it is set, the SD seems to
crash even when labeling tapes. I have since removed that setting again, and
labeling works fine so far. I then erased a tape (mt erase) and labeled it
using bconsole.

> - Run the debugger on it, and make it crash in some different way, maybe that
> will tell us something.

I have a little incremental job (about 700 MB). It ran just fine, but at the
very end of the job the SD crashed. I ran the SD with gdb attached.

This is the output on the console. In "Terminated Jobs" the job is "OK":

Terminated Jobs:
 JobId  Level    Files      Bytes   Status   Finished        Name
======================================================================
[...]
   535  Incr        114    740.2 M  OK       16-Sep-07 01:31 lisa-ImportantData

but the summary is not:

16-Sep 01:31 lisa-sd: Job write elapsed time = 00:02:36, Transfer rate = 4.745 M bytes/second
16-Sep 01:31 lisa-sd: Sending spooled attrs to the Director. Despooling 29,191 bytes ...
16-Sep 01:31 lisa-dir: lisa-ImportantData.2007-09-16_01.28.17 Error: Bacula lisa-dir 2.2.4 (14Sep07): 16-Sep-2007 01:31:31
  Build OS:               i386-pc-linux-gnu debian 3.1
  JobId:                  535
  Job:                    lisa-ImportantData.2007-09-16_01.28.17
  Backup Level:           Incremental, since=2007-09-14 03:06:16
  Client:                 "lisa-fd" 2.2.4 (14Sep07) i386-pc-linux-gnu,debian,3.1
  FileSet:                "lisa ImportantData FileSet" 2007-03-29 13:23:26
  Pool:                   "Default" (From Job resource)
  Storage:                "lisa-sd" (From command line)
  Scheduled time:         16-Sep-2007 01:27:55
  Start time:             16-Sep-2007 01:28:50
  End time:               16-Sep-2007 01:31:31
  Elapsed time:           2 mins 41 secs
  Priority:               10
  FD Files Written:       114
  SD Files Written:       0
  FD Bytes Written:       740,281,859 (740.2 MB)
  SD Bytes Written:       0 (0 B)
  Rate:                   4598.0 KB/s
  Software Compression:   None
  VSS:                    no
  Encryption:             no
  Volume name(s):         Tape_13
  Volume Session Id:      2
  Volume Session Time:    1189898149
  Last Volume Bytes:      740,920,320 (740.9 MB)
  Non-fatal FD errors:    0
  SD Errors:              0
  FD termination status:  OK
  SD termination status:  Error
  Termination:            *** Backup Error ***

Maybe I did something wrong, but I got no backtrace:

[EMAIL PROTECTED]:~# gdb /usr/sbin/bacula-sd 7791
GNU gdb 6.3-debian
Copyright 2004 Free Software Foundation, Inc.
GDB is free software, covered by the GNU General Public License, and you are
welcome to change it and/or distribute copies of it under certain conditions.
Type "show copying" to see the conditions.
There is absolutely no warranty for GDB.  Type "show warranty" for details.
This GDB was configured as "i386-linux"...Using host libthread_db library "/lib/libthread_db.so.1".
Attaching to program: /usr/sbin/bacula-sd, process 7791
Reading symbols from /lib/libacl.so.1...done.
Loaded symbols for /lib/libacl.so.1
Reading symbols from /usr/lib/libz.so.1...done.
Loaded symbols for /usr/lib/libz.so.1
Reading symbols from /usr/lib/libpython2.3.so.1.0...done.
Loaded symbols for /usr/lib/libpython2.3.so.1.0
Reading symbols from /lib/libutil.so.1...done.
Loaded symbols for /lib/libutil.so.1
Reading symbols from /lib/librt.so.1...done.
Loaded symbols for /lib/librt.so.1
Reading symbols from /lib/libpthread.so.0...done.
[Thread debugging using libthread_db enabled]
[New Thread 16384 (LWP 7791)]
[New Thread 32769 (LWP 7793)]
[New Thread 16386 (LWP 7794)]
[New Thread 32771 (LWP 7795)]
Loaded symbols for /lib/libpthread.so.0
Reading symbols from /lib/libdl.so.2...done.
Loaded symbols for /lib/libdl.so.2
Reading symbols from /lib/libwrap.so.0...done.
Loaded symbols for /lib/libwrap.so.0
Reading symbols from /usr/lib/i686/cmov/libssl.so.0.9.7...done.
Loaded symbols for /usr/lib/i686/cmov/libssl.so.0.9.7
Reading symbols from /usr/lib/i686/cmov/libcrypto.so.0.9.7...done.
Loaded symbols for /usr/lib/i686/cmov/libcrypto.so.0.9.7
Reading symbols from /usr/lib/libstdc++.so.5...done.
Loaded symbols for /usr/lib/libstdc++.so.5
Reading symbols from /lib/libm.so.6...done.
Loaded symbols for /lib/libm.so.6
Reading symbols from /lib/libgcc_s.so.1...done.
Loaded symbols for /lib/libgcc_s.so.1
Reading symbols from /lib/libc.so.6...done.
Loaded symbols for /lib/libc.so.6
Reading symbols from /lib/libattr.so.1...done.
Loaded symbols for /lib/libattr.so.1
Reading symbols from /lib/ld-linux.so.2...done.
Loaded symbols for /lib/ld-linux.so.2
Reading symbols from /lib/libnsl.so.1...done.
Loaded symbols for /lib/libnsl.so.1
Reading symbols from /lib/libnss_compat.so.2...done.
Loaded symbols for /lib/libnss_compat.so.2
Reading symbols from /lib/libnss_nis.so.2...done.
Loaded symbols for /lib/libnss_nis.so.2
Reading symbols from /lib/libnss_files.so.2...done.
Loaded symbols for /lib/libnss_files.so.2
0x404a0001 in select () from /lib/libc.so.6
(gdb)
(gdb)
(gdb) run
The program being debugged has been started already.
Start it from the beginning? (y or n) n
Program not restarted.
(gdb) cont
Continuing.
[New Thread 49156 (LWP 7817)]
[Thread 16386 (LWP 7794) exited]
[Thread 49156 (LWP 7817) exited]
[New Thread 65541 (LWP 7852)]
[Thread 65541 (LWP 7852) exited]
[New Thread 81926 (LWP 7855)]
[New Thread 98311 (LWP 7862)]
[Thread 98311 (LWP 7862) exited]
[New Thread 114696 (LWP 7865)]
[Thread 114696 (LWP 7865) exited]
[New Thread 131081 (LWP 7867)]
[Thread 81926 (LWP 7855) exited]
[Thread 131081 (LWP 7867) exited]
[New Thread 147466 (LWP 7871)]
[Thread 147466 (LWP 7871) exited]
[New Thread 163851 (LWP 7873)]
[Thread 163851 (LWP 7873) exited]
[New Thread 180236 (LWP 7874)]
[Thread 180236 (LWP 7874) exited]
[New Thread 196621 (LWP 7876)]
[New Thread 213006 (LWP 7878)]
[New Thread 229391 (LWP 7915)]
[New Thread 245776 (LWP 7916)]
[Thread 229391 (LWP 7915) exited]
[Thread 213006 (LWP 7878) exited]
[Thread 245776 (LWP 7916) exited]
[Thread 196621 (LWP 7876) exited]
[New Thread 262161 (LWP 7934)]
[Thread 262161 (LWP 7934) exited]
[New Thread 278546 (LWP 7942)]
[New Thread 294931 (LWP 7950)]
[Thread 294931 (LWP 7950) exited]

Program received signal SIGSEGV, Segmentation fault.
[Switching to Thread 278546 (LWP 7942)]
0x080d2050 in ?? ()
(gdb) Quit
(gdb) Continuing.
Kaboom! bacula-sd, lisa-sd got signal 11 - Segmentation violation. Attempting traceback.
Kaboom! exepath=/usr/sbin/
Calling: /usr/sbin/btraceback /usr/sbin/bacula-sd 7791
Traceback complete, attempting cleanup ...
[Thread 32771 (LWP 7795) exited]
Orphaned buffer: lisa-sd 528 bytes buf=80c2f08 allocated at bnet.c:674
Orphaned buffer: lisa-sd 272 bytes buf=80cc3f8 allocated at jcr.c:253
Orphaned buffer: lisa-sd 528 bytes buf=80e1518 allocated at bnet.c:673
Orphaned buffer: lisa-sd 528 bytes buf=80e1748 allocated at jcr.c:255
Orphaned buffer: lisa-sd 528 bytes buf=80e1a90 allocated at bnet.c:673
Orphaned buffer: lisa-sd 528 bytes buf=80e2618 allocated at bnet.c:674
Orphaned buffer: lisa-sd 146 bytes buf=80e1f70 allocated at job.c:114
Orphaned buffer: lisa-sd 146 bytes buf=80e2020 allocated at job.c:117
Orphaned buffer: lisa-sd 146 bytes buf=80e20d0 allocated at job.c:120
Orphaned buffer: lisa-sd 146 bytes buf=80e2180 allocated at job.c:128
Orphaned buffer: lisa-sd 128 bytes buf=80d0938 allocated at bnet.c:667
Orphaned buffer: lisa-sd 7 bytes buf=80d1f30 allocated at bnet.c:675
Orphaned buffer: lisa-sd 12 bytes buf=80d09d8 allocated at bnet.c:676
Orphaned buffer: lisa-sd 528 bytes buf=80d19d8 allocated at bnet.c:674
Orphaned buffer: lisa-sd 128 bytes buf=80d2050 allocated at bnet.c:667
Orphaned buffer: lisa-sd 7 bytes buf=80d20f0 allocated at bnet.c:675
Orphaned buffer: lisa-sd 12 bytes buf=80d2118 allocated at bnet.c:676
Orphaned buffer: lisa-sd 8 bytes buf=80d2148 allocated at workq.c:167
Orphaned buffer: lisa-sd 16 bytes buf=80e1f40 allocated at jcr.c:247
Orphaned buffer: lisa-sd 24 bytes buf=80d1698 allocated at dircmd.c:185
Orphaned buffer: lisa-sd 40 bytes buf=80d0a08 allocated at job.c:140
Orphaned buffer: lisa-sd 24 bytes buf=80d1f68 allocated at reserve.c:583
Orphaned buffer: lisa-sd 40 bytes buf=80e25c0 allocated at alist.c:53
Orphaned buffer: lisa-sd 24 bytes buf=80e2e20 allocated at reserve.c:606
Orphaned buffer: lisa-sd 8 bytes buf=80e1ef0 allocated at reserve.c:621
Orphaned buffer: lisa-sd 40 bytes buf=80cc528 allocated at alist.c:53
Orphaned buffer: lisa-sd 128 bytes buf=80d2de0 allocated at bnet.c:667
Orphaned buffer: lisa-sd 7 bytes buf=80d2d50 allocated at bnet.c:675
Orphaned buffer: lisa-sd 12 bytes buf=80cc570 allocated at bnet.c:676
Orphaned buffer: lisa-sd 65652 bytes buf=80f2dc8 allocated at bsock.c:583

Program exited with code 01.
(gdb) backtrace
No stack.
(gdb) print my_name
$1 = '\0' <repeats 29 times>
(gdb) bt
No stack.
(gdb) thread apply all bt
(gdb) f 0
No stack.
(gdb) info locals
No registers.
(gdb) bt
No stack.
(gdb) f 1
No stack.
(gdb) info locals
No registers.
(gdb) f 2
No stack.
(gdb) info locals
No registers.
(gdb) f 3
No stack.
(gdb) info locals
No registers.
(gdb) f 4
No stack.
(gdb) info locals
No registers.
(gdb) f 5
No stack.
(gdb) info locals
No registers.
(gdb) f 6
No stack.
(gdb) info locals
No registers.
(gdb) f 7
No stack.
(gdb) info locals
No registers.
(gdb) detach
(gdb) quit
[EMAIL PROTECTED]:~#

>
> - Make the smartalloc routine dump the *full* information it has on the
> buffer that was overrun.

How can I do this?

>
> - Run the SD with valgrind, maybe it will point out what is overrunning the
> buffer.

Do I need some special options to make this work?

[EMAIL PROTECTED]:~# valgrind /usr/sbin/bacula-sd -f -c /etc/bacula/bacula-sd.conf
==8314== Memcheck, a memory error detector for x86-linux.
==8314== Copyright (C) 2002-2005, and GNU GPL'd, by Julian Seward et al.
==8314== Using valgrind-2.4.0, a program supervision framework for x86-linux.
==8314== Copyright (C) 2000-2005, and GNU GPL'd, by Julian Seward et al.
==8314== For more details, rerun with: -v
==8314==
==8314== Signal 11 (SIGSEGV) appears to have lost its siginfo; I can't go on.
==8314== This may be because one of your programs has consumed your
==8314== ration of siginfo structures.
==8314== Signal 11 (SIGSEGV) appears to have lost its siginfo; I can't go on.
==8314== This may be because one of your programs has consumed your
==8314== ration of siginfo structures.
==8314==
==8314== ERROR SUMMARY: 0 errors from 0 contexts (suppressed: 0 from 0)
==8314== malloc/free: in use at exit: 0 bytes in 0 blocks.
==8314== malloc/free: 0 allocs, 0 frees, 0 bytes allocated.
==8314== For counts of detected errors, rerun with: -v
==8314== No malloc'd blocks -- no leaks are possible.
Segmentation fault
[EMAIL PROTECTED]:~#

>
> - Figure out why it is the only SD crashing (at least that is what I deduce
> from "one of my SD ... crashes ..." (paraphrased).

The other SD runs on Debian etch amd64 and stores to disk only. The crashing
SD is on the same machine as the Director.

>
> - Back up to 2.0.3, rebuild it and see if it crashes too in the same way.

I built a 2.0.3 SD and Director and ran the same job again (still with the
2.2.4 FD): *worked perfectly*, no crash. So I think we can rule out a
hardware failure.

16-Sep 02:29 lisa-sd: Job write elapsed time = 00:02:52, Transfer rate = 4.397 M bytes/second
16-Sep 02:29 lisa-sd: Sending spooled attrs to the Director. Despooling 29,772 bytes ...
16-Sep 02:29 lisa-dir: Bacula 2.0.3 (06Mar07): 16-Sep-2007 02:29:25
  JobId:                  536
  Job:                    lisa-ImportantData.2007-09-16_02.24.18
  Backup Level:           Incremental, since=2007-09-14 03:06:16
  Client:                 "lisa-fd" 2.2.4 (14Sep07) i386-pc-linux-gnu,debian,3.1
  FileSet:                "lisa ImportantData FileSet" 2007-03-29 13:23:26
  Pool:                   "Default" (From Job resource)
  Storage:                "lisa-sd" (From Job resource)
  Scheduled time:         16-Sep-2007 02:24:12
  Start time:             16-Sep-2007 02:25:36
  End time:               16-Sep-2007 02:29:25
  Elapsed time:           3 mins 49 secs
  Priority:               500
  FD Files Written:       116
  SD Files Written:       116
  FD Bytes Written:       756,371,227 (756.3 MB)
  SD Bytes Written:       756,383,199 (756.3 MB)
  Rate:                   3302.9 KB/s
  Software Compression:   None
  VSS:                    no
  Encryption:             no
  Volume name(s):         Tape_13
  Volume Session Id:      1
  Volume Session Time:    1189902192
  Last Volume Bytes:      1,497,904,128 (1.497 GB)
  Non-fatal FD errors:    0
  SD Errors:              0
  FD termination status:  OK
  SD termination status:  OK
  Termination:            Backup OK

16-Sep 02:29 lisa-dir: Begin pruning Jobs.
16-Sep 02:29 lisa-dir: No Jobs found to prune.
16-Sep 02:29 lisa-dir: Begin pruning Files.
16-Sep 02:29 lisa-dir: No Files found to prune.
16-Sep 02:29 lisa-dir: End auto prune.

*status client=lisa-fd
Connecting to Client lisa-fd at lisa:9102

lisa-fd Version: 2.2.4 (14 September 2007)  i386-pc-linux-gnu debian 3.1
Daemon started 15-Sep-07 13:23, 6 Jobs run since started.
 Heap: heap=694,224 smbytes=215,620 max_bytes=317,828 bufs=80 max_bufs=200
 Sizeof: boffset_t=8 size_t=4 debug=1 trace=0

Running Jobs:
Director connected at: 16-Sep-07 02:29
No Jobs running.
====

Terminated Jobs:
 JobId  Level    Files      Bytes   Status   Finished        Name
======================================================================
[...]
   535  Incr        114    740.2 M  OK       16-Sep-07 01:31 lisa-ImportantData
   536  Incr        116    756.3 M  OK       16-Sep-07 02:29 lisa-ImportantData
====
*

Any more hints?

-- 
****************************************************
* (morganj): 0 is false and 1 is true, correct? *
* (alec_eso): 1, morganj *
* (morganj): bastard. *
_______________________________________________
Bacula-devel mailing list
[email protected]
https://lists.sourceforge.net/lists/listinfo/bacula-devel
