On Monday 08 March 2010 12:20:04 JanJaap Scholing wrote:
> Hi,
>
> My bacula-sd is deadlocking during copy jobs.
>
> Version: 5.0.1
> Compile options: --with-readline=/usr/include/readline --disable-conio
> --with-mysql --enable-smartalloc
> Linux version: i686-pc-linux-gnu debian 5.0.4
>
> Every day I copy every job made that night from 2 disk pools to a migrate
> pool using copy jobs (pool uncopied). The migrate pool uses an autochanger
> with two drives; the bacula-sd autochanger configuration is below.
>
> Both pools load-balance their jobs over the 2 drives (using the
> Maximum Concurrent Jobs = 1 setting in bacula-sd), as expected.
>
> After a while, the load on both the Director and the SD drops to 0.
>
> When I run status stor in the console, I see the following (it stops at
> "Used Volume status:" and waits forever):
>
>
> *status stor
> The defined Storage resources are:
> 1: Migrate
> 2: diskbackup
> 3: diskbackup2
> Select Storage resource (1-3): 1
> Connecting to Storage daemon Migrate at bacula-sd.solcon.nl:9103
>
> bacula-sd Version: 5.0.1 (24 February 2010) i686-pc-linux-gnu debian 5.0.4
> Daemon started 08-Mar-10 10:07, 20 Jobs run since started.
> Heap: heap=2,367,488 smbytes=1,724,139 max_bytes=1,985,497 bufs=250 max_bufs=293
> Sizes: boffset_t=8 size_t=4 int32_t=4 int64_t=8
>
> Running Jobs:
> Reading: Full Copy job D2D2T2 JobId=131743 Volume="disk2-1265"
> pool="Disk2-Pool" device="diskbackup2" (/bacula/diskbackup2)
> Files=3,337 Bytes=45,779,673 Bytes/sec=140,428
> FDSocket closed
> ====
>
> Jobs waiting to reserve a drive:
> ====
>
> Terminated Jobs:
> JobId Level Files Bytes Status Finished Name
> ===================================================================
> 131559 Full 184 70.61 M OK 08-Mar-10 11:20 D2D2T
> 131561 Full 215 32.03 M OK 08-Mar-10 11:22 D2D2T
> 131735 Full 233 1.541 G OK 08-Mar-10 11:23 D2D2T2
> 131563 Full 41 2.123 M OK 08-Mar-10 11:25 D2D2T
> 131737 Full 118 241.3 M OK 08-Mar-10 11:28 D2D2T2
> 131565 Full 21,836 239.7 M OK 08-Mar-10 11:30 D2D2T
> 131739 Full 2,069 596.7 M OK 08-Mar-10 11:32 D2D2T2
> 131567 Full 122 315.9 M OK 08-Mar-10 11:34 D2D2T
> 131741 Full 141 2.779 M OK 08-Mar-10 11:35 D2D2T2
> 131569 Full 187 20.59 M OK 08-Mar-10 11:37 D2D2T
> ====
>
> Device status:
> Autochanger "TandbergT40" with devices:
> "Drive-1" (/dev/st0)
> "Drive-2" (/dev/st1)
> Device "Drive-1" (/dev/st0) is mounted with:
> Volume: B4MO03
> Pool: Migrate-Pool
> Media type: LTO-4
> Slot 19 is loaded in drive 0.
> Total Bytes=4,032,451,584 Blocks=62,506 Bytes/block=64,513
> Positioned at File=8 Block=0
> Device "Drive-2" (/dev/st1) is mounted with:
> Volume: B4MO01
> Pool: Migrate-Pool
> Media type: LTO-4
> Slot 22 is loaded in drive 1.
> Total Bytes=3,954,521,088 Blocks=61,298 Bytes/block=64,513
> Positioned at File=17 Block=0
> Device "diskbackup" (/bacula/diskbackup) is not open.
> Device "diskrestore" (/bacula/diskbackup) is not open.
> Device "diskbackup2" (/bacula/diskbackup2) is mounted with:
> Volume: disk2-1265
> Pool: *unknown*
> Media type: File
> Total Bytes Read=0 Blocks Read=0 Bytes/block=0
> Positioned at File=0 Block=1,575,192,713
> ====
>
> Used Volume status:
>
>
>
> Restarting bacula-sd is the only way to get it working again.
>
> I tried to run bacula-sd manually under gdb, but gdb does not show
> anything useful.
>
> All I get is output like this:
>
> [New Thread 0xb61e0b90 (LWP 8026)]
> [New Thread 0xb57dfb90 (LWP 8027)]
> [New Thread 0xb4fdeb90 (LWP 8028)]
> [New Thread 0xb47ddb90 (LWP 8029)]
> [Thread 0xb57dfb90 (LWP 8027) exited]
> [Thread 0xb61e0b90 (LWP 8026) exited]
> [Thread 0xb47ddb90 (LWP 8029) exited]
> [Thread 0xb4fdeb90 (LWP 8028) exited]
> [New Thread 0xb4fdeb90 (LWP 8046)]
> [Thread 0xb4fdeb90 (LWP 8046) exited]
> [New Thread 0xb4fdeb90 (LWP 8047)]
> [Thread 0xb4fdeb90 (LWP 8047) exited]
> [New Thread 0xb4fdeb90 (LWP 8050)]
> [Thread 0xb4fdeb90 (LWP 8050) exited]
> [New Thread 0xb4fdeb90 (LWP 8051)]
> [New Thread 0xb47ddb90 (LWP 8052)]
> [Thread 0xb47ddb90 (LWP 8052) exited]
> [New Thread 0xb47ddb90 (LWP 8053)]
> [New Thread 0xb61e0b90 (LWP 8062)]
> [New Thread 0xb57dfb90 (LWP 8063)]
> [Thread 0xb57dfb90 (LWP 8063) exited]
> [Thread 0xb61e0b90 (LWP 8062) exited]
> [New Thread 0xb57dfb90 (LWP 8064)]
> [Thread 0xb57dfb90 (LWP 8064) exited]
> [New Thread 0xb57dfb90 (LWP 8076)]
>
>
> What could the problem be, and how do I make a good trace using gdb? I
> tried the way described in the manual:
> http://bacula.org/5.0.x-manuals/en/problems/problems/What_Do_When_Bacula.html#SECTION00640000000000000000
>
> I don't understand the part:
>     thread apply all bt
> Please help me out.
Yes, well, the trick here is: if the SD does not crash, gdb keeps running and
you cannot type any commands into the shell window in which you started gdb
on the SD. So when the SD deadlocks, first run a "status storage" from
bconsole, then press Ctrl-C in the shell window where gdb is running (you may
need to press it several times). The debugger should then return to its
command prompt, and at that point you can enter:

  thread apply all bt

to get a backtrace of all the threads that are running. You can then either
enter "cont", and the debugger will give control back to the SD, or enter
"quit".
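If typing into the gdb window is awkward, the same backtrace can also be taken non-interactively by attaching gdb to the already-running, hung daemon. This Python sketch is an illustration under stated assumptions, not part of Bacula: it assumes gdb and pidof are installed, that the process is named bacula-sd, and that it runs as root (attaching needs ptrace permission). The helper names build_gdb_command, find_pid and dump_all_threads are hypothetical.

```python
# Hypothetical helper (not part of Bacula): dump every bacula-sd thread's stack
# without an interactive gdb session. Assumes Python 3.9+, gdb and pidof on PATH,
# and root/ptrace permission to attach to the daemon.
import subprocess


def build_gdb_command(pid: int) -> list[str]:
    """Build a gdb argv that attaches to `pid` and prints a backtrace of all threads."""
    return [
        "gdb",
        "-batch",                     # no prompt or paging; exit after the -ex commands
        "-p", str(pid),               # attach to the already-running (hung) process
        "-ex", "thread apply all bt",
    ]


def find_pid(name: str = "bacula-sd") -> int:
    """Return the first PID reported by pidof; raises CalledProcessError if none is found."""
    out = subprocess.run(["pidof", name], capture_output=True, text=True, check=True)
    return int(out.stdout.split()[0])


def dump_all_threads(pid: int) -> str:
    """Run gdb against `pid` and return its combined stdout and stderr."""
    result = subprocess.run(build_gdb_command(pid), capture_output=True, text=True)
    return result.stdout + result.stderr
```

For example, call `print(dump_all_threads(find_pid()))` as root while the SD is stuck and attach the output to the bug report. Because gdb attached to the process rather than starting it, exiting batch mode detaches and leaves the SD running (still deadlocked). The backtrace is most useful if bacula-sd was built with debugging symbols.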
Kern
PS: I recommend using /dev/nst0 and /dev/nst1 (the non-rewinding devices)
instead of the ones you are using. Also, I see no reason to limit each drive
to a single job: with LTO-4, you should be able to run many simultaneous jobs
per drive (perhaps 10).
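Both PS points can be checked mechanically against a bacula-sd.conf. This Python sketch is a hypothetical helper (check_devices, parse_directives and DEVICE_RE are illustrative names, not Bacula tools). It assumes Device resources are not nested and that `#` always starts a comment, and it does not follow @include files. It flags Device resources whose Archive Device is a rewinding /dev/stN node and those with Maximum Concurrent Jobs = 1.

```python
# Hypothetical checker (not a Bacula tool) for the two points in the PS.
# Simplifications: Device resources are not nested, "#" always starts a comment,
# and @include files are not followed. Assumes Python 3.9+.
import re

# Matches "Device { ... }" but not "Archive Device = ..." or "Device = Drive-1".
DEVICE_RE = re.compile(r"\bDevice\s*\{(.*?)\}", re.IGNORECASE | re.DOTALL)


def parse_directives(body: str) -> dict[str, str]:
    """Map normalized directive names (lowercase, spaces removed) to their values."""
    directives = {}
    for raw in body.splitlines():
        # Drop comments and an optional trailing ";" before splitting on "=".
        line = raw.split("#", 1)[0].strip().rstrip(";").strip()
        if "=" not in line:
            continue
        key, value = line.split("=", 1)
        directives[key.replace(" ", "").lower()] = value.strip().strip('"')
    return directives


def check_devices(conf_text: str) -> list[str]:
    """Return one warning per rewinding archive device or single-job drive limit."""
    warnings = []
    for match in DEVICE_RE.finditer(conf_text):
        d = parse_directives(match.group(1))
        name = d.get("name", "?")
        archive = d.get("archivedevice", "")
        if re.fullmatch(r"/dev/st\d+", archive):
            # /dev/stN rewinds when closed; /dev/nstN is the non-rewinding node.
            warnings.append(f"{name}: {archive} rewinds on close; consider /dev/n{archive[5:]}")
        if d.get("maximumconcurrentjobs") == "1":
            warnings.append(f"{name}: Maximum Concurrent Jobs = 1 allows one job at a time")
    return warnings
```

Fed the configuration posted in this thread (with the `> ` quote markers removed), it should flag both drives on both counts.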
>
> Thanks and regards,
>
> Jan Jaap
>
> Autochanger part of the bacula-sd config:
>
>
> Autochanger {
> Name = TandbergT40
> Device = Drive-1
> Device = Drive-2
> Changer Command = "/etc/bacula/mtx-changer %c %o %S %a %d"
> Changer Device = /dev/sg3
> }
>
> Device {
> Name = Drive-1 #
> Drive Index = 0
> Media Type = LTO-4
> Archive Device = /dev/st0
> AutomaticMount = yes; # when device opened, read it
> AlwaysOpen = yes;
> RemovableMedia = yes;
> RandomAccess = no;
> AutoChanger = yes
> Alert Command = "/bin/sh -c '/usr/sbin/smartctl -H -l error %c'"
> Spool Directory = /bacula/spool
> Maximum Concurrent Jobs = 1
> }
>
> Device {
> Name = Drive-2 #
> Drive Index = 1
> Media Type = LTO-4
> Archive Device = /dev/st1
> AutomaticMount = yes; # when device opened, read it
> AlwaysOpen = yes;
> RemovableMedia = yes;
> RandomAccess = no;
> AutoChanger = yes
> Alert Command = "/bin/sh -c '/usr/sbin/smartctl -H -l error %c'"
> Spool Directory = /bacula/spool
> Maximum Concurrent Jobs = 1
> }
>
>
_______________________________________________
Bacula-devel mailing list
[email protected]
https://lists.sourceforge.net/lists/listinfo/bacula-devel