> From: [email protected]
> To: [email protected]
> Subject: Re: [Bacula-devel] Deadlock bacula-sd
> Date: Mon, 8 Mar 2010 12:43:34 +0100
> CC: [email protected]
> 
> On Monday 08 March 2010 12:20:04 JanJaap Scholing wrote:
> > Hi,
> >
> > My bacula-sd is deadlocking during copy jobs.
> >
> > Version: 5.0.1
> > Compile options: --with-readline=/usr/include/readline --disable-conio
> >   --with-mysql --enable-smartalloc
> > Linux version: i686-pc-linux-gnu debian 5.0.4
> >
> > Every day I copy every job made that night from 2 disk pools to a migrate
> > pool using copy jobs (pool uncopied). The migrate pool contains an
> > autochanger with two drives. The bacula-sd autochanger config is below.
> >
> > Both pools are load-balancing their jobs over the 2 drives (using the
> > Maximum Concurrent Jobs = 1 feature in the bacula-sd), as expected.
> >
> > After a while the load on both the dir and the sd drops to 0.
> >
> > When I try to do status stor in the console I see the following (it stops
> > at "Used Volume status:" and waits forever):
> >
> >
> > *status stor
> > The defined Storage resources are:
> >      1: Migrate
> >      2: diskbackup
> >      3: diskbackup2
> > Select Storage resource (1-3): 1
> > Connecting to Storage daemon Migrate at bacula-sd.solcon.nl:9103
> >
> > bacula-sd Version: 5.0.1 (24 February 2010) i686-pc-linux-gnu debian 5.0.4
> > Daemon started 08-Mar-10 10:07, 20 Jobs run since started.
> >  Heap: heap=2,367,488 smbytes=1,724,139 max_bytes=1,985,497 bufs=250 max_bufs=293
> >  Sizes: boffset_t=8 size_t=4 int32_t=4 int64_t=8
> >
> > Running Jobs:
> > Reading: Full Copy job D2D2T2 JobId=131743 Volume="disk2-1265"
> >     pool="Disk2-Pool" device="diskbackup2" (/bacula/diskbackup2)
> >     Files=3,337 Bytes=45,779,673 Bytes/sec=140,428
> >     FDSocket closed
> > ====
> >
> > Jobs waiting to reserve a drive:
> > ====
> >
> > Terminated Jobs:
> >  JobId  Level    Files      Bytes   Status   Finished        Name
> > ===================================================================
> > 131559  Full        184    70.61 M  OK       08-Mar-10 11:20 D2D2T
> > 131561  Full        215    32.03 M  OK       08-Mar-10 11:22 D2D2T
> > 131735  Full        233    1.541 G  OK       08-Mar-10 11:23 D2D2T2
> > 131563  Full         41    2.123 M  OK       08-Mar-10 11:25 D2D2T
> > 131737  Full        118    241.3 M  OK       08-Mar-10 11:28 D2D2T2
> > 131565  Full     21,836    239.7 M  OK       08-Mar-10 11:30 D2D2T
> > 131739  Full      2,069    596.7 M  OK       08-Mar-10 11:32 D2D2T2
> > 131567  Full        122    315.9 M  OK       08-Mar-10 11:34 D2D2T
> > 131741  Full        141    2.779 M  OK       08-Mar-10 11:35 D2D2T2
> > 131569  Full        187    20.59 M  OK       08-Mar-10 11:37 D2D2T
> > ====
> >
> > Device status:
> > Autochanger "TandbergT40" with devices:
> >    "Drive-1" (/dev/st0)
> >    "Drive-2" (/dev/st1)
> > Device "Drive-1" (/dev/st0) is mounted with:
> >     Volume:      B4MO03
> >     Pool:        Migrate-Pool
> >     Media type:  LTO-4
> >     Slot 19 is loaded in drive 0.
> >     Total Bytes=4,032,451,584 Blocks=62,506 Bytes/block=64,513
> >     Positioned at File=8 Block=0
> > Device "Drive-2" (/dev/st1) is mounted with:
> >     Volume:      B4MO01
> >     Pool:        Migrate-Pool
> >     Media type:  LTO-4
> >     Slot 22 is loaded in drive 1.
> >     Total Bytes=3,954,521,088 Blocks=61,298 Bytes/block=64,513
> >     Positioned at File=17 Block=0
> > Device "diskbackup" (/bacula/diskbackup) is not open.
> > Device "diskrestore" (/bacula/diskbackup) is not open.
> > Device "diskbackup2" (/bacula/diskbackup2) is mounted with:
> >     Volume:      disk2-1265
> >     Pool:        *unknown*
> >     Media type:  File
> >     Total Bytes Read=0 Blocks Read=0 Bytes/block=0
> >     Positioned at File=0 Block=1,575,192,713
> > ====
> >
> > Used Volume status:
> >
> >
> >
> > Restarting the bacula-sd is the only way to get it working again.
> >
> > I tried to run the bacula-sd manually under gdb, but gdb does not show
> > anything useful, only something like this:
> >
> > [New Thread 0xb61e0b90 (LWP 8026)]
> > [New Thread 0xb57dfb90 (LWP 8027)]
> > [New Thread 0xb4fdeb90 (LWP 8028)]
> > [New Thread 0xb47ddb90 (LWP 8029)]
> > [Thread 0xb57dfb90 (LWP 8027) exited]
> > [Thread 0xb61e0b90 (LWP 8026) exited]
> > [Thread 0xb47ddb90 (LWP 8029) exited]
> > [Thread 0xb4fdeb90 (LWP 8028) exited]
> > [New Thread 0xb4fdeb90 (LWP 8046)]
> > [Thread 0xb4fdeb90 (LWP 8046) exited]
> > [New Thread 0xb4fdeb90 (LWP 8047)]
> > [Thread 0xb4fdeb90 (LWP 8047) exited]
> > [New Thread 0xb4fdeb90 (LWP 8050)]
> > [Thread 0xb4fdeb90 (LWP 8050) exited]
> > [New Thread 0xb4fdeb90 (LWP 8051)]
> > [New Thread 0xb47ddb90 (LWP 8052)]
> > [Thread 0xb47ddb90 (LWP 8052) exited]
> > [New Thread 0xb47ddb90 (LWP 8053)]
> > [New Thread 0xb61e0b90 (LWP 8062)]
> > [New Thread 0xb57dfb90 (LWP 8063)]
> > [Thread 0xb57dfb90 (LWP 8063) exited]
> > [Thread 0xb61e0b90 (LWP 8062) exited]
> > [New Thread 0xb57dfb90 (LWP 8064)]
> > [Thread 0xb57dfb90 (LWP 8064) exited]
> > [New Thread 0xb57dfb90 (LWP 8076)]
> >
> >
> > What could the problem be, and how do I make a good trace using gdb? I
> > tried the way described in the manual:
> > http://bacula.org/5.0.x-manuals/en/problems/problems/What_Do_When_Bacula.html#SECTION00640000000000000000
> >
> > I don't understand the part about
> > thread apply all bt
> > Please help me out.
> 
> 
> Yes, well the trick here is: if the SD does not crash, gdb will continue
> running and you cannot type any commands into the shell window in which you
> ran gdb on the SD.  So when the SD deadlocks, the first thing to do is run
> a "status storage" from bconsole, then in the shell window where you are
> running gdb, press Ctrl-C (you may need to do it several times).  The
> debugger should then come back to the command prompt, and at that time you
> can enter:
>
> thread apply all bt
>
> to get a backtrace of all the threads that are running.
>
> You can then either enter "cont" and the debugger will give control back to
> the SD, or you can enter "quit".
> 
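
For reference, the same backtrace can probably also be collected without an
interactive session, by attaching gdb to the hung daemon from a second shell.
This is only a sketch: the helper name dump_sd_backtrace and the GDB override
variable are made up for illustration, and it assumes gdb is installed and that
you have permission to attach to the process. The -p, -batch and -ex options
are standard gdb options.

```shell
#!/bin/sh
# Sketch only: dump_sd_backtrace and the GDB variable are illustrative names,
# not part of Bacula or gdb.

# Attach to the process with the given PID, print a backtrace of every
# thread, then exit (gdb detaches from the process when it quits).
dump_sd_backtrace() {
    pid=$1
    if [ -z "$pid" ]; then
        echo "usage: dump_sd_backtrace PID" >&2
        return 1
    fi
    # GDB can be overridden to point at another gdb binary; defaults to "gdb".
    "${GDB:-gdb}" -p "$pid" -batch -ex "thread apply all bt"
}
```

Once the SD hangs, and assuming a single bacula-sd process, something like
dump_sd_backtrace "$(pidof bacula-sd)" > /tmp/bacula-sd.bt 2>&1
should leave the backtrace in a file that can be attached to a bug report.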

I will try that. Should I file a bug in Mantis with that backtrace?


> Kern
> 
> PS: I recommend using /dev/nst0 and /dev/nst1 instead of the ones you are 
> using.  Also, I see no reason to limit each drive to a single job -- with 
> LTO-4, you should be able to run many simultaneous jobs per drive (perhaps 
> 10).
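
As a rough illustration of that suggestion (not a tested configuration), the
Drive-1 resource from the config quoted further down might change as follows;
Drive-2 would change the same way with /dev/nst1. The value 10 is just the
"perhaps 10" mentioned above, and all other directives are assumed to stay as
they are.

```
Device {
  Name = Drive-1
  Drive Index = 0
  Media Type = LTO-4
  Archive Device = /dev/nst0          # non-rewinding device, was /dev/st0
  # ... other directives unchanged ...
  Maximum Concurrent Jobs = 10        # was 1
}
```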

Do you mean running more jobs at the same time on one drive?

Jan Jaap

> 
> >
> > Thanks and regards,
> >
> > Jan Jaap
> >
> > Config Bacula-sd autochanger part:
> >
> >
> > Autochanger {
> >   Name = TandbergT40
> >   Device = Drive-1
> >   Device = Drive-2
> >   Changer Command = "/etc/bacula/mtx-changer %c %o %S %a %d"
> >   Changer Device = /dev/sg3
> > }
> >
> > Device {
> >   Name = Drive-1                      #
> >   Drive Index = 0
> >   Media Type = LTO-4
> >   Archive Device = /dev/st0
> >   AutomaticMount = yes;               # when device opened, read it
> >   AlwaysOpen = yes;
> >   RemovableMedia = yes;
> >   RandomAccess = no;
> >   AutoChanger = yes
> >   Alert Command = "/bin/sh -c '/usr/sbin/smartctl -H -l error %c'"
> >   Spool Directory = /bacula/spool
> >   Maximum Concurrent Jobs = 1
> > }
> >
> > Device {
> >   Name = Drive-2                      #
> >   Drive Index = 1
> >   Media Type = LTO-4
> >   Archive Device = /dev/st1
> >   AutomaticMount = yes;               # when device opened, read it
> >   AlwaysOpen = yes;
> >   RemovableMedia = yes;
> >   RandomAccess = no;
> >   AutoChanger = yes
> >   Alert Command = "/bin/sh -c '/usr/sbin/smartctl -H -l error %c'"
> >   Spool Directory = /bacula/spool
> >   Maximum Concurrent Jobs = 1
> > }
> >
