Hello,

On Tuesday 09 March 2010 12:47:33 JanJaap Scholing wrote:
> Hi Kern,
>
> I reported the problem (with traceback) in Mantis (0001527).

Yes, I saw.  Thanks.  As best I can tell it is definitely deadlocked.  

Eric is going to show you how to enable the lock manager code later today.
Then we would like you to reproduce the deadlock, and the lock manager should
produce a dump with additional information that will be useful.  In addition,
please post your bacula-dir.conf.

Best regards,

Kern

>
> Thanks.
>
> Regards,
>
> Jan Jaap
>
> > On Monday 08 March 2010 12:20:04 JanJaap Scholing wrote:
> > > Hi,
> > >
> > > My bacula-sd is deadlocking during copy jobs.
> > >
> > > Version 5.0.1
> > > Compile options: --with-readline=/usr/include/readline --disable-conio
> > >   --with-mysql --enable-smartalloc
> > > Linux version: i686-pc-linux-gnu debian 5.0.4
> > >
> > > Every day I copy every job made that night from 2 disk pools to a
> > > migrate pool using copy jobs (pool uncopied). The migrate pool uses an
> > > autochanger with two drives. The bacula-sd autochanger config is
> > > below.
> > >
> > > Both pools are load-balancing their jobs over the 2 drives (using the
> > > Maximum Concurrent Jobs = 1 feature in the bacula-sd) as expected.
> > >
> > > After a while the load on both the dir and the sd drops to 0.
> > >
> > > When I try to do status stor in the console I see the following
> > > (stopping at Used Volume status: and waiting forever):
> > >
> > >
> > > *status stor
> > > The defined Storage resources are:
> > >      1: Migrate
> > >      2: diskbackup
> > >      3: diskbackup2
> > > Select Storage resource (1-3): 1
> > > Connecting to Storage daemon Migrate at bacula-sd.solcon.nl:9103
> > >
> > > bacula-sd Version: 5.0.1 (24 February 2010) i686-pc-linux-gnu debian 5.0.4
> > > Daemon started 08-Mar-10 10:07, 20 Jobs run since started.
> > >  Heap: heap=2,367,488 smbytes=1,724,139 max_bytes=1,985,497 bufs=250 max_bufs=293
> > >  Sizes: boffset_t=8 size_t=4 int32_t=4 int64_t=8
> > >
> > > Running Jobs:
> > > Reading: Full Copy job D2D2T2 JobId=131743 Volume="disk2-1265"
> > >     pool="Disk2-Pool" device="diskbackup2" (/bacula/diskbackup2)
> > >     Files=3,337 Bytes=45,779,673 Bytes/sec=140,428
> > >     FDSocket closed
> > > ====
> > >
> > > Jobs waiting to reserve a drive:
> > > ====
> > >
> > > Terminated Jobs:
> > >  JobId  Level    Files      Bytes   Status   Finished        Name
> > > ===================================================================
> > > 131559  Full        184    70.61 M  OK       08-Mar-10 11:20 D2D2T
> > > 131561  Full        215    32.03 M  OK       08-Mar-10 11:22 D2D2T
> > > 131735  Full        233    1.541 G  OK       08-Mar-10 11:23 D2D2T2
> > > 131563  Full         41    2.123 M  OK       08-Mar-10 11:25 D2D2T
> > > 131737  Full        118    241.3 M  OK       08-Mar-10 11:28 D2D2T2
> > > 131565  Full     21,836    239.7 M  OK       08-Mar-10 11:30 D2D2T
> > > 131739  Full      2,069    596.7 M  OK       08-Mar-10 11:32 D2D2T2
> > > 131567  Full        122    315.9 M  OK       08-Mar-10 11:34 D2D2T
> > > 131741  Full        141    2.779 M  OK       08-Mar-10 11:35 D2D2T2
> > > 131569  Full        187    20.59 M  OK       08-Mar-10 11:37 D2D2T
> > > ====
> > >
> > > Device status:
> > > Autochanger "TandbergT40" with devices:
> > >    "Drive-1" (/dev/st0)
> > >    "Drive-2" (/dev/st1)
> > > Device "Drive-1" (/dev/st0) is mounted with:
> > >     Volume:      B4MO03
> > >     Pool:        Migrate-Pool
> > >     Media type:  LTO-4
> > >     Slot 19 is loaded in drive 0.
> > >     Total Bytes=4,032,451,584 Blocks=62,506 Bytes/block=64,513
> > >     Positioned at File=8 Block=0
> > > Device "Drive-2" (/dev/st1) is mounted with:
> > >     Volume:      B4MO01
> > >     Pool:        Migrate-Pool
> > >     Media type:  LTO-4
> > >     Slot 22 is loaded in drive 1.
> > >     Total Bytes=3,954,521,088 Blocks=61,298 Bytes/block=64,513
> > >     Positioned at File=17 Block=0
> > > Device "diskbackup" (/bacula/diskbackup) is not open.
> > > Device "diskrestore" (/bacula/diskbackup) is not open.
> > > Device "diskbackup2" (/bacula/diskbackup2) is mounted with:
> > >     Volume:      disk2-1265
> > >     Pool:        *unknown*
> > >     Media type:  File
> > >     Total Bytes Read=0 Blocks Read=0 Bytes/block=0
> > >     Positioned at File=0 Block=1,575,192,713
> > > ====
> > >
> > > Used Volume status:
> > >
> > >
> > >
> > > Restarting the bacula-sd is the only way to get it working again.
> > >
> > > I tried to run the bacula-sd manually under gdb, but gdb does not show
> > > anything useful, only something like this:
> > >
> > > [New Thread 0xb61e0b90 (LWP 8026)]
> > > [New Thread 0xb57dfb90 (LWP 8027)]
> > > [New Thread 0xb4fdeb90 (LWP 8028)]
> > > [New Thread 0xb47ddb90 (LWP 8029)]
> > > [Thread 0xb57dfb90 (LWP 8027) exited]
> > > [Thread 0xb61e0b90 (LWP 8026) exited]
> > > [Thread 0xb47ddb90 (LWP 8029) exited]
> > > [Thread 0xb4fdeb90 (LWP 8028) exited]
> > > [New Thread 0xb4fdeb90 (LWP 8046)]
> > > [Thread 0xb4fdeb90 (LWP 8046) exited]
> > > [New Thread 0xb4fdeb90 (LWP 8047)]
> > > [Thread 0xb4fdeb90 (LWP 8047) exited]
> > > [New Thread 0xb4fdeb90 (LWP 8050)]
> > > [Thread 0xb4fdeb90 (LWP 8050) exited]
> > > [New Thread 0xb4fdeb90 (LWP 8051)]
> > > [New Thread 0xb47ddb90 (LWP 8052)]
> > > [Thread 0xb47ddb90 (LWP 8052) exited]
> > > [New Thread 0xb47ddb90 (LWP 8053)]
> > > [New Thread 0xb61e0b90 (LWP 8062)]
> > > [New Thread 0xb57dfb90 (LWP 8063)]
> > > [Thread 0xb57dfb90 (LWP 8063) exited]
> > > [Thread 0xb61e0b90 (LWP 8062) exited]
> > > [New Thread 0xb57dfb90 (LWP 8064)]
> > > [Thread 0xb57dfb90 (LWP 8064) exited]
> > > [New Thread 0xb57dfb90 (LWP 8076)]
> > >
> > >
> > > What can be the problem, and how do I make a good trace using gdb? I
> > > tried the way described in the manual:
> > > http://bacula.org/5.0.x-manuals/en/problems/problems/What_Do_When_Bacula.html#SECTION00640000000000000000
> > >
> > > I don't understand the part
> > > thread apply all bt
> > > Please help me out.
> >
> > Yes, well the trick here is: if the SD does not crash, gdb will continue
> > running and you cannot type in any commands to the shell window in which
> > you ran gdb on the SD.  So when the SD deadlocks, the first thing to do
> > is run a "status storage" from bconsole, then in the shell window where
> > you are running gdb, press Ctrl-C (you may need to do it several times).
> > The debugger should then come back to the command prompt, and at that
> > time you can enter:
> >
> > thread apply all bt
> >
> > to get a backtrace of all the threads that are running.
> >
> > You can then either enter "cont" and the debugger will give control back
> > to the SD, or you can enter "quit".
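
A non-interactive variant of the steps above, as a rough sketch only. Instead of starting the SD under gdb and pressing Ctrl-C, it attaches gdb to the already hung daemon, runs the same "thread apply all bt" command, and writes the result to a file. This is not the procedure described in the message, just an alternative. The function name dump_sd_threads and the default output file name are made up for illustration. It assumes gdb is installed and that you are allowed to attach to the process (normally as root).

```shell
#!/bin/sh
# Sketch: dump backtraces of all threads of a running (possibly deadlocked)
# storage daemon. dump_sd_threads is a hypothetical helper, not part of Bacula.
dump_sd_threads() {
    pid="$1"
    out="${2:-bacula-sd.bt.txt}"   # hypothetical default output file name

    # Refuse anything that is not a plain numeric process ID.
    case "$pid" in
        ''|*[!0-9]*)
            echo "usage: dump_sd_threads PID [OUTFILE]" >&2
            return 2
            ;;
    esac

    # -p attaches to the running process; -batch makes gdb exit once the
    # -ex command has run, which also detaches from the process.
    gdb -p "$pid" -batch -ex "thread apply all bt" > "$out" 2>&1
}

# Example (not run here): dump_sd_threads "$(pidof bacula-sd)"
```

As in the interactive case, it is probably worth running "status storage" from bconsole first so that the hanging status thread shows up in the backtrace.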
>
> I will try that. Should I file a bug in Mantis with that backtrace?
>
> > Kern
> >
> > PS: I recommend using /dev/nst0 and /dev/nst1 instead of the ones you are
> > using.  Also, I see no reason to limit each drive to a single job -- with
> > LTO-4, you should be able to run many simultaneous jobs per drive
> > (perhaps 10).
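
Applied to the Drive-1 resource quoted at the end of this message, the postscript might look roughly like the fragment below. It is a sketch, not a tested configuration: the value 10 is simply the "perhaps 10" from the suggestion, the Alert Command line is left out for brevity, and Drive-2 would change the same way with /dev/nst1. Concurrency limits set elsewhere (for example in the Director's configuration) may also need raising before several jobs actually share a drive.

```
Device {
  Name = Drive-1
  Drive Index = 0
  Media Type = LTO-4
  Archive Device = /dev/nst0          # non-rewinding device, as recommended
  AutomaticMount = yes
  AlwaysOpen = yes
  RemovableMedia = yes
  RandomAccess = no
  AutoChanger = yes
  Spool Directory = /bacula/spool
  Maximum Concurrent Jobs = 10        # assumption: "perhaps 10", tune as needed
}
```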
>
> You mean run more jobs at the same time to one drive?
>
> Jan Jaap
>
> > > Thanks and regards,
> > >
> > > Jan Jaap
> > >
> > > Config Bacula-sd autochanger part:
> > >
> > >
> > > Autochanger {
> > >   Name = TandbergT40
> > >   Device = Drive-1
> > >   Device = Drive-2
> > >   Changer Command = "/etc/bacula/mtx-changer %c %o %S %a %d"
> > >   Changer Device = /dev/sg3
> > > }
> > >
> > > Device {
> > >   Name = Drive-1                      #
> > >   Drive Index = 0
> > >   Media Type = LTO-4
> > >   Archive Device = /dev/st0
> > >   AutomaticMount = yes;               # when device opened, read it
> > >   AlwaysOpen = yes;
> > >   RemovableMedia = yes;
> > >   RandomAccess = no;
> > >   AutoChanger = yes
> > >   Alert Command = "/bin/sh -c '/usr/sbin/smartctl -H -l error %c'"
> > >   Spool Directory = /bacula/spool
> > >   Maximum Concurrent Jobs = 1
> > > }
> > >
> > > Device {
> > >   Name = Drive-2                      #
> > >   Drive Index = 1
> > >   Media Type = LTO-4
> > >   Archive Device = /dev/st1
> > >   AutomaticMount = yes;               # when device opened, read it
> > >   AlwaysOpen = yes;
> > >   RemovableMedia = yes;
> > >   RandomAccess = no;
> > >   AutoChanger = yes
> > >   Alert Command = "/bin/sh -c '/usr/sbin/smartctl -H -l error %c'"
> > >   Spool Directory = /bacula/spool
> > >   Maximum Concurrent Jobs = 1
> > > }
>



_______________________________________________
Bacula-devel mailing list
[email protected]
https://lists.sourceforge.net/lists/listinfo/bacula-devel
