Hello Pasi,

On Thursday 04 December 2008 13:13:56, Pasi Kärkkäinen wrote:
> On Thu, Nov 13, 2008 at 05:37:06PM +0100, Eric Bollengier wrote:
> > Hello,
> >
> > On Thursday 13 November 2008 17:03:10, Pasi Kärkkäinen wrote:
> > > Hello list!
> > >
> > > I'm using Bacula 2.5.19 and trying 'copy jobs' feature to copy jobs
> > > from disk volumes/pools to tape.
> > >
> > > Sometimes bacula-sd seems to get stuck.. it hangs without doing
> > > anything. Now it happened when tape got full and Bacula started to
> > > change the tape on the drive (using autoloader):
> > >
> > > bacula-sd JobId 3082: Start Copying JobId 3082, Job=CopyPool4UncopiedToTape.2008-11-13_10.53.04.54
> > > bacula-sd JobId 3082: Using Device "IBM-LTO3-Drive"
> > > bacula-sd JobId 3082: Ready to read from volume "Pool4-Vol-0127" on device "FSDevice4" (/mnt/backup1/pool04).
> > > bacula-sd JobId 3082: Forward spacing Volume "Pool4-Vol-0127" to file:block 0:218.
> > > bacula-sd JobId 3082: End of Volume "756NNNL3" at 764:10067 on device "IBM-LTO3-Drive" (/dev/nst0). Write of 64512 bytes got -1.
> > > bacula-sd JobId 3082: Re-read of last block succeeded.
> > > bacula-sd JobId 3082: End of medium on Volume "756NNNL3" Bytes=725,237,130,240 Blocks=11,241,894 at 13-Nov-2008 11:51.
> > > bacula-sd JobId 3082: 3307 Issuing autochanger "unload slot 3, drive 0" command.
> > >
> > > <nothing happens after this>
> > >
> > >
> > > *sta
> > > Status available for:
> > > 1: Director
> > > 2: Storage
> > > 3: Client
> > > 4: All
> > > Select daemon type for status (1-4): 2
> > >
> > > ...
> > >
> > > Device status:
> > > Autochanger "IBM-LTO3-AutoChanger" with devices:
> > > "IBM-LTO3-Drive" (/dev/nst0)
> > > Device "FSDevice0" (/mnt/backup1/pool00) is not open.
> > > Device "FSDevice1" (/mnt/backup1/pool01) is not open.
> > > Device "FSDevice2" (/mnt/backup1/pool02) is not open.
> > > Device "FSDevice3" (/mnt/backup1/pool03) is not open.
> > > Device "FSDevice4" (/mnt/backup1/pool04) is mounted with:
> > > Volume: Pool4-Vol-0127
> > > Pool: Pool4
> > > Media type: File4
> > > Total Bytes Read=1,649,507,328 Blocks Read=25,569
> > > Bytes/block=64,512 Positioned at File=0 Block=1,649,507,534
> > > Device "IBM-LTO3-Drive" (/dev/nst0) is not open.
> > > Device is being initialized.
> > > Drive 0 is not loaded.
> > > ====
> > >
> > > Used Volume status:
> > >
> > > <hangs here and nothing happens>
> > >
> > >
> > > I can exit bconsole by pressing CTRL+C multiple times.. if I restart
> > > bconsole and run that again, it gets stuck again..
> > >
> > > I tried 'strace -p <pid>' to see what bacula-sd is doing:
> > >
> > > # strace -p 7339
> > > Process 7339 attached - interrupt to quit
> > > select(5, [4], NULL, NULL, NULL <unfinished ...>
> > > Process 7339 detached
> > >
> > > So.. bacula-sd seems to be stuck on select() ..
> > >
> > > Running 'mtx' seems to work fine.. at the same time when bacula-sd is
> > > stuck.
> > >
> > > # mtx -f /dev/sg3 status
> > > Storage Changer /dev/sg3:1 Drives, 8 Slots ( 0 Import/Export )
> > > Data Transfer Element 0:Empty
> > > Storage Element 1:Full :VolumeTag=179MMML3
> > > Storage Element 2:Full :VolumeTag=658NNNL3
> > > Storage Element 3:Full :VolumeTag=756NNNL3
> > > Storage Element 4:Full :VolumeTag=177MMML3
> > > Storage Element 5:Full :VolumeTag=655NNNL3
> > > Storage Element 6:Full :VolumeTag=656NNNL3
> > > Storage Element 7:Full :VolumeTag=657NNNL3
> > > Storage Element 8:Full :VolumeTag=CLNU38L1
> > >
> > >
> > > Any ideas how to fix this? Other than restarting Bacula..
> >
> > Could you stop all daemons with a sigsegv to force a backtrace ?
> > killall -SEGV bacula-sd bacula-dir
> >
> > (you will find 2 kind of file, *traceback and *bactrace in working
> > directory)
> >
> > After, if you can put results to pastebin, it will give information about
> > your problem.
>
> Ok, problems again.. here are the tracebacks:
>
> http://pasik.reaktio.net/bacula/debug/bacula-sd-traceback.txt
> http://pasik.reaktio.net/bacula/debug/bacula-dir-traceback.txt
>
> Here's what I did to make bacula-sd hang:
>
> 1. Rebooted the bacula server and the tape library
> 2. Fresh after the reboot made sure mtx and bacula mtx-changer work OK.
> 3. Started bacula
> 4. Ran a job that copies jobs from disk pool to tape pool
> 5. Bacula starts a bunch of jobs, but nothing happens.. bacula-sd is stuck.
>
> Any ideas how to debug this further?

Thanks for this traceback, it's very useful. I have found a problem in the
code. In bool DCR::can_i_write_volume() we have:

   lock_read_volumes();
   vol = find_read_volume(VolumeName);

The first step of find_read_volume() is to call lock_read_volumes() again,
and this lock is not recursive, so the second call can block forever on a
lock the thread already holds.

I will take a look now.

Bye

> Atm I'm running Bacula 2.5.20 (svn rev 8083) on CentOS 5.2 x86 32bit.
>
> I also tried applying 2.4.3-sd-deadlock.patch (from bug #1192) but it
> didn't seem to help.
>
> -- Pasi

_______________________________________________
Bacula-devel mailing list
https://lists.sourceforge.net/lists/listinfo/bacula-devel
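
The double locking described in the reply can be reproduced in isolation. The sketch below is not Bacula's code: VolumeLock, find_volume() and can_write_volume() are hypothetical stand-ins, and it assumes a POSIX threads mutex. It uses an error-checking mutex so that the second lock attempt returns EDEADLK instead of hanging the way a default mutex typically would, and shows that a recursive mutex accepts the same call sequence.

```cpp
#include <cassert>
#include <cerrno>
#include <pthread.h>

// Hypothetical wrapper around a pthread mutex of a chosen type.
// It only illustrates the locking pattern; it is not taken from Bacula.
struct VolumeLock {
    pthread_mutex_t mutex;

    explicit VolumeLock(int type) {
        pthread_mutexattr_t attr;
        pthread_mutexattr_init(&attr);
        pthread_mutexattr_settype(&attr, type);
        pthread_mutex_init(&mutex, &attr);
        pthread_mutexattr_destroy(&attr);
    }
    ~VolumeLock() { pthread_mutex_destroy(&mutex); }
};

// Inner helper that takes the lock itself, in the way the reply
// describes find_read_volume() doing as its first step.
int find_volume(VolumeLock &lock) {
    int rc = pthread_mutex_lock(&lock.mutex);
    if (rc != 0) {
        return rc;  // error-checking mutex: EDEADLK if we already hold it
    }
    // ... the volume lookup would happen here ...
    pthread_mutex_unlock(&lock.mutex);
    return 0;
}

// Caller that already holds the lock when it calls the helper,
// mirroring the sequence quoted from can_i_write_volume().
int can_write_volume(VolumeLock &lock) {
    pthread_mutex_lock(&lock.mutex);
    int rc = find_volume(lock);  // second lock attempt by the same thread
    pthread_mutex_unlock(&lock.mutex);
    return rc;
}
```

Calling can_write_volume() on a VolumeLock built with PTHREAD_MUTEX_ERRORCHECK reports the self-deadlock as an error code, while one built with PTHREAD_MUTEX_RECURSIVE returns 0. Either making the lock recursive or removing one of the two lock calls would avoid the hang; which of these suits the real code is for the maintainers to decide.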
