On Thu, Nov 13, 2008 at 05:37:06PM +0100, Eric Bollengier wrote:
> Hello,
>
> On Thursday 13 November 2008 17:03:10, Pasi Kärkkäinen wrote:
> > Hello list!
> >
> > I'm using Bacula 2.5.19 and trying 'copy jobs' feature to copy jobs from
> > disk volumes/pools to tape.
> >
> > Sometimes bacula-sd seems to get stuck.. it hangs without doing anything.
> > Now it happened when tape got full and Bacula started to change the tape on
> > the drive (using autoloader):
> >
> > bacula-sd JobId 3082: Start Copying JobId 3082, Job=CopyPool4UncopiedToTape.2008-11-13_10.53.04.54
> > bacula-sd JobId 3082: Using Device "IBM-LTO3-Drive"
> > bacula-sd JobId 3082: Ready to read from volume "Pool4-Vol-0127" on device "FSDevice4" (/mnt/backup1/pool04).
> > bacula-sd JobId 3082: Forward spacing Volume "Pool4-Vol-0127" to file:block 0:218.
> > bacula-sd JobId 3082: End of Volume "756NNNL3" at 764:10067 on device "IBM-LTO3-Drive" (/dev/nst0). Write of 64512 bytes got -1.
> > bacula-sd JobId 3082: Re-read of last block succeeded.
> > bacula-sd JobId 3082: End of medium on Volume "756NNNL3" Bytes=725,237,130,240 Blocks=11,241,894 at 13-Nov-2008 11:51.
> > bacula-sd JobId 3082: 3307 Issuing autochanger "unload slot 3, drive 0" command.
> >
> > <nothing happens after this>
> >
> >
> > *sta
> > Status available for:
> > 1: Director
> > 2: Storage
> > 3: Client
> > 4: All
> > Select daemon type for status (1-4): 2
> >
> > ...
> >
> > Device status:
> > Autochanger "IBM-LTO3-AutoChanger" with devices:
> > "IBM-LTO3-Drive" (/dev/nst0)
> > Device "FSDevice0" (/mnt/backup1/pool00) is not open.
> > Device "FSDevice1" (/mnt/backup1/pool01) is not open.
> > Device "FSDevice2" (/mnt/backup1/pool02) is not open.
> > Device "FSDevice3" (/mnt/backup1/pool03) is not open.
> > Device "FSDevice4" (/mnt/backup1/pool04) is mounted with:
> > Volume: Pool4-Vol-0127
> > Pool: Pool4
> > Media type: File4
> > Total Bytes Read=1,649,507,328 Blocks Read=25,569 Bytes/block=64,512
> > Positioned at File=0 Block=1,649,507,534
> > Device "IBM-LTO3-Drive" (/dev/nst0) is not open.
> > Device is being initialized.
> > Drive 0 is not loaded.
> > ====
> >
> > Used Volume status:
> >
> > <hangs here and nothing happens>
> >
> >
> > I can exit bconsole by pressing CTRL+C multiple times.. if I restart
> > bconsole and run that again, it gets stuck again..
> >
> > I tried 'strace -p <pid>' to see what bacula-sd is doing:
> >
> > # strace -p 7339
> > Process 7339 attached - interrupt to quit
> > select(5, [4], NULL, NULL, NULL <unfinished ...>
> > Process 7339 detached
> >
> > So.. bacula-sd seems to be stuck on select() ..
> >
> > Running 'mtx' seems to work fine.. at the same time when bacula-sd is
> > stuck.
> >
> > # mtx -f /dev/sg3 status
> > Storage Changer /dev/sg3:1 Drives, 8 Slots ( 0 Import/Export )
> > Data Transfer Element 0:Empty
> > Storage Element 1:Full :VolumeTag=179MMML3
> > Storage Element 2:Full :VolumeTag=658NNNL3
> > Storage Element 3:Full :VolumeTag=756NNNL3
> > Storage Element 4:Full :VolumeTag=177MMML3
> > Storage Element 5:Full :VolumeTag=655NNNL3
> > Storage Element 6:Full :VolumeTag=656NNNL3
> > Storage Element 7:Full :VolumeTag=657NNNL3
> > Storage Element 8:Full :VolumeTag=CLNU38L1
> >
> >
> > Any ideas how to fix this? Other than restarting Bacula..
>
> Could you stop all daemons with a sigsegv to force a backtrace ?
> killall -SEGV bacula-sd bacula-dir
>
> (you will find 2 kind of file, *traceback and *bactrace in working directory)
>
> After, if you can put results to pastbin, it will give information about your
> problem.
>

OK, the problem happened again. Here are the tracebacks:

http://pasik.reaktio.net/bacula/debug/bacula-sd-traceback.txt
http://pasik.reaktio.net/bacula/debug/bacula-dir-traceback.txt

Here's what I did to make bacula-sd hang:

1. Rebooted the Bacula server and the tape library.
2. Right after the reboot, made sure mtx and the Bacula mtx-changer script work OK.
3. Started Bacula.
4. Ran a job that copies jobs from the disk pool to the tape pool.
5. Bacula starts a bunch of jobs, but nothing happens.. bacula-sd is stuck.

Any ideas how to debug this further?

At the moment I'm running Bacula 2.5.20 (svn rev 8083) on CentOS 5.2 x86 32bit.
I also tried applying 2.4.3-sd-deadlock.patch (from bug #1192), but it didn't
seem to help.

-- Pasi

_______________________________________________
Bacula-devel mailing list
https://lists.sourceforge.net/lists/listinfo/bacula-devel
