Hi all, I'm almost completely new to tape. We've been doing disk-based backups for years, but we now have a project where we want to offsite hundreds of TB permanently, and have a Dell TL4000 (a rebranded IBM 3573-TL from the looks of it) with 2 ULT3580 LTO-6 drives. We're running bacula 5.2. The server is a Dell 1950 running Centos 5 (sorry for the old OS).
The btape tests run on both units without a problem, including the autochanger tests, and manually executing load/unload/list commands with mtx-changer seem to run fine. The one exception to this is that the mtx-changer load command seems to take about 10 minutes to complete, which seems unreasonably long. These are brand new tapes and I haven't written anything to them other then whatever btape does with testing. I put a 5 minute sleep on the load for mtx-changer, but other then that haven't customized the script, as I'm not sure what I'd customize. The "update slots" command from the director works OK, but when I go to do a "label barcode", the resulting "load slot" gets killed by Bacula: 3992 Bad autochanger "load slot 20, drive 1": ERR=Child died from signal 15: Termination. Results=Program killed by Bacula (timeout) I've seen that in some of these posts to the list, this ends up being permissions problems against the devices, but that doesn't seem to be the case as far as I can see: bacula-sd is running as the bacula user/group. The bacula user is in the "disk" group, and the *st* devices are in the disk group with "rw" permissions: crw-rw---- 1 root disk 9, 128 Jun 4 12:02 /dev/nst0 crw-rw---- 1 root disk 9, 224 Jun 4 12:02 /dev/nst0a crw-rw---- 1 root disk 9, 160 Jun 4 12:02 /dev/nst0l crw-rw---- 1 root disk 9, 192 Jun 4 12:02 /dev/nst0m crw-rw---- 1 root disk 9, 129 Jun 10 17:06 /dev/nst1 crw-rw---- 1 root disk 9, 225 Jun 10 17:06 /dev/nst1a crw-rw---- 1 root disk 9, 161 Jun 10 17:06 /dev/nst1l crw-rw---- 1 root disk 9, 193 Jun 10 17:06 /dev/nst1m crw-rw---- 1 root disk 9, 0 Jun 4 12:02 /dev/st0 crw-rw---- 1 root disk 9, 96 Jun 4 12:02 /dev/st0a crw-rw---- 1 root disk 9, 32 Jun 4 12:02 /dev/st0l crw-rw---- 1 root disk 9, 64 Jun 4 12:02 /dev/st0m crw-rw---- 1 root disk 9, 1 Jun 10 17:06 /dev/st1 crw-rw---- 1 root disk 9, 97 Jun 10 17:06 /dev/st1a crw-rw---- 1 root disk 9, 33 Jun 10 17:06 /dev/st1l crw-rw---- 1 root disk 9, 65 Jun 10 17:06 /dev/st1m Here's a block of debug from the SD during a label attempt for one of the slots: odin-sd: autochanger.c:434-0 Wiffle through devices looking for slot odin-sd: autochanger.c:313-0 Locking changer LogRepoAutochanger odin-sd: autochanger.c:740-0 omsg=/usr/lib64/bacula/mtx-changer /dev/changer loaded 14 /dev/nst0 0 odin-sd: autochanger.c:272-0 Run program=/usr/lib64/bacula/mtx-changer /dev/changer loaded 14 /dev/nst0 0 odin-sd: watchdog.c:206-0 Registered watchdog 636b888, interval 300 odin-sd: bpipe.c:220-0 Wait for 28962 opt=1 odin-sd: bpipe.c:228-0 Got break wpid=28962 status=0 ERR=none odin-sd: bpipe.c:249-0 child status=0 odin-sd: watchdog.c:226-0 Unregistered watchdog 636b888 odin-sd: bpipe.c:264-0 returning stat=0,0 odin-sd: autochanger.c:274-0 run_prog: /usr/lib64/bacula/mtx-changer /dev/changer loaded 14 /dev/nst0 0 stat=0 result=0 odin-sd: autochanger.c:327-0 Unlocking changer LogRepoAutochanger odin-sd: autochanger.c:313-0 Locking changer LogRepoAutochanger odin-sd: autochanger.c:740-0 omsg=/usr/lib64/bacula/mtx-changer /dev/changer loaded 14 /dev/nst1 1 odin-sd: autochanger.c:272-0 Run program=/usr/lib64/bacula/mtx-changer /dev/changer loaded 14 /dev/nst1 1 odin-sd: watchdog.c:206-0 Registered watchdog 636b888, interval 300 odin-sd: bpipe.c:220-0 Wait for 28976 opt=1 odin-sd: bpipe.c:228-0 Got break wpid=28976 status=0 ERR=none odin-sd: bpipe.c:249-0 child status=0 odin-sd: watchdog.c:226-0 Unregistered watchdog 636b888 odin-sd: bpipe.c:264-0 returning stat=0,0 odin-sd: autochanger.c:274-0 run_prog: /usr/lib64/bacula/mtx-changer /dev/changer loaded 14 /dev/nst1 1 stat=0 result=0 odin-sd: autochanger.c:327-0 Unlocking changer LogRepoAutochanger odin-sd: autochanger.c:453-0 Slot=14 not found in another device odin-sd: autochanger.c:313-0 Locking changer LogRepoAutochanger odin-sd: autochanger.c:183-0 Doing changer load slot 14 "LRADrive-2" (/dev/nst1) odin-sd: autochanger.c:740-0 omsg=/usr/lib64/bacula/mtx-changer /dev/changer load 14 /dev/nst1 1 odin-sd: dev.c:1746-0 close_dev "LRADrive-2" (/dev/nst1) odin-sd: dev.c:1751-0 device "LRADrive-2" (/dev/nst1) already closed vol= odin-sd: autochanger.c:190-0 Run program=/usr/lib64/bacula/mtx-changer /dev/changer load 14 /dev/nst1 1 odin-sd: watchdog.c:206-0 Registered watchdog 636b888, interval 300 odin-sd: bpipe.c:443-0 Run program fgets killed=1 odin-sd: bpipe.c:220-0 Wait for 28990 opt=1 odin-sd: bpipe.c:228-0 Got break wpid=28990 status=15 ERR=none odin-sd: bpipe.c:256-0 Child died from signal 15 odin-sd: watchdog.c:235-0 Unregistered inactive watchdog 636b888 odin-sd: bpipe.c:264-0 returning stat=15,134217743 odin-sd: autochanger.c:205-0 load slot 14, drive 1, bad stats=Child died from signal 15: Termination. odin-sd: autochanger.c:212-0 load slot 14 status=134217743 odin-sd: autochanger.c:327-0 Unlocking changer LogRepoAutochanger odin-sd: autochanger.c:218-0 After changer, status=134217743 odin-sd: dev.c:1735-0 Clear volhdr vol= odin-sd: vol_mgr.c:544-0 vol_unused: no vol on "LRADrive-2" (/dev/nst1) odin-sd: lock.c:302-0 return lock. old=BST_WRITING_LABEL from dircmd.c:554 odin-sd: lock.c:307-0 return lock. new=BST_NOT_BLOCKED odin-sd: dev.c:1746-0 close_dev "LRADrive-2" (/dev/nst1) odin-sd: dev.c:1751-0 device "LRADrive-2" (/dev/nst1) already closed vol= odin-sd: acquire.c:731-0 Enter detach_dcr_from_dev odin-sd: dircmd.c:220-0 <dird: label LogRepoAutochanger VolumeName=000030L6 PoolName=LogrepoArchive MediaType=LTO-6 Slot=15 drive=1 odin-sd: dircmd.c:234-0 Do command: label odin-sd: dircmd.c:627-0 Try changer device LRADrive-1 odin-sd: dircmd.c:648-0 Device LogRepoAutochanger drive wrong: want=1 got=0 skipping odin-sd: dircmd.c:627-0 Try changer device LRADrive-2 odin-sd: dircmd.c:643-0 Found changer device LRADrive-2 odin-sd: dircmd.c:656-0 Found device LRADrive-2 odin-sd: block.c:144-0 Returning new block=636b800 odin-sd: acquire.c:713-0 JobId=0 enter attach_dcr_to_dev odin-sd: dircmd.c:421-0 Can label. Device is not open odin-sd: lock.c:285-0 steal lock. old=BST_NOT_BLOCKED from dircmd.c:470 odin-sd: lock.c:290-0 steal lock. new=BST_WRITING_LABEL odin-sd: dircmd.c:471-0 Stole device "LRADrive-2" (/dev/nst1) lock, writing label. The config I've got for these is: Device { Name = LRADrive-1 Alert Command = "sh -c 'smartctl -H -l error %c'" AlwaysOpen = yes ArchiveDevice = /dev/nst0 AutoChanger = yes AutomaticMount = yes DeviceType = Tape DriveIndex = 0 LabelMedia = no MediaType = LTO-6 RandomAccess = no RemovableMedia = yes } Device { Name = LRADrive-2 Alert Command = "sh -c 'smartctl -H -l error %c'" AlwaysOpen = yes ArchiveDevice = /dev/nst1 AutoChanger = yes AutomaticMount = yes DeviceType = Tape DriveIndex = 1 LabelMedia = no MediaType = LTO-6 RandomAccess = no RemovableMedia = yes } Autochanger { Name = LogRepoAutochanger ChangerCommand = "/usr/lib64/bacula/mtx-changer %c %o %S %a %d" ChangerDevice = /dev/changer Device = LRADrive-1 Device = LRADrive-2 } I know there are some things that could be optimized here for performance, and I'm certainly interested in them, but right now I can't even label my tapes :) I suspect it's the long load delay, and I wasn't sure if maybe the drive is searching for some mark or something. On that note, I tried to do a "rewind" and "weof" using the /dev/st0 device (wasn't sure if nst0 would complain about issuing a rewind), but I would get "Input/Output error" messages from mt on both the rewind and weof commands. Any advice I could get would be helpful. Thanks! Andrew ------------------------------------------------------------------------------ _______________________________________________ Bacula-users mailing list Bacula-users@lists.sourceforge.net https://lists.sourceforge.net/lists/listinfo/bacula-users