On 6/17/2015 10:35 PM, Andrew Noonan wrote: > @Marcin - dmesg is clean > > @Ana - I ~think~ /dev/changer is created by udev, I'm not 100%. It's > a symlink to sg19 in this case:
Most likely, udev got it right. Try mtx -f /dev/changer noattach status. If that works, then try the load with the 'noattach' prefix. This drive could be reporting it supports the _ATTACHED API, but does not exactly comply/match with the expected API commands. The noattach prefix will force mtx to use the regular media changer API. > > lrwxrwxrwx 1 root root 4 Jun 10 17:06 /dev/changer -> sg19 > crw-rw---- 1 root disk 21, 19 Jun 10 17:06 /dev/sg19 > > Here are the tape drive parts of "lsscsi -l". I removed a bunch of > HDDs attached to the system: > > [2:0:8:0] tape IBM ULT3580-HH6 D8E5 /dev/st0 > state=running queue_depth=254 scsi_level=7 type=1 device_blocked=0 > timeout=900 > [2:0:10:0] tape IBM ULT3580-HH6 D8E5 /dev/st1 > state=running queue_depth=254 scsi_level=7 type=1 device_blocked=0 > timeout=900 > [2:0:10:1] mediumx IBM 3573-TL C.30 - > state=running queue_depth=254 scsi_level=6 type=8 device_blocked=0 > timeout=0 > > At some point we had the Tape unit powered down and then powered it > back up, but the system remained on, so the dmesg mentions the drives > multiple times: > > [anoonan@odin ~]$ dmesg | grep Attached > sd 0:1:0:0: Attached scsi disk sda > scsi 0:0:0:0: Attached scsi generic sg0 type 0 > scsi 0:0:1:0: Attached scsi generic sg1 type 0 > sd 0:1:0:0: Attached scsi generic sg2 type 0 > sd 1:0:0:0: Attached scsi disk sdb > sd 1:0:0:0: Attached scsi generic sg3 type 0 > sd 1:0:0:1: Attached scsi disk sdc > sd 1:0:0:1: Attached scsi generic sg4 type 0 > sd 1:0:0:2: Attached scsi disk sdd > sd 1:0:0:2: Attached scsi generic sg5 type 0 > sd 1:0:0:3: Attached scsi disk sde > sd 1:0:0:3: Attached scsi generic sg6 type 0 > sd 1:0:0:31: Attached scsi disk sdf > sd 1:0:0:31: Attached scsi generic sg7 type 0 > sd 1:0:1:0: Attached scsi disk sdg > sd 1:0:1:0: Attached scsi generic sg8 type 0 > sd 1:0:1:1: Attached scsi disk sdh > sd 1:0:1:1: Attached scsi generic sg9 type 0 > sd 1:0:1:2: Attached scsi disk sdi > sd 1:0:1:2: Attached scsi generic sg10 type 0 > sd 1:0:1:3: Attached scsi disk sdj > sd 1:0:1:3: Attached scsi generic sg11 type 0 > sd 1:0:1:31: Attached scsi disk sdk > sd 1:0:1:31: Attached scsi generic sg12 type 0 > scsi 2:0:0:0: Attached scsi generic sg13 type 1 > scsi 2:0:1:0: Attached scsi generic sg14 type 1 > st 2:0:0:0: Attached scsi tape st0 > st 2:0:1:0: Attached scsi tape st1 > st 2:0:2:0: Attached scsi tape st1 > st 2:0:2:0: Attached scsi generic sg14 type 1 > st 2:0:3:0: Attached scsi tape st1 > st 2:0:3:0: Attached scsi generic sg14 type 1 > st 2:0:4:0: Attached scsi tape st0 > st 2:0:4:0: Attached scsi generic sg13 type 1 > st 2:0:5:0: Attached scsi tape st1 > st 2:0:5:0: Attached scsi generic sg14 type 1 > scsi 2:0:5:1: Attached scsi generic sg15 type 8 > st 2:0:6:0: Attached scsi tape st0 > st 2:0:6:0: Attached scsi generic sg13 type 1 > scsi 2:0:6:1: Attached scsi generic sg14 type 8 > st 2:0:7:0: Attached scsi tape st1 > st 2:0:7:0: Attached scsi generic sg15 type 1 > sd 1:0:0:4: Attached scsi disk sdl > sd 1:0:0:4: Attached scsi generic sg13 type 0 > sd 1:0:0:5: Attached scsi disk sdm > sd 1:0:0:5: Attached scsi generic sg14 type 0 > sd 1:0:1:4: Attached scsi disk sdn > sd 1:0:1:4: Attached scsi generic sg15 type 0 > sd 1:0:1:5: Attached scsi disk sdo > sd 1:0:1:5: Attached scsi generic sg16 type 0 > st 2:0:8:0: Attached scsi tape st0 > st 2:0:8:0: Attached scsi generic sg17 type 1 > st 2:0:9:0: Attached scsi tape st1 > st 2:0:9:0: Attached scsi generic sg18 type 1 > scsi 2:0:9:1: Attached scsi generic sg19 type 8 > st 2:0:10:0: Attached scsi tape st1 > st 2:0:10:0: Attached scsi generic sg18 type 1 > scsi 2:0:10:1: Attached scsi generic sg19 type 8 > > I did change out the device names for the drives to be /dev/tape/by-id > names instead to make sure the naming stays stable after reboots, but > I haven't tried changing /dev/changer to anything else. The btape > tests were successful, and I haven't had any problems with mtx or even > mt commands, though as mentioned previously, I've gotten Input/Output > errors from mt when doing rewind/weof commands to the drives. That > being said, I'm suspicious of the amount of time that mt reports > DR_OPEN on loads. I can issue mtx-changer commands OK, as well, > though I'm not sure if the "load" command is actually returning > correctly, or is just timing out internally. unloads/list/listall > from mtx-changer have always returned successfully. > > This is a Centos 5 machine, so not very new at all. mtx package is: > mtx-1.2.18-9, mt-st is mt-st-0.9b-4.el5 > > It may be a little challenge to upgrade this system (probably to > Centos 6), but not impossible if it needs to happen. > > Thanks, > Andrew > > On Wed, Jun 17, 2015 at 8:32 PM, Ana Emília M. Arruda > <emiliaarr...@gmail.com> wrote: >> Hello Andrew, >> >> Is /dev/changer created by udev rules? Have you tried /dev/sgX instead? Can >> you send us the output of the "lsscsi -l" command and "dmesg | grep >> Attached"? Have you checked your drives/autochanger using just mtx/mt >> commands to see if they are working? Which is your mtx version? >> >> Best regards, >> Ana >> >> On Wed, Jun 17, 2015 at 7:29 PM, Marcin Haba <ganius...@gmail.com> wrote: >>> Hello, >>> >>> Do you have any errors in dmesg (hardware errors, bus reset, SCSI >>> errors ... etc.) ? >>> >>> Best regards, >>> Marcin Haba (gani) >>> >>> 2015-06-17 21:56 GMT+02:00 Andrew Noonan <anoo...@gmail.com>: >>>> Hi all, >>>> >>>> It's taking a lot longer because of the higher timeouts, but the >>>> label is still failing with a termination. If I understand it >>>> correctly, the mtx-changer script is polling with 'mt' looking for the >>>> $ready state, defined in the config file as ONLINE (for Linux). I'm >>>> not seeing drive 0 go into that state... I just see: >>>> >>>> SCSI 2 tape drive: >>>> File number=-1, block number=-1, partition=0. >>>> Tape block size 0 bytes. Density code 0x0 (default). >>>> Soft error count since last status=0 >>>> General status bits on (50000): >>>> DR_OPEN IM_REP_EN >>>> >>>> the other device looks like: >>>> >>>> SCSI 2 tape drive: >>>> File number=0, block number=0, partition=0. >>>> Tape block size 0 bytes. Density code 0x5a (no translation). >>>> Soft error count since last status=0 >>>> General status bits on (41010000): >>>> BOT ONLINE IM_REP_EN >>>> >>>> So I see that it's ~possible~ to see the ONLINE state, but it doesn't >>>> seem like it ever gets to that state during load. >>>> >>>> Any thoughts? >>>> >>>> Thanks, >>>> Andrew >>>> >>>> On Wed, Jun 17, 2015 at 11:44 AM, Andrew Noonan <anoo...@gmail.com> >>>> wrote: >>>>> Hi Ana, >>>>> >>>>> Thanks for the reply. I'm adding those into the drives. BTW, >>>>> 900 is the value. Having no real experience with these, is it >>>>> abnormal for a load to take the 10+ minutes, or is that reasonable? >>>>> My next step is to add those settings in, restart the SD, and attempt >>>>> to do a "label barcode" again. >>>>> >>>>> Thanks, >>>>> Andrew >>>>> >>>>> On Tue, Jun 16, 2015 at 9:10 PM, Ana Emília M. Arruda >>>>> <emiliaarr...@gmail.com> wrote: >>>>>> Hello Andrew, >>>>>> >>>>>> You can find in the output of a "lsscsi -l" command the timeout for >>>>>> your >>>>>> drives. Then you can configure 3 timeout directives for each one of >>>>>> your two >>>>>> drives (LRADrive-1 e LRADrive-2): >>>>>> >>>>>> Maximum Changer Wait = X >>>>>> Maximum Rewind Wait = X >>>>>> Maximum Open Wait = X >>>>>> >>>>>> where X is the timeout value for your dirves. >>>>>> >>>>>> You can also customize your mtx-changer script for this timeout >>>>>> changing the >>>>>> bellow 300 seconds value: >>>>>> >>>>>> wait_for_drive() { >>>>>> i=0 >>>>>> while [ $i -le 300 ]; do # Wait max 300 seconds >>>>>> >>>>>> Best regards, >>>>>> Ana >>>>>> >>>>>> >>>>>> On Tue, Jun 16, 2015 at 5:02 PM, Andrew Noonan <anoo...@gmail.com> >>>>>> wrote: >>>>>>> Hi all, >>>>>>> >>>>>>> I'm almost completely new to tape. We've been doing disk-based >>>>>>> backups for years, but we now have a project where we want to offsite >>>>>>> hundreds of TB permanently, and have a Dell TL4000 (a rebranded IBM >>>>>>> 3573-TL from the looks of it) with 2 ULT3580 LTO-6 drives. We're >>>>>>> running bacula 5.2. The server is a Dell 1950 running Centos 5 >>>>>>> (sorry >>>>>>> for the old OS). >>>>>>> >>>>>>> The btape tests run on both units without a problem, including the >>>>>>> autochanger tests, and manually executing load/unload/list commands >>>>>>> with mtx-changer seem to run fine. The one exception to this is that >>>>>>> the mtx-changer load command seems to take about 10 minutes to >>>>>>> complete, which seems unreasonably long. These are brand new tapes >>>>>>> and I haven't written anything to them other then whatever btape does >>>>>>> with testing. I put a 5 minute sleep on the load for mtx-changer, >>>>>>> but >>>>>>> other then that haven't customized the script, as I'm not sure what >>>>>>> I'd customize. >>>>>>> >>>>>>> The "update slots" command from the director works OK, but when I go >>>>>>> to do a "label barcode", the resulting "load slot" gets killed by >>>>>>> Bacula: >>>>>>> >>>>>>> 3992 Bad autochanger "load slot 20, drive 1": ERR=Child died from >>>>>>> signal 15: Termination. >>>>>>> Results=Program killed by Bacula (timeout) >>>>>>> >>>>>>> I've seen that in some of these posts to the list, this ends up being >>>>>>> permissions problems against the devices, but that doesn't seem to be >>>>>>> the case as far as I can see: >>>>>>> >>>>>>> bacula-sd is running as the bacula user/group. The bacula user is in >>>>>>> the "disk" group, and the *st* devices are in the disk group with >>>>>>> "rw" >>>>>>> permissions: >>>>>>> >>>>>>> crw-rw---- 1 root disk 9, 128 Jun 4 12:02 /dev/nst0 >>>>>>> crw-rw---- 1 root disk 9, 224 Jun 4 12:02 /dev/nst0a >>>>>>> crw-rw---- 1 root disk 9, 160 Jun 4 12:02 /dev/nst0l >>>>>>> crw-rw---- 1 root disk 9, 192 Jun 4 12:02 /dev/nst0m >>>>>>> crw-rw---- 1 root disk 9, 129 Jun 10 17:06 /dev/nst1 >>>>>>> crw-rw---- 1 root disk 9, 225 Jun 10 17:06 /dev/nst1a >>>>>>> crw-rw---- 1 root disk 9, 161 Jun 10 17:06 /dev/nst1l >>>>>>> crw-rw---- 1 root disk 9, 193 Jun 10 17:06 /dev/nst1m >>>>>>> crw-rw---- 1 root disk 9, 0 Jun 4 12:02 /dev/st0 >>>>>>> crw-rw---- 1 root disk 9, 96 Jun 4 12:02 /dev/st0a >>>>>>> crw-rw---- 1 root disk 9, 32 Jun 4 12:02 /dev/st0l >>>>>>> crw-rw---- 1 root disk 9, 64 Jun 4 12:02 /dev/st0m >>>>>>> crw-rw---- 1 root disk 9, 1 Jun 10 17:06 /dev/st1 >>>>>>> crw-rw---- 1 root disk 9, 97 Jun 10 17:06 /dev/st1a >>>>>>> crw-rw---- 1 root disk 9, 33 Jun 10 17:06 /dev/st1l >>>>>>> crw-rw---- 1 root disk 9, 65 Jun 10 17:06 /dev/st1m >>>>>>> >>>>>>> Here's a block of debug from the SD during a label attempt for one of >>>>>>> the >>>>>>> slots: >>>>>>> >>>>>>> odin-sd: autochanger.c:434-0 Wiffle through devices looking for slot >>>>>>> odin-sd: autochanger.c:313-0 Locking changer LogRepoAutochanger >>>>>>> odin-sd: autochanger.c:740-0 omsg=/usr/lib64/bacula/mtx-changer >>>>>>> /dev/changer loaded 14 /dev/nst0 0 >>>>>>> odin-sd: autochanger.c:272-0 Run >>>>>>> program=/usr/lib64/bacula/mtx-changer >>>>>>> /dev/changer loaded 14 /dev/nst0 0 >>>>>>> odin-sd: watchdog.c:206-0 Registered watchdog 636b888, interval 300 >>>>>>> odin-sd: bpipe.c:220-0 Wait for 28962 opt=1 >>>>>>> odin-sd: bpipe.c:228-0 Got break wpid=28962 status=0 ERR=none >>>>>>> odin-sd: bpipe.c:249-0 child status=0 >>>>>>> odin-sd: watchdog.c:226-0 Unregistered watchdog 636b888 >>>>>>> odin-sd: bpipe.c:264-0 returning stat=0,0 >>>>>>> odin-sd: autochanger.c:274-0 run_prog: /usr/lib64/bacula/mtx-changer >>>>>>> /dev/changer loaded 14 /dev/nst0 0 stat=0 result=0 >>>>>>> odin-sd: autochanger.c:327-0 Unlocking changer LogRepoAutochanger >>>>>>> odin-sd: autochanger.c:313-0 Locking changer LogRepoAutochanger >>>>>>> odin-sd: autochanger.c:740-0 omsg=/usr/lib64/bacula/mtx-changer >>>>>>> /dev/changer loaded 14 /dev/nst1 1 >>>>>>> odin-sd: autochanger.c:272-0 Run >>>>>>> program=/usr/lib64/bacula/mtx-changer >>>>>>> /dev/changer loaded 14 /dev/nst1 1 >>>>>>> odin-sd: watchdog.c:206-0 Registered watchdog 636b888, interval 300 >>>>>>> odin-sd: bpipe.c:220-0 Wait for 28976 opt=1 >>>>>>> odin-sd: bpipe.c:228-0 Got break wpid=28976 status=0 ERR=none >>>>>>> odin-sd: bpipe.c:249-0 child status=0 >>>>>>> odin-sd: watchdog.c:226-0 Unregistered watchdog 636b888 >>>>>>> odin-sd: bpipe.c:264-0 returning stat=0,0 >>>>>>> odin-sd: autochanger.c:274-0 run_prog: /usr/lib64/bacula/mtx-changer >>>>>>> /dev/changer loaded 14 /dev/nst1 1 stat=0 result=0 >>>>>>> odin-sd: autochanger.c:327-0 Unlocking changer LogRepoAutochanger >>>>>>> odin-sd: autochanger.c:453-0 Slot=14 not found in another device >>>>>>> odin-sd: autochanger.c:313-0 Locking changer LogRepoAutochanger >>>>>>> odin-sd: autochanger.c:183-0 Doing changer load slot 14 "LRADrive-2" >>>>>>> (/dev/nst1) >>>>>>> odin-sd: autochanger.c:740-0 omsg=/usr/lib64/bacula/mtx-changer >>>>>>> /dev/changer load 14 /dev/nst1 1 >>>>>>> odin-sd: dev.c:1746-0 close_dev "LRADrive-2" (/dev/nst1) >>>>>>> odin-sd: dev.c:1751-0 device "LRADrive-2" (/dev/nst1) already closed >>>>>>> vol= >>>>>>> odin-sd: autochanger.c:190-0 Run >>>>>>> program=/usr/lib64/bacula/mtx-changer >>>>>>> /dev/changer load 14 /dev/nst1 1 >>>>>>> odin-sd: watchdog.c:206-0 Registered watchdog 636b888, interval 300 >>>>>>> odin-sd: bpipe.c:443-0 Run program fgets killed=1 >>>>>>> odin-sd: bpipe.c:220-0 Wait for 28990 opt=1 >>>>>>> odin-sd: bpipe.c:228-0 Got break wpid=28990 status=15 ERR=none >>>>>>> odin-sd: bpipe.c:256-0 Child died from signal 15 >>>>>>> odin-sd: watchdog.c:235-0 Unregistered inactive watchdog 636b888 >>>>>>> odin-sd: bpipe.c:264-0 returning stat=15,134217743 >>>>>>> odin-sd: autochanger.c:205-0 load slot 14, drive 1, bad stats=Child >>>>>>> died from signal 15: Termination. >>>>>>> odin-sd: autochanger.c:212-0 load slot 14 status=134217743 >>>>>>> odin-sd: autochanger.c:327-0 Unlocking changer LogRepoAutochanger >>>>>>> odin-sd: autochanger.c:218-0 After changer, status=134217743 >>>>>>> odin-sd: dev.c:1735-0 Clear volhdr vol= >>>>>>> odin-sd: vol_mgr.c:544-0 vol_unused: no vol on "LRADrive-2" >>>>>>> (/dev/nst1) >>>>>>> odin-sd: lock.c:302-0 return lock. old=BST_WRITING_LABEL from >>>>>>> dircmd.c:554 >>>>>>> odin-sd: lock.c:307-0 return lock. new=BST_NOT_BLOCKED >>>>>>> odin-sd: dev.c:1746-0 close_dev "LRADrive-2" (/dev/nst1) >>>>>>> odin-sd: dev.c:1751-0 device "LRADrive-2" (/dev/nst1) already closed >>>>>>> vol= >>>>>>> odin-sd: acquire.c:731-0 Enter detach_dcr_from_dev >>>>>>> odin-sd: dircmd.c:220-0 <dird: label LogRepoAutochanger >>>>>>> VolumeName=000030L6 PoolName=LogrepoArchive MediaType=LTO-6 Slot=15 >>>>>>> drive=1 >>>>>>> odin-sd: dircmd.c:234-0 Do command: label >>>>>>> odin-sd: dircmd.c:627-0 Try changer device LRADrive-1 >>>>>>> odin-sd: dircmd.c:648-0 Device LogRepoAutochanger drive wrong: want=1 >>>>>>> got=0 skipping >>>>>>> odin-sd: dircmd.c:627-0 Try changer device LRADrive-2 >>>>>>> odin-sd: dircmd.c:643-0 Found changer device LRADrive-2 >>>>>>> odin-sd: dircmd.c:656-0 Found device LRADrive-2 >>>>>>> odin-sd: block.c:144-0 Returning new block=636b800 >>>>>>> odin-sd: acquire.c:713-0 JobId=0 enter attach_dcr_to_dev >>>>>>> odin-sd: dircmd.c:421-0 Can label. Device is not open >>>>>>> odin-sd: lock.c:285-0 steal lock. old=BST_NOT_BLOCKED from >>>>>>> dircmd.c:470 >>>>>>> odin-sd: lock.c:290-0 steal lock. new=BST_WRITING_LABEL >>>>>>> odin-sd: dircmd.c:471-0 Stole device "LRADrive-2" (/dev/nst1) lock, >>>>>>> writing label. >>>>>>> >>>>>>> The config I've got for these is: >>>>>>> >>>>>>> Device { >>>>>>> Name = LRADrive-1 >>>>>>> Alert Command = "sh -c 'smartctl -H -l error %c'" >>>>>>> AlwaysOpen = yes >>>>>>> ArchiveDevice = /dev/nst0 >>>>>>> AutoChanger = yes >>>>>>> AutomaticMount = yes >>>>>>> DeviceType = Tape >>>>>>> DriveIndex = 0 >>>>>>> LabelMedia = no >>>>>>> MediaType = LTO-6 >>>>>>> RandomAccess = no >>>>>>> RemovableMedia = yes >>>>>>> } >>>>>>> >>>>>>> Device { >>>>>>> Name = LRADrive-2 >>>>>>> Alert Command = "sh -c 'smartctl -H -l error %c'" >>>>>>> AlwaysOpen = yes >>>>>>> ArchiveDevice = /dev/nst1 >>>>>>> AutoChanger = yes >>>>>>> AutomaticMount = yes >>>>>>> DeviceType = Tape >>>>>>> DriveIndex = 1 >>>>>>> LabelMedia = no >>>>>>> MediaType = LTO-6 >>>>>>> RandomAccess = no >>>>>>> RemovableMedia = yes >>>>>>> } >>>>>>> >>>>>>> Autochanger { >>>>>>> Name = LogRepoAutochanger >>>>>>> ChangerCommand = "/usr/lib64/bacula/mtx-changer %c %o %S %a %d" >>>>>>> ChangerDevice = /dev/changer >>>>>>> Device = LRADrive-1 >>>>>>> Device = LRADrive-2 >>>>>>> } >>>>>>> >>>>>>> >>>>>>> I know there are some things that could be optimized here for >>>>>>> performance, and I'm certainly interested in them, but right now I >>>>>>> can't even label my tapes :) >>>>>>> >>>>>>> I suspect it's the long load delay, and I wasn't sure if maybe the >>>>>>> drive is searching for some mark or something. On that note, I tried >>>>>>> to do a "rewind" and "weof" using the /dev/st0 device (wasn't sure if >>>>>>> nst0 would complain about issuing a rewind), but I would get >>>>>>> "Input/Output error" messages from mt on both the rewind and weof >>>>>>> commands. >>>>>>> >>>>>>> Any advice I could get would be helpful. >>>>>>> >>>>>>> Thanks! >>>>>>> Andrew >>>>>>> >>>>>>> >>>>>>> >>>>>>> ------------------------------------------------------------------------------ >>>>>>> _______________________________________________ >>>>>>> Bacula-users mailing list >>>>>>> Bacula-users@lists.sourceforge.net >>>>>>> https://lists.sourceforge.net/lists/listinfo/bacula-users >>>>>> >>>> >>>> ------------------------------------------------------------------------------ >>>> _______________________________________________ >>>> Bacula-users mailing list >>>> Bacula-users@lists.sourceforge.net >>>> https://lists.sourceforge.net/lists/listinfo/bacula-users >>> >>> >>> -- >>> "Większej miłości nikt nie ma nad tę, jak gdy kto życie swoje kładzie >>> za przyjaciół swoich." Jezus Chrystus >>> >>> >>> ------------------------------------------------------------------------------ >>> _______________________________________________ >>> Bacula-users mailing list >>> Bacula-users@lists.sourceforge.net >>> https://lists.sourceforge.net/lists/listinfo/bacula-users >> > ------------------------------------------------------------------------------ > _______________________________________________ > Bacula-users mailing list > Bacula-users@lists.sourceforge.net > https://lists.sourceforge.net/lists/listinfo/bacula-users ------------------------------------------------------------------------------ _______________________________________________ Bacula-users mailing list Bacula-users@lists.sourceforge.net https://lists.sourceforge.net/lists/listinfo/bacula-users