On 25 Aug 2007 at 16:33, Ivan Adzhubey wrote:

> Hi Dan,
> 
> On Saturday 25 August 2007 03:41:55 pm Dan Langille wrote:
> > On 25 Aug 2007 at 1:32, Ivan Adzhubey wrote:
> > > Hi,
> > >
> > > I am getting the following errors while running large jobs with data
> > > spanning more than 2 tapes:
> > >
> > > 23-Aug 21:37 rosalind-sd: 3305 Autochanger "load slot 7, drive 0", status is OK.
> > > 23-Aug 21:37 rosalind-sd: 3301 Issuing autochanger "loaded drive 0" command.
> > > 23-Aug 21:37 rosalind-sd: 3302 Autochanger "loaded drive 0", result is Slot 7.
> > > 23-Aug 21:38 rosalind-sd: Recycled volume "Chromosome0031" on device "Drive-1" (/dev/nst0), all previous data lost.
> > > 23-Aug 21:38 rosalind-sd: New volume "Chromosome0031" mounted on device "Drive-1" (/dev/nst0) at 23-Aug-2007 21:38.
> > > 23-Aug 21:38 rosalind-sd: fantom-dataOut.2007-08-22_23.10.12 Error: block.c:538 Write error at 0:1756 on device "Drive-1" (/dev/nst0). ERR=Input/output error.
> >
> > Try adding a sleep to the changer script.  Sometimes the tape drive
> > is still settling when the write is attempted.
> 
> I did. This section of my mtx-changer script looks like this:
> 
> case $cmd in
>    unload)
>       debug "Doing mtx -f $ctl unload $slot $drive"
> #
> # enable the following line if you need to eject the cartridge
>       mt -f $device offline
>       sleep 10
>       ${MTX} -f $ctl unload $slot $drive
>       ;;
> 
>    load)
>       debug "Doing mtx -f $ctl load $slot $drive"
>       ${MTX} -f $ctl load $slot $drive
>       rtn=$?
> #
> # Increase the sleep time if you have a slow device
> # or remove the sleep and add the following:
> #     sleep 15
>       wait_for_drive $device
>       exit $rtn
>       ;;
> 
> As you can see, I do have "sleep 10" after offline and "wait_for_drive" after 
> load. I used to have sleep 15 after load instead, and it worked the same. All 
> autochanger tests pass without a problem. Any suggestions where to insert 
> more delays in the script?

Try longer delays rather than more delays, e.g. 60 seconds.
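
A minimal sketch of what a longer, bounded wait could look like in the changer script, assuming Linux mt-st, where "mt status" prints ONLINE once the drive is ready. The names drive_ready, wait_for_drive_timeout, MAX_WAIT and POLL_INTERVAL are made up for illustration and are not part of the stock mtx-changer:

```shell
#!/bin/sh
# Sketch only: poll the drive for up to MAX_WAIT seconds instead of
# sleeping for a fixed time. All names here are hypothetical.

# Return 0 when the drive in $1 reports ready.
# Assumption: Linux mt-st prints "ONLINE" for a loaded, ready drive;
# other platforms may print a different string.
drive_ready() {
    mt -f "$1" status 2>/dev/null | grep -q "ONLINE"
}

# Wait up to MAX_WAIT seconds (default 60) for the device in $1,
# checking every POLL_INTERVAL seconds (default 1).
# Returns 0 as soon as the drive is ready, 1 on timeout.
wait_for_drive_timeout() {
    _max_wait=${MAX_WAIT:-60}
    _poll=${POLL_INTERVAL:-1}
    _waited=0
    while [ "$_waited" -lt "$_max_wait" ]; do
        if drive_ready "$1"; then
            return 0
        fi
        sleep "$_poll"
        _waited=$((_waited + _poll))
    done
    return 1
}
```

In the load branch, something like "wait_for_drive_timeout $device" could stand in for the existing wait, with MAX_WAIT set near the top of the script. A non-zero return would tell you the drive never came ready within the limit, which is worth logging before the write is attempted.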

> 
> > > 23-Aug 21:38 rosalind-sd: fantom-dataOut.2007-08-22_23.10.12 Error: Error writing final EOF to tape. This Volume may not be readable.
> > > dev.c:1542 ioctl MTWEOF error on "Drive-1" (/dev/nst0). ERR=Input/output error.
> > > 23-Aug 21:38 rosalind-sd: End of medium on Volume "Chromosome0031" Bytes=113,218,556 Blocks=1,755 at 23-Aug-2007 21:38.
> > >
> > > This started with version 1.36.1 that we were running originally and
> > > persisted through upgrade to 1.38.11. I have built and installed version
> > > 2.2.0 today but haven't run large backups yet. I am trying to test and
> > > eliminate any possible hardware/driver configuration problems first.
> > > Regular btape "test" and "auto" tests completed perfectly; now I want to
> > > run the "fill" test, but the documentation states that the multiple-tape
> > > variant is still not operational. Is that true?
> >
> > Multi-volume backup has been, and still is, a vital feature of Bacula.  See
> > http://www.bacula.org/rel-manual/Current_State_Bacula.html:
> >
> > "Multi-volume saves. When a Volume is full, Bacula automatically
> > requests the next Volume and continues the backup."
> >
> > Granted, it could be worded better... :)
> >
> > Where did you see otherwise?  We should amend that.
> 
> It's not in the main documentation but in the "Testing Your Tape..." chapter. 
> It only refers to the "fill" command as implemented in btape:
> 
> http://www.bacula.org/rel-manual/Testing_Your_Tape_Drive.html#TapeTestingChapter
> 
> "Using btape to Simulate Filling a Tape
> 
> <...skipped...>
> 
> To begin this test, you enter the fill command and follow the instructions. 
> There are two options: the simple single tape option and the multiple tape 
> option. Please use only the simple single tape option because the multiple 
> tape option still doesn't work totally correctly. If the single tape option 
> does not succeed, you should correct the problem before using Bacula."
> 
> > > Does it mean version 2.2.0 in general can still have problems with
> > > multi-volume backups?
> >
> > It should not, but anything is possible.
> 
> Bacula used to run multi-volume jobs here just fine for years until very 
> recently. Still, it is only 3- or 4-volume jobs that consistently fail; 
> 2-volume ones are OK. It also looks like as soon as the first error occurs, 
> Bacula loses track of files/volumes completely, and all subsequent attempts 
> to change a volume in the middle of the job fail. I've already lost most of 
> my previous backup data through this error. This, however, was my mistake, 
> so I can't complain: I was keeping purged tapes with old data in the changer 
> unprotected, and the overnight backup that triggered this error recycled 
> and trashed them all. Since every attempt to change a volume was failing, 
> Bacula just kept recycling volumes until none were left, before I noticed. It 
> had never happened before, so I grew sort of overconfident in it ;-(
> 
> > > The server is running Linux kernel 2.4.18smp and has a Qualstar RLS-4445
> > > autochanger with a single Sony SDX-700C AIT drive attached to an Adaptec
> > > 3960D Ultra160 SCSI adapter (aic7xxx driver). A SCSI RAID is also attached
> > > to the first channel of the same dual-host adapter. I read that sharing
> > > SCSI adapters with other devices may create problems, but it ran just
> > > fine for 4 years in this configuration until recently, when the amount of
> > > backup data increased. The problem seems to appear only when a single
> > > backup job spans 3 or more volumes; spanning 2 volumes has yet to produce
> > > an error, although I haven't run many large jobs since they take a lot
> > > of time. But every job spanning more than 2 volumes that I've tried has
> > > failed.
> > >
> > > Here's the tapeinfo output:
> > >
> > > # tapeinfo -f /dev/sg2
> > > Product Type: Tape Drive
> > > Vendor ID: 'SONY    '
> > > Product ID: 'SDX-700C        '
> > > Revision: '0103'
> > > Attached Changer: No
> > > SerialNumber: '0002084649'
> > > MinBlock:2
> > > MaxBlock:16777215
> > > SCSI ID: 1
> > > SCSI LUN: 0
> > > Ready: yes
> > > BufferedMode: yes
> > > Medium Type: Not Loaded
> > > Density Code: 0x32
> > > BlockSize: 0
> > > DataCompEnabled: yes
> > > DataCompCapable: yes
> > > DataDeCompEnabled: yes
> > > CompType: 0x3
> > > DeCompType: 0x0
> > > BOP: yes
> > > Block Position: 0
> >
> > Here is my two tape test with btape.  Does your test run OK?
> >
> >   http://www.freebsddiary.org/digital-tl891.php
> >
> > Look for: btape -c /usr/local/etc/bacula-sd.conf /dev/nsa0
> 
> Yep, ran it many times, tested half of my tapes already, including the ones 
> that failed during actual backups. Not a single glitch. Ran single-volume 
> btape "fill" test already, no errors either. Running a large test backup 
> right now, should take about 12 hours to complete. If it fails, I will 
> consider disabling Hardware End of Medium as per your suggestions. Thanks for 
> the link, very useful!

That suggestion may be OS specific.


-- 
Dan Langille - http://www.langille.org/
Available for hire: http://www.freebsddiary.org/dan_langille.php



_______________________________________________
Bacula-devel mailing list
[email protected]
https://lists.sourceforge.net/lists/listinfo/bacula-devel