I've had a problem writing tapes on my amanda tape server. While I have
a work-around, I thought I'd mention it here to see if anyone has any
ideas how to solve it.  

Problem Summary:

amdump runs on the tape server without reporting any errors. amverify
and amrestore will not read the tape. On further examination, the tape
header is OK; the FIRST file header is bad; the rest of the file
headers, and probably the rest of the contents of the tape, are OK.

Configuration on which it occurs:

Red Hat Linux 7.0 with Linux kernel 2.4.0-test12 compiled from source
        (it appears NOT to happen with kernel 2.2.16 or 2.2.18)
amanda 2.4.2
Pentium III clone/Adaptec 2940 PCI card/internal VXA-1 drive

Details:

Double-check tape before running amanda - amverify reports no errors.
Rewind the tape. Run amcheck. 

amdump runs according to schedule. It sends me the usual mail report. 

Try running amverify. It fails with an error as follows:

> [amanda@darcy amanda]$ amverify backupset
> No tape changer...
> Tape device is /dev/nst0...
> Verify summary to amanda karelsf
> Defects file is /tmp/amanda/amverify.5097/defects
> amverify backupset
> Sat Dec 30 06:57:13 EST 2000
> 
> Using device /dev/nst0
> Waiting for device to go ready...
> Rewinding...
> Processing label...
> Volume backupset1, Date 20001229
> Rewinding...
> ** Error detected ()
> amrestore:   0: skipping start of tape: date 20001229 label backupset1
> amrestore: error reading file header: Input/output error
> ** No header
> 0+0 records in
> 0+0 records out
> 
> ** Error detected ()
> amrestore: error reading file header: Input/output error
> ** No header
> 0+0 records in
> 0+0 records out
> ** Error detected ()
> amrestore: error reading file header: Input/output error
> ** No header
> 0+0 records in
> 0+0 records out
> aborted!
> aborted!

On further examination, I can read the tape header and all EXCEPT the
first file header.

> [amanda@darcy amanda]$ mt -f /dev/nst0 rewind
> [amanda@darcy amanda]$ dd if=/dev/nst0 bs=32k count=1
> AMANDA: TAPESTART DATE 20001229 TAPE backupset1
> 
> 1+0 records in
> 1+0 records out
> [amanda@darcy amanda]$ mt -f /dev/nst0 fsf   
> [amanda@darcy amanda]$ dd if=/dev/nst0 bs=32k count=1
> dd: reading `/dev/nst0': Input/output error
> 0+0 records in
> 0+0 records out
> [amanda@darcy amanda]$ mt -f /dev/nst0 fsf
> [amanda@darcy amanda]$ dd if=/dev/nst0 bs=32k count=1
> AMANDA: FILE 20001229 elton.brandeis.edu /dev/sda1 lev 1 comp .gz program /sbin/dump
> To restore, position tape at start of file and run:
>       dd if=<tape> bs=32k skip=1 | /bin/gzip -dc | sbin/restore -f... -
> 
> 1+0 records in
> 1+0 records out
> [amanda@darcy amanda]$ mt -f /dev/nst0 fsf
> [amanda@darcy amanda]$ dd if=/dev/nst0 bs=32k count=1
> AMANDA: FILE 20001229 wickham.brandeis.edu / lev 1 comp .gz program /sbin/vdump
> To restore, position tape at start of file and run:
>       dd if=<tape> bs=32k skip=1 | /bin/gzip -dc | sbin/vrestore -f... -
> 
> 1+0 records in
> 1+0 records out
> [amanda@darcy amanda]$ mt -f /dev/nst0 fsf
> [amanda@darcy amanda]$ amrestore /dev/nst0 chipmunk hda1
> amrestore: WARNING: not at start of tape, file numbers will be offset
> amrestore:   0: skipping knightley.brandeis.edu.hda1.20001229.1
> amrestore:   1: skipping wodehouse.brandeis.edu._dev_sda4.20001229.1
> amrestore:   2: skipping churchill.brandeis.edu._dev_hda2.20001229.1
> amrestore:   3: skipping elliot.dks0d3s7.20001229.1
> amrestore:   4: skipping musgrove.brandeis.edu._dev_hda1.20001229.1
> amrestore:   5: skipping hayter.brandeis.edu._dev_hda1.20001229.1
etc.
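
The manual rewind/dd/fsf sequence above can be scripted to check every
file header in one pass. This is only a rough sketch: it assumes a
non-rewinding device and the same 32k block size used in the dd commands
above, and scan_headers is a made-up helper name, not part of amanda.

```shell
#!/bin/sh
# Sketch: print the first header line of each tape file, or UNREADABLE.
# Assumptions: $1 is a non-rewinding tape device (e.g. /dev/nst0),
# $2 is how many tape files to probe, and the block size is 32k.
# scan_headers is a hypothetical name, not an amanda command.
scan_headers() {
    tape=$1
    nfiles=$2
    mt -f "$tape" rewind || return 1
    i=0
    while [ "$i" -lt "$nfiles" ]; do
        # Read one 32k block at the current position; keep its first line.
        hdr=$(dd if="$tape" bs=32k count=1 2>/dev/null | head -n 1)
        if [ -n "$hdr" ]; then
            echo "file $i: $hdr"
        else
            echo "file $i: UNREADABLE"
        fi
        # Skip to the start of the next tape file.
        mt -f "$tape" fsf 1 || break
        i=$((i + 1))
    done
}
```

Usage would be something like "scan_headers /dev/nst0 10". File 0 is the
tape label; on a tape like the one above I would expect file 1 to show
up as UNREADABLE and the later files to show their AMANDA: FILE lines.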

Each failed attempt to read the file header also shows up as a kernel
error in /var/log/messages:

> Dec 30 07:01:43 darcy kernel: st0: Error with sense data: Info fld=0x20, Current st09:00: sense key Medium Error
> Dec 30 07:01:43 darcy kernel: Additional sense indicates Recorded entity not found
> Dec 30 07:03:49 darcy kernel: st0: Error with sense data: Info fld=0x20, Current st09:00: sense key Medium Error
> Dec 30 07:03:49 darcy kernel: Additional sense indicates Recorded entity not found
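
To line these errors up with the failed reads, the timestamps can be
pulled out of the log. A small sketch, assuming the standard syslog
layout shown above (month, day, time in the first three fields);
tape_error_times is a made-up helper name.

```shell
#!/bin/sh
# Sketch: list the timestamps of "Recorded entity not found" kernel
# messages in a syslog file (default /var/log/messages).
# tape_error_times is a hypothetical name used only for illustration.
tape_error_times() {
    logfile=${1:-/var/log/messages}
    # Fields 1-3 of a syslog line are month, day and time.
    awk '/Additional sense indicates Recorded entity not found/ {
        print $1, $2, $3
    }' "$logfile"
}
```

Run as "tape_error_times /var/log/messages"; each line of output is one
failed read.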

I've done test restores (and one real restore) from tapes produced this
way, and they seem OK.

The bad file header corresponds to a missing dump file: comparing the
log files with what I could read from the tape (above), sda1 on elton
was the 2nd dump written to tape, not the first.

Analysis:

I'm pretty sure about the kernel dependency. I spent a couple of weeks
looking for other causes: I exchanged the tape drive (which had an
unrelated problem), swapped SCSI controllers, etc. In the last week and
a half, with everything else working OK, I've had the problem 3 out of
3 times with kernel 2.4.0 and 0 out of 5 times with kernel 2.2.x.

The tapes thus produced generate identical error messages when READ
using either kernel version (and when using a separate external VXA
drive), so the problem is in writing the tapes. 

I can use kernel 2.2.x on the server without too many problems, although
I wanted the newer one for other reasons (UDMA, large files, USB, etc).
But there seems to be some kind of bug here. I'll be happy to send
anyone log files or try other suggestions for debugging if someone has
any. My current limitation is that all my tapes have useful data on
them; I'm waiting for a batch of new tapes I ordered.

steven

Steven Karel, Technology Coordinator, Biology Dept, Brandeis University
