I think this may be what is causing my amrecover runs to never finish. It seems the drive isn't sensing EOF and/or EOT correctly, or something along those lines, since the tape stops at EOT but dd keeps right on going.
I really hope someone can set me straight on this. Here's the pertinent information: I'm running Linux kernel 2.4.19 and mt-st version 0.7, and I have a 100GB SDLT drive. I believe I've had this problem on kernel versions back to 2.4.16, although this is the first actual controlled experiment I've run. I recently reinstalled the server from scratch since we needed to upgrade software, and I've been getting really flaky amrecover problems, so I put the newest available kernel on it. I was hoping the new kernel would fix my amrecover problems, but they persist. I decided to run the following experiment to determine whether the problem is amanda, the tape drive, or some other odd software problem. Dear God, somebody help me.

Here's my test file:

    7176 Dec 21 04:12 sendsize.20021221040023.debug

First, let's see what the drive says:

    $mt -f /dev/sdlt-norw compression 0
    $mt -f /dev/sdlt-norw status
    SCSI 2 tape drive:
    File number=0, block number=0, partition=0.
    Tape block size 0 bytes. Density code 0x48 (no translation).
    Soft error count since last status=0
    General status bits on (41010000):
     BOT ONLINE IM_REP_EN

I execute the following command:

    $dd if=sendsize.20021221040023.debug of=/dev/sdlt-norw
    14+1 records in
    14+1 records out

This will not rewind the tape when it's done. Let's see where the tape stopped:

    $mt -f /dev/sdlt-norw tell
    At block 16.
    $mt -f /dev/sdlt-norw status
    SCSI 2 tape drive:
    File number=1, block number=0, partition=0.
    Tape block size 0 bytes. Density code 0x48 (no translation).
    Soft error count since last status=0
    General status bits on (81010000):
     EOF ONLINE IM_REP_EN

So far so good. I rewind the tape:

    $mt -f /dev/sdlt-norw rewind

I check the block count, just to be sure:

    $mt -f /dev/sdlt-norw tell
    At block 0.

I then execute this command:

    $dd if=/dev/sdlt-norw of=blah

This will not rewind the tape when it's done.
After a while, I CTRL-C this command, and dd states the following:

    61+0 records in
    61+0 records out

Looking at the file 'blah', I see:

    $ls -l blah
    31232 Dec 26 06:42 blah

As it should be after reading 61 records (61 x 512 bytes = 31232 bytes). However, the tape is at:

    $mt -f /dev/sdlt-norw tell
    At block 16.

This is odd: it read a lot more than 16 blocks from the tape, or so dd would lead me to believe. After all, what is in that 31232-byte file 'blah'? Looking at the file, we see that when it gets to the end of the file on tape, the tape drive stops, but dd keeps writing the last block that it read over and over and over to 'blah':

    Total bytes written: 52354334720 (49GB, 100MB/s)
    .....
    sendsize: getting size via gnutar for /export/ext_raid level 1
    sendsize: spawning /usr/libexec/runtar in pipeline
    sendsize: argument list: /bin/tar --create --file /dev/null --directory /export/ext_raid --one-file-system --listed-incremental /var/lib/amanda/gnutar-lists/snoopy.cs.drexel.edu_export_ext_raid_1.new --sparse --ignore-failed-read --totals .
    Total bytes written: 4605429760 (4.3GB, 41MB/s)
    .....
    sendsize: pid 16484 finish time Sat Dec 21 04:12:08 2002
    es written: 52354334720 (49GB, 100MB/s)
    .....
    sendsize: getting size via gnutar for /export/ext_raid level 1
    sendsize: spawning /usr/libexec/runtar in pipeline
    sendsize: argument list: /bin/tar --create --file /dev/null --directory /export/ext_raid --one-file-system --listed-incremental /var/lib/amanda/gnutar-lists/snoopy.cs.drexel.edu_export_ext_raid_1.new --sparse --ignore-failed-read --totals .
    Total bytes written: 4605429760 (4.3GB, 41MB/s)
    .....
    sendsize: pid 16484 finish time Sat Dec 21 04:12:08 2002
    es written: 52354334720 (49GB, 100MB/s)
    .....
    sendsize: getting size via gnutar for /export/ext_raid level 1
    sendsize: spawning /usr/libexec/runtar in pipeline
    sendsize: argument list: /bin/tar --create --file /dev/null --directory /export/ext_raid --one-file-system --listed-incremental /var/lib/amanda/gnutar-lists/snoopy.cs.drexel.edu_export_ext_raid_1.new --sparse --ignore-failed-read --totals .
    Total bytes written: 4605429760 (4.3GB, 41MB/s)
    .....
    sendsize: pid 16484 finish time Sat Dec 21 04:12:08 2002

It will continue to do this FOREVER. This is okay for tar files, since tar understands where to stop, and does so. This also means that I can tar/untar to and from a tape without a problem. I imagine that tar knows when to stop, and so it closes the connection to the device, which is why I get my files from amrecover properly.

Next, I wanted to make sure all of my end markers are being written, so I executed the following:

    $mt -f /dev/sdlt-norw rewind
    $dd if=/dev/sdlt-norw of=blah2 count=30
    30+0 records in
    30+0 records out
    $mt -f /dev/sdlt-norw tell
    At block 16.

Looking at the file, I notice that it did exactly the same thing that the previous dd did. Does this imply that the EOT marker is being properly written, and so the tape drive is stopping at the right place, but something isn't being passed on to dd to tell it to stop? Am I just doing something completely wrong?

I'll skip the details for brevity's sake, but I also tried the same test writing 2 files to the tape. There is a single NULL character written between the files, which should properly signify EOF, but dd keeps right on reading beyond that to EOT, and once it hits EOT it repeats the last block over and over again.

Dear God, somebody help me.

-Chris
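
P.S. In case anyone wants to reproduce this, here is a rough sketch of how the repetition in a read-back file such as 'blah' could be counted instead of eyeballed. It assumes dd's default 512-byte block size. The function name trailing_repeats is made up for this example and is not part of mt-st, dd, or amanda.

```shell
#!/bin/sh
# Hypothetical helper: report how many identical blocks sit at the end of FILE.
# usage: trailing_repeats FILE [BLOCK_SIZE]
trailing_repeats() {
    file=$1
    bs=${2:-512}                      # assumption: dd was run with its default bs
    size=$(wc -c < "$file")
    total=$(( size / bs ))            # whole blocks only; a short tail is ignored
    if [ "$total" -eq 0 ]; then
        echo "total=0 repeats=0"
        return 0
    fi
    # Checksum of the final block, used as the reference for the comparison.
    last=$(dd if="$file" bs="$bs" skip=$(( total - 1 )) count=1 2>/dev/null | cksum)
    repeats=0
    i=$(( total - 1 ))
    # Walk backwards from the end until a block differs from the final one.
    while [ "$i" -ge 0 ]; do
        cur=$(dd if="$file" bs="$bs" skip="$i" count=1 2>/dev/null | cksum)
        [ "$cur" = "$last" ] || break
        repeats=$(( repeats + 1 ))
        i=$(( i - 1 ))
    done
    echo "total=$total repeats=$repeats"
}
```

Running `trailing_repeats blah` should print the number of whole blocks in the file and how many of them at the end are copies of the final block. A healthy read-back of a single tape file would normally be expected to show repeats=1, unless the data itself ends in identical blocks.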
