While trying to make my backup script deal gracefully with backups that
overflow one tape (using the afio -H option), and then reposition to
read the last volume on the first tape, I seem to have uncovered a
serious bug that results in kernel or memory space corruption.
The command I used (from my backup script's debug log):
afio -o -M 4m -c 800 -z -ZG 1 -T 8k -H Lbackup_change_tape \
-b 20480 /dev/nqft0 < files-to-backup
(pathnames changed to basenames for readability)
It seems that after reaching the end-of-tape (running out of tape),
if I do:
ftmt eom
ftmt status
or
ftmt bsf 2
or attempt again to write beyond the end of the tape, random processes
freeze on the machine (for example, all the shells stop responding to any
input). I have not seen any kernel panic messages in any of these
cases. Things just stop working. I have always been able to change
virtual consoles, and to look at tty12 where syslogd also sends
messages. Most of the time I am forced to press the reset button
(normal shutdown fails). At least once, when I rewound the tape
immediately after running to the end, I did not see any problem (that
I noticed). The best clue I got was from the first time it happened,
in the kernel.messages log:
Oct 30 04:22:38 rupin kernel: crc_failures : 0
Oct 30 04:22:38 rupin kernel: ecc_failures : 0
Oct 30 04:22:38 rupin kernel: sectors corrected: 0.
Oct 30 04:22:38 rupin kernel: [154] 0 ftape-ctl.c (ftape_print_history) - tape motion statistics:
Oct 30 04:22:38 rupin kernel: repositions : 2.
Oct 30 04:52:10 rupin kernel: [155] 0 ftape-write.c (ftape_write_segment_R6ec54595) - write error, retry 1 (2033).
Oct 30 04:52:47 rupin kernel: [156] 0 ftape-write.c (ftape_write_segment_R6ec54595) - write error, retry 1 (2077).
Oct 30 05:05:20 rupin kernel: eth0: mismatched read page pointers 0 vs 55.
Oct 30 05:05:20 rupin kernel: eth0: mismatched read page pointers 0 vs ff.
NOTE THE
ABOVE: ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
Oct 30 05:05:20 rupin last message repeated 16 times
Oct 30 05:05:20 rupin kernel: eth0: unexpected TX-done interrupt, lasttx=20.
Oct 30 05:05:20 rupin kernel: eth0: mismatched read page pointers 0 vs ff.
Oct 30 05:05:20 rupin last message repeated 8 times
Oct 30 05:05:20 rupin kernel: eth0: unexpected TX-done interrupt, lasttx=20.
Oct 30 05:05:20 rupin kernel: eth0: mismatched read page pointers 0 vs ff.
Oct 30 05:05:20 rupin last message repeated 8 times
Oct 30 05:05:20 rupin kernel: eth0: unexpected TX-done interrupt, lasttx=20.
.
.
.
with lots more repetitions, and my network connection stopped working
(you knew that). No console shells froze that time.
In all the rest of the tests I ran, kernel logging stopped (no info
there, of _any_ kind -- syslogd may have frozen). This condition is
definitely repeatable: I reproduced the above conditions, rebooted,
and watched fsck run, over and over, till I got tired of it....
Does anyone else have this problem?
Particulars:
I have an Iomega ditto 800 ftape drive.
"afio --help" produces: Version 2.4.5 dated 28 Sep 1998
I am using ftape version 4.02
I did the testing with a small DC2120 cartridge (120 meg).
118% uname -a
Linux rupin 2.0.38 #7 Wed Sep 8 17:26:08 MDT 1999 i486 unknown
(actually an AMD 5x86)
I can provide more complete logs, more hardware info, and even a copy
of my backup script, if it will help.
It may be that a reasonable workaround is to rewind the tape
immediately after reaching the end (but spacing to the end with "ftmt
eom" afterward, and trying to backspace to the last volume, to read
it, will corrupt the system every time). I don't know what will
happen when trying to read to the end of such a tape from the
beginning (when there is only one volume on it). Will this corrupt
the kernel? Are such tapes useless because of that? Can the kernel
be trusted after an end-of-tape condition? Does this happen with
other brands of tape hardware or ftape software versions? Hope you
all can provide some answers to these questions (or a bug fix would be
nice).
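
To make the workaround concrete, here is a minimal sketch of what I
mean by rewinding immediately. The function name and the TAPE and MT
variables are made up for illustration, and I am assuming that ftmt
accepts "-f device" the way plain mt does. This only covers the case
that seemed safe in my one test, so treat it as a sketch, not a fix:

```shell
#!/bin/sh
# Sketch of the workaround: rewind right after the backup command exits,
# and never space to end-of-media afterward.
# TAPE, MT and backup_then_rewind are illustrative names (assumptions).
TAPE=${TAPE:-/dev/nqft0}
MT=${MT:-ftmt}

# Run the given backup command, then rewind at once, whatever its status.
backup_then_rewind() {
    "$@"
    status=$?
    # Rewind before anything else touches the drive; no "eom", no "bsf".
    "$MT" -f "$TAPE" rewind || echo "warning: rewind failed" >&2
    # Report the backup command's own exit status to the caller.
    return "$status"
}
```

It would be used as "backup_then_rewind afio -o ... /dev/nqft0 <
files-to-backup", with the same afio options as in the command above.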
BTW, afio has a "-s" option that can be used to prevent afio from reaching
the end of the tape, but that has some bugs associated with it, too:
afio gives an error after writing the allotted quantity of data, and
apparently, no end-of-media marks get written, so that a successive
attempt to read it will try to read garbage data beyond the end of the
volume (section of the tape). Also, the afio "-H" option fails to
operate under this condition (it works when reaching the real
end-of-tape, with no "-s 118m" type of limit). The "-H" option to
afio is supposed to invoke a script that will rewind the tape, and
allow you to put in another tape, retension and possibly erase it,
etc., and then afio will continue writing its list of files on the
next tape. I discovered the ftape bug when trying to make my backup
script use that feature automatically.
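
For anyone who has not used "-H", a tape-change script along these
lines is roughly what I have in mind. This is a simplified, hypothetical
sketch, not my actual script. As I read the afio man page, the script is
run with the volume number, the archive name, and a reason as arguments,
and an exit status of 0 means continue while 1 means abort; please check
that against your own afio version. The use of "ftmt -f" is again an
assumption:

```shell
#!/bin/sh
# Hypothetical tape-change script for "afio -H" (a sketch only).
# Assumed calling convention: $1 = volume number, $2 = archive name,
# $3 = reason; exit status 0 = continue, 1 = abort.
MT=${MT:-ftmt}

change_tape() {
    volume=$1
    archive=$2
    reason=$3
    echo "Volume $volume on $archive: $reason" >&2
    # Rewind at once; deliberately avoid "eom" and "bsf" here.
    "$MT" -f "$archive" rewind || return 1
    echo "Insert the next tape and press Enter (or type q to abort)." >&2
    # End of input or "q" aborts the backup; anything else continues.
    read -r answer || return 1
    [ "$answer" = "q" ] && return 1
    return 0
}
```

A real script would end with a call such as change_tape "$@", and
could retension or erase the new tape before returning.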
Thanks in advance for any help you can give.
LCR
--
L. C. Robinson
reply to [EMAIL PROTECTED]
People buy MicroShaft for compatibility, but get incompatibility and
instability instead. This is award winning "innovation". Find
out how MS holds your data hostage with "The *Lens*"; see
"CyberSnare" at http://www.netaction.org/msoft/cybersnare.html