I've been trying to diagnose and resolve this since November, and I
still can't figure out what is happening. Debian 10 doesn't offer any
easy way to decode the hexadecimal error codes in the kernel messages,
or to find details about what they mean.
I know this is kinda "old-school", but I'm backing up partition images
to LTO-5 tape cartridges. The backups worked at first, but recently
every attempt has eventually failed with an error, on each cartridge I
have tried.
For the moment, I am willing to accept that the tar command is NOT the
culprit.
It could be the SAS controller software, or the "mt" package which
manages the tape drive, but given that the setup has worked several
times and continues to work between failed backups, I am not convinced
that the issue is with the controller or driver software.
All this leads to the hardware question: what is failing? Tape drive?
Cartridge(s)? Cable? SAS controller?
Rather than blindly substituting parts (expensive, time consuming,
frustratingly inconclusive) and eliminating suspects that way, I'd
really like a better roadmap for locating the issue. A new SAS
controller, a new cable between the controller and the drive, or new
cartridges are not so expensive as to be non-starters, but I'm retired
on a limited income, and a new LTO drive would be a real stretch.
Here are three minutes of kernel messages from kern.log/syslog during
my last attempt:
Nov 13 08:02:29 BigMutt kernel: [34669.493781] st 0:0:0:0: device_block, handle(0x0009)
Nov 13 08:02:29 BigMutt kernel: [34669.493879] st 0:0:0:0: [st0] Error e0000 (driver bt 0x0, host bt 0xe).
Nov 13 08:02:31 BigMutt kernel: [34671.743620] st 0:0:0:0: device_unblock and setting to running, handle(0x0009)
Nov 13 08:02:31 BigMutt kernel: [34671.743714] st 0:0:0:0: [st0] Error 10000 (driver bt 0x0, host bt 0x1).
Nov 13 08:02:31 BigMutt kernel: [34671.744077] st 0:0:0:0: [st0] Error 10000 (driver bt 0x0, host bt 0x1).
Nov 13 08:02:31 BigMutt kernel: [34671.745089] mpt2sas_cm0: removing handle(0x0009), sas_addr(0x500110a001622ed0)
Nov 13 08:02:31 BigMutt kernel: [34671.745091] mpt2sas_cm0: enclosure logical id(0x500605b00341cef0), slot(0)
Nov 13 08:02:36 BigMutt kernel: [34676.006914] scsi 0:0:1:0: Sequential-Access HP Ultrium 5-SCSI Z6ED PQ: 0 ANSI: 6
Nov 13 08:02:36 BigMutt kernel: [34676.006922] scsi 0:0:1:0: SSP: handle(0x0009), sas_addr(0x500110a001622ed0), phy(3), device_name(0x500110a001622ed2)
Nov 13 08:02:36 BigMutt kernel: [34676.006924] scsi 0:0:1:0: enclosure logical id (0x500605b00341cef0), slot(0)
Nov 13 08:02:36 BigMutt kernel: [34676.008694] scsi 0:0:1:0: TLR Enabled
Nov 13 08:02:36 BigMutt kernel: [34676.011053] st 0:0:1:0: Attached scsi tape st0
Nov 13 08:02:36 BigMutt kernel: [34676.011056] st 0:0:1:0: st0: try direct i/o: yes (alignment 4 B)
Nov 13 08:02:36 BigMutt kernel: [34676.011143] st 0:0:1:0: Attached scsi generic sg2 type 1
Nov 13 08:05:24 BigMutt kernel: [34844.612941] st 0:0:1:0: [st0] Block limits 1 - 16777215 bytes.
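
As a starting point for decoding the hex values in the "[st0] Error"
lines above, here is a minimal sketch in Python. It assumes the number
after "Error" is the kernel's packed SCSI result word (status in the
lowest byte, then message, host, and driver bytes), which matches the
"driver bt" and "host bt" values the st driver prints next to it. The
host byte names are copied from my reading of the kernel's SCSI headers
and should be checked against the source for the running kernel; the
function names decode_result and host_byte_name are made up for this
example.

```python
# Host byte names as I understand them from the Linux kernel SCSI headers.
# Assumption: verify against the headers of the kernel actually in use.
HOST_BYTE_NAMES = {
    0x00: "DID_OK",
    0x01: "DID_NO_CONNECT",
    0x02: "DID_BUS_BUSY",
    0x03: "DID_TIME_OUT",
    0x04: "DID_BAD_TARGET",
    0x05: "DID_ABORT",
    0x06: "DID_PARITY",
    0x07: "DID_ERROR",
    0x08: "DID_RESET",
    0x09: "DID_BAD_INTR",
    0x0A: "DID_PASSTHROUGH",
    0x0B: "DID_SOFT_ERROR",
    0x0C: "DID_IMM_RETRY",
    0x0D: "DID_REQUEUE",
    0x0E: "DID_TRANSPORT_DISRUPTED",
    0x0F: "DID_TRANSPORT_FAILFAST",
}


def decode_result(word: int) -> dict:
    """Split a 32-bit SCSI result word into its four bytes."""
    return {
        "status": word & 0xFF,          # SCSI status byte from the device
        "msg": (word >> 8) & 0xFF,      # message byte
        "host": (word >> 16) & 0xFF,    # host (transport/adapter) byte
        "driver": (word >> 24) & 0xFF,  # driver byte
    }


def host_byte_name(word: int) -> str:
    """Return the symbolic name of the host byte, or a hex fallback."""
    host = decode_result(word)["host"]
    return HOST_BYTE_NAMES.get(host, f"unknown (0x{host:02x})")


if __name__ == "__main__":
    # The two values that appear in the log excerpt.
    for text in ("e0000", "10000"):
        word = int(text, 16)
        print(text, decode_result(word), host_byte_name(word))
```

If that reading is right, both errors are reported at the host
(transport) level rather than as a SCSI status returned by the drive,
which would fit with mpt2sas removing handle 0x0009 and the drive
reappearing as 0:0:1:0 a few seconds later. On its own that does not
say whether the drive, the cable, or the controller dropped the link.
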
So, is the culprit the LTO-5 drive? A cartridge? Possibly the I/O signal
cable? The SAS controller? What do I need to do to determine the true
cause of the errors with /dev/st0?
Hardware System Configuration:
Kernel: 4.19.0-6-amd64 #1 SMP Debian 4.19.67-2+deb10u2 (2019-11-11)
MB: Gigabyte 970A-D3P
CPU: AMD FX-8350 @4000.000 MHz cache: 2048 KB
RAM: 32GB (4x8GB) Unbuffered/Unregistered
LTO-5 SAS Tape on LSI SAS9211 controller
Video: GeForce 8400 GS to VIZIO E320VA