Don't you just love being stuck in the middle of multiple supplier support teams? ;)
Regards,
Iain Barnetson
IT Systems Administrator
UKN Infrastructure Operations

-----Original Message-----
From: ADSM: Dist Stor Manager [mailto:[EMAIL PROTECTED] On Behalf Of Joni Moyer
Sent: 16 February 2005 12:38
To: [email protected]
Subject: [ADSM-L] I/O Errors and Tape Dismount issues

Hi All!

I posted this issue to the group the other day, but it appears that no one wants to take credit for these wonderful errors. (I have a lot more, but you get the idea.) These errors have been occurring for some time now, and I still have not been able to find what is causing them. I have included information from all the vendors.

We have a TSM AIX 5.2.2.5 server connected to an STK SL8500 library with LTO2 drives. We are using Gresham EDT 6.4.6 to control drive sharing. Any suggestions would be appreciated. I am at a loss...

Thanks!

Date/Time            Message
-------------------- ----------------------------------------------------------
02/14/05 09:31:16    ANR8302E I/O error on drive SL8500 (/dev/rmt18) (OP=REW,
                     Error Number=46, CC=0, KEY=02, ASC=3A, ASCQ=00,
                     SENSE=70.00.02.00.00.00.00.1C.00.00.00.00.3A.00.30.00.10.13.00.00.00.00.20.20.20.20.20.20.20.00.00.00.00.00.13.00,
                     Description=An undetermined error has occurred).
                     Refer to Appendix D in the 'Messages' manual for
                     recommended action. (SESSION: 30844)
02/14/05 11:20:08    ANR8302E I/O error on drive SL8500 (/dev/rmt16) (OP=REW,
                     Error Number=46, CC=0, KEY=02, ASC=3A, ASCQ=00,
                     SENSE=70.00.02.00.00.00.00.1C.00.00.00.00.3A.00.30.00.10.13.00.00.00.00.20.20.20.20.20.20.20.00.00.00.00.00.13.00,
                     Description=An undetermined error has occurred).
                     Refer to Appendix D in the 'Messages' manual for
                     recommended action. (SESSION: 30844)
02/14/05 12:26:36    ANR8302E I/O error on drive SL8500 (/dev/rmt12) (OP=WRITE,
                     Error Number=46, CC=0, KEY=02, ASC=04, ASCQ=02,
                     SENSE=70.00.02.00.00.00.00.1C.00.00.00.00.04.02.30.00.10.12.00.00.00.00.20.20.20.20.20.20.20.00.00.00.00.00.13.00,
                     Description=An undetermined error has occurred).
                     Refer to Appendix D in the 'Messages' manual for
                     recommended action. (SESSION: 31877)

Ideas from IBM Support

This is extremely odd. As a matter of fact, this probably flat out shouldn't happen.
- Normally, I would think this is the customer having some sort of pathing problem or library sharing protocol problem. But in this case, TSM is more at the mercy of your library manager, since we don't know about drives and paths when using Gresham.
- Here's why it shouldn't happen.
- The OP code is LOCATE. Locate is not the first thing we do with a drive. We have to open it, read the label, then maybe read some more data, and only then issue a locate. That means we have had this drive open, confirmed that it has the right tape in it, and then at some point issued a locate to a block somewhere out in the middle of the tape.
- However, this error implies that there was no tape in the drive at the time we issued the locate request. This can mean one of two things:
  1. The drive had some problem and responded very much in error to the situation.
  2. Somewhere along the chain, some device sent a SCSI command to the wrong drive. This is probably much more likely.
- I may be able to shed more light on this if you can send me the TSM activity log from the time that process started, but I also may not. If you suspect that you are having problems with some fancy device you have between your TSM server and your tape drives, this is a very likely explanation for the problem.

Ideas from STK Support

The TSM activity logs show that backups are succeeding, although I do see some write errors. However, there are a large number of LOCATE errors during reclamation. This may indicate a problem in:
1. TSM configuration
2. Gresham component
3. Media
4. Drives (since the backups are successful, it is unlikely the problem is on the drive side)

Ideas from Gresham Support

I searched through the TSM log for I/O errors, and then cross-checked those against the EDT log.
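Picking the ANR8302E errors out of the TSM activity log and lining them up against EDT events by time is easy to script. What follows is a minimal Python sketch, not a tool from IBM, STK or Gresham. It assumes the activity log has been saved as text, with each message either on one line or wrapped onto indented continuation lines as in QUERY ACTLOG output, and that the SENSE bytes are intact. The EDT side is a hypothetical list of pre-parsed (timestamp, source, text) tuples, because the EDT log format is not shown in this thread. The decoding follows standard fixed-format SCSI sense data (byte 2 holds the sense key, bytes 12 and 13 hold ASC/ASCQ). KEY=02 with ASC/ASCQ 3A/00 reads as NOT READY, MEDIUM NOT PRESENT, which agrees with IBM's reading that there was no tape in the drive. The 04/02 on /dev/rmt12 reads as LOGICAL UNIT NOT READY, INITIALIZING COMMAND REQUIRED.

```python
import re
from datetime import datetime

# Only the codes relevant to this thread; the full tables are in the
# SCSI Primary Commands (SPC) standard.
SENSE_KEYS = {
    0x00: "NO SENSE",
    0x02: "NOT READY",
    0x03: "MEDIUM ERROR",
    0x04: "HARDWARE ERROR",
    0x06: "UNIT ATTENTION",
}
ASC_ASCQ = {
    (0x3A, 0x00): "MEDIUM NOT PRESENT",
    (0x04, 0x02): "LOGICAL UNIT NOT READY, INITIALIZING COMMAND REQUIRED",
}

# One ANR8302E message, after wrapped continuation lines have been joined.
# Assumption: the SENSE bytes contain no stray spaces.
ANR8302E = re.compile(
    r"(?P<ts>\d{2}/\d{2}/\d{2} \d{2}:\d{2}:\d{2})\s+ANR8302E .*?"
    r"\((?P<dev>/dev/rmt\d+)\) \(OP=(?P<op>\w+),.*?"
    r"SENSE=(?P<sense>[0-9A-Fa-f.]+)"
)


def decode_sense(sense: str) -> dict:
    """Decode fixed-format sense data (response code 0x70 or 0x71)."""
    b = [int(x, 16) for x in sense.split(".") if x]
    if len(b) < 14 or (b[0] & 0x7F) not in (0x70, 0x71):
        return {"key": "unparsed", "asc_ascq": "unparsed"}
    key = b[2] & 0x0F          # sense key: low nibble of byte 2
    asc, ascq = b[12], b[13]   # additional sense code and its qualifier
    return {
        "key": SENSE_KEYS.get(key, f"KEY={key:02X}"),
        "asc_ascq": ASC_ASCQ.get((asc, ascq), f"ASC={asc:02X} ASCQ={ascq:02X}"),
    }


def tsm_io_errors(log_text: str):
    """Yield (timestamp, source, summary) for each ANR8302E entry in the log."""
    # Fold indented continuation lines back onto the entry they belong to.
    joined = re.sub(r"\n[ \t]+", " ", log_text)
    for m in ANR8302E.finditer(joined):
        ts = datetime.strptime(m["ts"], "%m/%d/%y %H:%M:%S")
        d = decode_sense(m["sense"])
        yield ts, "TSM", f"{m['dev']} OP={m['op']}: {d['key']}, {d['asc_ascq']}"


def merge_timelines(*streams):
    """Interleave (timestamp, source, text) events from several logs by time."""
    return sorted(event for stream in streams for event in stream)


if __name__ == "__main__":
    tsm_log = (
        "02/14/05 09:31:16    ANR8302E I/O error on drive SL8500 (/dev/rmt18) (OP=REW,\n"
        "                     Error Number=46, CC=0, KEY=02, ASC=3A, ASCQ=00,\n"
        "                     SENSE=70.00.02.00.00.00.00.1C.00.00.00.00.3A.00.30.00.10.13.00.00.00.00.20.20.20.20.20.20.20.00.00.00.00.00.13.00,\n"
        "                     Description=An undetermined error has occurred).\n"
    )
    # Hypothetical EDT event with a made-up timestamp; the real EDT log
    # format is not shown in this thread.
    edt_events = [(datetime(2005, 2, 14, 9, 31, 20), "EDT", "DISMOUNT FAILURE")]
    for ts, source, text in merge_timelines(tsm_io_errors(tsm_log), edt_events):
        print(ts, source, text)
```

Sorting on the timestamp is the simplest way to interleave the two sources. It assumes the TSM server and the EDT host agree on the time, which is worth checking before reading cause and effect into the order of the messages.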
I have attached a file which merges the I/O errors from the TSM log with the EDT diagnostics, so that we can see the order of the messages. It looks to me like the problem starts with TSM hitting errors communicating with the drive over the data path. TSM then apparently tries to dismount the tape from the drive through EDT. When EDT tries to comply, it gets a DISMOUNT FAILURE and a LIBRARY ERROR. So it looks like both the control path (through EDT) and the data path (through the device driver, directly to the drive) fail at the same time. The ACSLS server may be the problem, but I suggest that there may be a single point of failure in the communications path that affects both the control path and the data path. Perhaps you have a network component that is intermittently failing? I also suggest you look at the ACSLS logs from the same time for clues.

********************************
Joni Moyer
Highmark Storage Systems
Work:(717)302-6603
Fax:(717)302-5974
[EMAIL PROTECTED]
********************************
