One of my TSM servers has had drives continuously going offline over the past few days. I have 2 servers attached to the same library: server 1 is fine, but server 2 keeps getting drive failures. On Sunday, all 4 drives went down within hours of each other! This strikes me as suspicious. I see this sort of message in the system logs:

Feb 10 23:55:39 tsm2 lmcpd[1213]: [ID 470916 daemon.error] Received message 52,lLibrary ids02atl1 is going offline
Feb 10 23:55:40 tsm2 last message repeated 1 time
Feb 11 00:17:42 tsm2 lmcpd[1213]: [ID 257369 daemon.error] Library ids02atl1 is online to host
Feb 11 00:38:31 tsm2 lmcpd[1213]: [ID 470916 daemon.error] Received message 52,lLibrary ids02atl1 is going offline
Feb 11 00:40:21 tsm2 last message repeated 1 time
Feb 11 00:55:33 tsm2 lmcpd[1213]: [ID 257369 daemon.error] Library ids02atl1 is online to host
Feb 11 01:00:43 tsm2 IBMtape: [ID 243001 kern.info] IBMtape(130) 03590E1A S/N 0000000E6955 SENSE DATA:
Feb 11 01:00:43 tsm2 IBMtape: [ID 243001 kern.info] IBMtape(130) 71 0 6 0 0 0 0 58 0 0 0 0 29 0 FF 2
Feb 11 01:00:43 tsm2 IBMtape: [ID 243001 kern.info] IBMtape(130) C4 42 0 15 0 0 0 0 0 0 0 0 0 0 0 0
Feb 11 01:00:43 tsm2 IBMtape: [ID 243001 kern.info] IBMtape(130) 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
Feb 11 01:00:43 tsm2 last message repeated 3 times
Feb 11 01:00:44 tsm2 IBMtape: [ID 243001 kern.info] IBMtape(262) 03590E1A S/N 0000000E6952 SENSE DATA:
Feb 11 01:00:44 tsm2 IBMtape: [ID 243001 kern.info] IBMtape(262) 71 0 6 0 0 0 0 58 0 0 0 0 29 0 FF 2
Feb 11 01:00:44 tsm2 IBMtape: [ID 243001 kern.info] IBMtape(262) C4 42 0 33 0 0 0 0 0 0 0 0 0 0 0 0
Feb 11 01:00:44 tsm2 IBMtape: [ID 243001 kern.info] IBMtape(262) 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
Feb 11 01:00:44 tsm2 last message repeated 3 times
Feb 11 01:19:17 tsm2 IBMtape: [ID 243001 kern.info] IBMtape(292) 03590E1A S/N 0000000E7068 SENSE DATA:
Feb 11 01:19:17 tsm2 IBMtape: [ID 243001 kern.info] IBMtape(292) 71 0 6 0 0 0 0 58 0 0 0 0 29 0 FF 2
Feb 11 01:19:17 tsm2 IBMtape: [ID 243001 kern.info] IBMtape(292) C4 42 0 24 0 0 0 0 0 0 0 0 0 0 0 0
Feb 11 01:19:17 tsm2 IBMtape: [ID 243001 kern.info] IBMtape(292) 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
Feb 11 01:19:17 tsm2 last message repeated 3 times
Feb 11 01:20:17 tsm2 IBMtape: [ID 243001 kern.info] NOTICE: IBMtape(262) _write: ec82 < Logical EOT notification, rc 0
Feb 11 01:29:29 tsm2 IBMtape: [ID 243001 kern.info] NOTICE: IBMtape(292) _write: 2a091 < Logical EOT notification, rc 0
Feb 11 04:29:32 tsm2 lmcpd[1213]: [ID 410567 daemon.error] ERROR on ids02atl1, volume 2C0389, ERA 83 Library Drive Exception
Feb 11 04:36:17 tsm2 IBMtape: [ID 243001 kern.info] NOTICE: IBMtape(292) _write: dfddd < Logical EOT notification, rc 0

This is happening with multiple tapes, not the same few. Does this sound like a hardware problem or a software/driver issue? I can't find anything googling around for the errors. The error on the library is that a drive failed with an unload error, the tape is stuck down in the drive.

TSM 5.1.8.1
Solaris 8
IBMtape driver 4.0.8.0 (latest I am pretty sure)
lmcpd 5.3.9.0 (latest)
Drives are SCSI attached to the server

Michael French
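
For anyone reading the SENSE DATA lines in the log above, here is a rough Python sketch for pulling the sense key and ASC/ASCQ out of the first 16-byte row. It assumes that row is bytes 0-15 of standard fixed-format SCSI sense data printed in hex (the IBMtape driver's layout is not confirmed here), and `decode_sense` is just a hypothetical helper name. Under that assumption all three drives report the same thing: sense key 6 (UNIT ATTENTION) with ASC/ASCQ 29/00, which the SCSI standard defines as "power on, reset, or bus device reset occurred". That alone does not settle hardware versus driver.

```python
# Sense key names from the SCSI standard (subset; others print as "UNKNOWN").
SENSE_KEYS = {
    0x0: "NO SENSE",
    0x1: "RECOVERED ERROR",
    0x2: "NOT READY",
    0x3: "MEDIUM ERROR",
    0x4: "HARDWARE ERROR",
    0x5: "ILLEGAL REQUEST",
    0x6: "UNIT ATTENTION",
    0xB: "ABORTED COMMAND",
}


def decode_sense(hex_row):
    """Decode one row of space-separated hex bytes.

    ASSUMPTION: the row holds bytes 0-15 of fixed-format sense data
    (response code 0x70 or 0x71), as in the syslog excerpt.
    """
    data = [int(tok, 16) for tok in hex_row.split()]
    if len(data) < 14:
        raise ValueError("need at least 14 bytes to reach ASC/ASCQ")
    response_code = data[0] & 0x7F      # bit 7 is the VALID flag
    sense_key = data[2] & 0x0F          # low nibble of byte 2
    return {
        "response_code": response_code,
        "deferred": response_code == 0x71,  # 0x70 = current, 0x71 = deferred
        "sense_key": sense_key,
        "sense_key_name": SENSE_KEYS.get(sense_key, "UNKNOWN"),
        "asc": data[12],                # additional sense code
        "ascq": data[13],               # additional sense code qualifier
    }


# First sense row logged for IBMtape(130), copied from the log above.
info = decode_sense("71 0 6 0 0 0 0 58 0 0 0 0 29 0 FF 2")
print("key=%X (%s) ASC/ASCQ=%02X/%02X deferred=%s" % (
    info["sense_key"], info["sense_key_name"],
    info["asc"], info["ascq"], info["deferred"]))
```

The second and third rows of each dump (the ones starting with C4 42) are vendor-specific bytes and are not interpreted here.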