One of my TSM servers has had drives continuously going offline over the past few 
days.  I have 2 servers attached to the same library; server 1 is fine, but server 2 
keeps getting drive failures.  On Sunday, all 4 drives went down within hours of each 
other!  This strikes me as suspicious.  I see this sort of message in the system logs:

Feb 10 23:55:39 tsm2 lmcpd[1213]: [ID 470916 daemon.error] Received message 
52,lLibrary ids02atl1 is going offline
Feb 10 23:55:40 tsm2 last message repeated 1 time
Feb 11 00:17:42 tsm2 lmcpd[1213]: [ID 257369 daemon.error] Library ids02atl1 is online 
to host
Feb 11 00:38:31 tsm2 lmcpd[1213]: [ID 470916 daemon.error] Received message 
52,lLibrary ids02atl1 is going offline
Feb 11 00:40:21 tsm2 last message repeated 1 time
Feb 11 00:55:33 tsm2 lmcpd[1213]: [ID 257369 daemon.error] Library ids02atl1 is online 
to host
Feb 11 01:00:43 tsm2 IBMtape: [ID 243001 kern.info]          IBMtape(130) 03590E1A     
   S/N 0000000E6955 SENSE DATA:
Feb 11 01:00:43 tsm2 IBMtape: [ID 243001 kern.info]          IBMtape(130)  71  0  6  0 
 0  0  0 58  0  0  0  0 29  0 FF  2
Feb 11 01:00:43 tsm2 IBMtape: [ID 243001 kern.info]          IBMtape(130)  C4 42  0 15 
 0  0  0  0  0  0  0  0  0  0  0  0
Feb 11 01:00:43 tsm2 IBMtape: [ID 243001 kern.info]          IBMtape(130)   0  0  0  0 
 0  0  0  0  0  0  0  0  0  0  0  0
Feb 11 01:00:43 tsm2 last message repeated 3 times
Feb 11 01:00:44 tsm2 IBMtape: [ID 243001 kern.info]          IBMtape(262) 03590E1A     
   S/N 0000000E6952 SENSE DATA:
Feb 11 01:00:44 tsm2 IBMtape: [ID 243001 kern.info]          IBMtape(262)  71  0  6  0 
 0  0  0 58  0  0  0  0 29  0 FF  2
Feb 11 01:00:44 tsm2 IBMtape: [ID 243001 kern.info]          IBMtape(262)  C4 42  0 33 
 0  0  0  0  0  0  0  0  0  0  0  0
Feb 11 01:00:44 tsm2 IBMtape: [ID 243001 kern.info]          IBMtape(262)   0  0  0  0 
 0  0  0  0  0  0  0  0  0  0  0  0
Feb 11 01:00:44 tsm2 last message repeated 3 times
Feb 11 01:19:17 tsm2 IBMtape: [ID 243001 kern.info]          IBMtape(292) 03590E1A     
   S/N 0000000E7068 SENSE DATA:
Feb 11 01:19:17 tsm2 IBMtape: [ID 243001 kern.info]          IBMtape(292)  71  0  6  0 
 0  0  0 58  0  0  0  0 29  0 FF  2
Feb 11 01:19:17 tsm2 IBMtape: [ID 243001 kern.info]          IBMtape(292)  C4 42  0 24 
 0  0  0  0  0  0  0  0  0  0  0  0
Feb 11 01:19:17 tsm2 IBMtape: [ID 243001 kern.info]          IBMtape(292)   0  0  0  0 
 0  0  0  0  0  0  0  0  0  0  0  0
Feb 11 01:19:17 tsm2 last message repeated 3 times
Feb 11 01:20:17 tsm2 IBMtape: [ID 243001 kern.info] NOTICE:  IBMtape(262) _write:    
ec82 < Logical EOT notification, rc 0
Feb 11 01:29:29 tsm2 IBMtape: [ID 243001 kern.info] NOTICE:  IBMtape(292) _write:   
2a091 < Logical EOT notification, rc 0
Feb 11 04:29:32 tsm2 lmcpd[1213]: [ID 410567 daemon.error] ERROR on ids02atl1, volume 
2C0389, ERA 83 Library Drive Exception
Feb 11 04:36:17 tsm2 IBMtape: [ID 243001 kern.info] NOTICE:  IBMtape(292) _write:   
dfddd < Logical EOT notification, rc 0
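
The SENSE DATA lines above can be read mechanically. The following is a minimal sketch in Python, assuming the hex bytes IBMtape prints are the raw sense bytes starting at byte 0 and that they follow the standard SCSI fixed-format sense layout (response code in byte 0, sense key in the low nibble of byte 2, ASC in byte 12, ASCQ in byte 13). The function name `decode_sense` and the lookup table are illustrative and not part of any IBM tool.

```python
# Sketch: decode one "SENSE DATA" row as printed in the syslog excerpt.
# Assumption: the row holds raw fixed-format SCSI sense bytes from byte 0.

# Standard SCSI sense key names (subset).
SENSE_KEYS = {
    0x0: "NO SENSE",
    0x1: "RECOVERED ERROR",
    0x2: "NOT READY",
    0x3: "MEDIUM ERROR",
    0x4: "HARDWARE ERROR",
    0x5: "ILLEGAL REQUEST",
    0x6: "UNIT ATTENTION",
    0x7: "DATA PROTECT",
    0x8: "BLANK CHECK",
    0xB: "ABORTED COMMAND",
}


def decode_sense(row):
    """Return the key fields of a fixed-format sense row given as hex text."""
    data = [int(tok, 16) for tok in row.split()]
    if len(data) < 14:
        raise ValueError("need at least 14 sense bytes")

    response = data[0] & 0x7F  # bit 7 is the VALID flag, not part of the code
    if response not in (0x70, 0x71):
        raise ValueError("not fixed-format sense data")

    key = data[2] & 0x0F
    return {
        "deferred": response == 0x71,  # 0x70 = current, 0x71 = deferred
        "sense_key": key,
        "sense_key_name": SENSE_KEYS.get(key, "UNKNOWN"),
        "asc": data[12],
        "ascq": data[13],
    }


if __name__ == "__main__":
    # First sense row logged for IBMtape(130), joined back onto one line.
    row = "71  0  6  0  0  0  0 58  0  0  0  0 29  0 FF  2"
    info = decode_sense(row)
    print("%s, ASC/ASCQ %02X/%02X" % (info["sense_key_name"], info["asc"], info["ascq"]))
```

If the layout assumption holds, all three drives report sense key 6 with ASC/ASCQ 29/00, which in the SCSI standard is the Unit Attention for "power on, reset, or bus device reset occurred". The second row of each dump is likely vendor-specific and is not interpreted here.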


This is happening with multiple tapes, not the same few.  Does this sound like a 
hardware problem or a software/driver issue?  I can't find anything by googling 
for the errors.  The error on the library is that a drive failed with an unload error, 
and the tape is stuck down in the drive.

TSM 5.1.8.1
Solaris 8
IBMtape driver 4.0.8.0 (the latest, I am pretty sure)
lmcpd 5.3.9.0 (latest)
Drives are SCSI attached to the server

Michael French
