Mark (and anyone else monitoring),

Here's an update on our situation, which I think was similar to yours...
We've been having erratic TSM server hangs. Only the TSM server hangs; the OS and the other minor apps on the system are OK. There were no evident bottlenecks or problems.

Yesterday I reviewed the actlog for the previous 4 days and saw many I/O errors like this:

02/18/02 12:04:05 ANR8302E I/O error on drive DRIVE4 (/dev/rmt/11m) (OP=FSR, CC=-1, KEY=FF, ASC=FF, ASCQ=FF, SENSE=**NONE**, Description=An undetermined error has occurred). Refer to Appendix D in the 'Messages' manual for recommended action.

I searched the actlog for mount msgs, ANR8302E msgs, and dismount msgs. Whenever tape volume 000432 was mounted, the mount was followed by many ANR8302E msgs, and I never saw a dismount for that tape. Conclusion: there must be something wrong with that tape! I marked it unavailable and have not seen the hang in the last 24 hours.

We know that tape was used a few days before the hangs started, for the backup of an NT server that had known disk (FAT) corruption, but we thought the corruption was in the Recycle Bin, which is excluded from backup. Since then we have been constantly running storage pool backups, which we think mounted that tape and hung the server when they reached the corrupted data on the tape.

I am now concerned that an apparent media fault can cause TSM to hang. I have an open PMR with Tivoli.

I guess the moral of the story is: investigate I/O errors thoroughly!

Robin Sharpe
Berlex Labs
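The actlog correlation described above (match each tape mount to the ANR8302E errors on the same drive, and note which volumes never get a dismount) can be sketched in Python. This is a minimal sketch under stated assumptions. Only the ANR8302E layout comes from the actlog excerpt in the post. The mount and dismount wording ("volume X mounted in drive D", "volume X dismounted from drive D") and the function name `suspect_volumes` are hypothetical, so check the patterns against your own server's messages before relying on them. The sketch also assumes that an error on a drive belongs to whichever volume was last mounted there.

```python
import re
from collections import Counter

# ANR8302E layout is taken from the actlog excerpt in the post.
IOERR_RE = re.compile(r"ANR8302E I/O error on drive (?P<drive>\S+)")
# Hypothetical mount/dismount wording; verify against your server's messages.
MOUNT_RE = re.compile(r"volume (?P<vol>\S+) mounted in drive (?P<drive>\S+)")
DISMOUNT_RE = re.compile(r"volume (?P<vol>\S+) dismounted from drive (?P<drive>\S+)")


def suspect_volumes(lines):
    """Correlate tape mounts with ANR8302E I/O errors on the same drive.

    Returns (errors, still_mounted): a Counter of I/O errors per volume,
    and the set of volumes whose last mount was never followed by a dismount.
    """
    in_drive = {}          # drive name -> volume currently mounted there
    errors = Counter()     # volume -> ANR8302E count while it was mounted
    still_mounted = set()  # volumes mounted with no later dismount seen
    for line in lines:
        if m := MOUNT_RE.search(line):
            in_drive[m["drive"]] = m["vol"]
            still_mounted.add(m["vol"])
        elif m := DISMOUNT_RE.search(line):
            in_drive.pop(m["drive"], None)
            still_mounted.discard(m["vol"])
        elif m := IOERR_RE.search(line):
            # Assumption: blame the volume last mounted in that drive.
            vol = in_drive.get(m["drive"])
            if vol is not None:
                errors[vol] += 1
    return errors, still_mounted
```

Feeding it the lines of a saved actlog (for example `suspect_volumes(open("actlog.txt"))`, with a hypothetical file name) returns per-volume error counts plus the volumes still shown as mounted. A volume that shows up in both results is a good candidate for a closer look before it hangs the server. The code needs Python 3.8 or later for the `:=` operator.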
