Thanks for sharing that real-life case, Neil. Historically, we have been accustomed to digital data processing inherently assuring data integrity. It was dismaying, then, to read the TSM 5.1 Technical Guide redbook and see it say things like, "New communication and SAN hardware products are more susceptible to data loss" and "...data corruption introduced either by the network or by errors within the storage environment." Data loss?? Corruption?? Indeed. The TSM developers realized this and provided CRC functionality to help validate data integrity. If you have a complex data transfer and/or storage environment, you may want to consider turning on CRC.
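To illustrate the general principle behind CRC validation (not TSM's actual implementation, whose internals are not described here), the following is a minimal sketch. A CRC-32 is computed when the data is written and checked again when it is read back, so silent corruption by any component in between, such as a faulty HBA, is reported instead of going unnoticed. The function names `write_block`, `read_block` and `flip_bit` are hypothetical and exist only for this illustration; `zlib.crc32` is from the Python standard library.

```python
import zlib


def write_block(data: bytes):
    """Hypothetical writer: return the payload plus the CRC-32 computed at the source."""
    return data, zlib.crc32(data)


def read_block(data: bytes, stored_crc: int) -> bytes:
    """Hypothetical reader: recompute the CRC-32 and reject data whose checksum differs."""
    actual = zlib.crc32(data)
    if actual != stored_crc:
        raise ValueError(
            f"CRC mismatch: stored {stored_crc:#010x}, computed {actual:#010x}"
        )
    return data


def flip_bit(data: bytes, index: int) -> bytes:
    """Simulate a faulty adapter silently flipping one bit on its way to tape."""
    corrupted = bytearray(data)
    corrupted[index] ^= 0x01
    return bytes(corrupted)


payload, crc = write_block(b"backup object contents " * 100)
assert read_block(payload, crc) == payload  # intact data passes the check

damaged = flip_bit(payload, 42)
try:
    read_block(damaged, crc)
    print("corruption went undetected")
except ValueError as err:
    print("detected:", err)
```

A checksum like this only detects damage; it cannot repair it. It therefore complements, rather than replaces, a second copy written through an independent path.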
Richard Sims

On Nov 2, 2005, at 5:30 AM, Neil Schofield wrote:
We had issues like this 18 months ago. It coincided with us adding some new HBAs to balance the load. It transpired that one of the HBAs was silently corrupting the data as it was writing it to tape. We only realised when we came to read the data back. The problem was that the same HBA was being used to write data to both the local (primary) and remote (copy) storage pools over a long-distance SAN, so we lost some data.

Once we had identified the HBA as the source of the problem and removed it, we had the task of identifying every tape that had been written using it. Almost all were bad! We deleted the copy tapes, and for the primary tapes we restored from copy tapes where possible.

The HBA, an Emulex LP9802, was described as having 'end-to-end parity protection', but this didn't work for us. I opened a PMR, but IBM's observation was that they are not responsible for the data once it has been passed to the HBA.

So maybe one of your SCSI/FC adapters has gone bad?

Regards
Neil Schofield
Yorkshire Water Services Ltd.
