1 drive failure per month, out of >34, may or may not be that far from "normal", depending on your environment & workload.
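To put that figure on a common scale, here is a rough sketch that converts the numbers quoted in this thread into replacements per drive per year. The function name is made up for illustration, and 34 is used as the drive count even though the site has more than 34 drives, so the ADIC rates below are upper bounds. The LTO2 figure counts only one replacement per drive, so it is a lower bound.

```python
def replacements_per_drive_year(replacements, drives, months):
    """Average number of replacements per drive per year of service.

    Hypothetical helper for illustration only; it just divides the
    replacement count by the drive-years observed.
    """
    drive_years = drives * months / 12
    return replacements / drive_years


# One replacement a month across 34 drives (34 is a floor on the drive count).
monthly = replacements_per_drive_year(1, 34, 1)

# One replacement every 1.5 months across the same 34 drives.
every_six_weeks = replacements_per_drive_year(1, 34, 1.5)

# 8 LTO2 drives, each replaced at least once in 1.5 years (lower bound).
lto2_small_site = replacements_per_drive_year(8, 8, 18)

print(f"{monthly:.2f} {every_six_weeks:.2f} {lto2_small_site:.2f}")
```

On these assumptions the 34-drive site is replacing roughly a quarter to a third of its drives per year, while the 8-drive LTO2 site is at two thirds or worse.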
In the first place, no one, including IBM, has ever said that LTO drives will take the kind of heavy-duty pounding that 359X drives will. That is the difference between them, and why IBM sells both types of drives: LTO drives are designed to be inexpensive; 359X drives are designed to be the best quality drives you can buy. There is almost an ORDER OF MAGNITUDE difference in the cost of an LTO1 drive and a 3592, and there's a reason for it!

There is also a difference in what tends to cause drives to fail. Lots of mounts/dismounts to read/write small files (which causes a lot of start/stop activity moving the media) is a lot tougher on the drive than if you just mount the tape and write 200GB of data from beginning to end.

None of my sites are having any persistent problems with LTO drives (my only LTO experience is with IBM drives), but they are also not high-stress environments. Now you say you have >34 LTO drives; that's a LOT in one site. So I assume you must have a LOT of data & a LOT of activity.

I agree with everything the other posters have said:

If your drives are failing shortly after being replaced, you may have a manufacturing problem or an installation problem.

If those drives are busy only a few hours each night dumping big data bases, they shouldn't be failing often.

If they are failing randomly with mechanical problems, look for environmental problems (dust, heat, power).

If they are failing randomly with I/O errors, unreadable/unwriteable data, causing data integrity problems, NOTHING should cause that. Sit on your vendor, keep them in the site CONSTANTLY until they have an explanation. Be sure you call the Field Engineering manager and stay on his/her case AND keep the sales/marketing rep involved.

Sometimes the field engineers reach the point they don't know what else to do.
But I believe all major vendors have second-level regional experts, and a third-level support team that can do a post-mortem on drives and figure out what is causing the failure. But you have to be a squeaky wheel to get to that level, and you have to be persistent (which is why you have to get the Field Engineering manager involved). You have a LOT of hardware on the floor; yell until you get attention.

On the other hand, if your drives are taking a real pounding, busy MANY hours a day with reclaims, migration, other TSM activity, but giving out quietly without causing data integrity problems, you might just be getting good value for the money!

Wanda Prather
"I/O, I/O, It's all about I/O" -(me)

-----Original Message-----
From: ADSM: Dist Stor Manager [mailto:[EMAIL PROTECTED] On Behalf Of Dennis Melburn W IT743
Sent: Tuesday, December 13, 2005 2:30 PM
To: [email protected]
Subject: Re: Normal # of failures on tape libraries

Ahh, so it's the fact that they are LTO drives. So as far as LTO drives go then, what I am experiencing is "normal"?

Mel Dennis

-----Original Message-----
From: ADSM: Dist Stor Manager [mailto:[EMAIL PROTECTED] On Behalf Of Zoltan Forray/AC/VCU
Sent: Tuesday, December 13, 2005 2:26 PM
To: [email protected]
Subject: Re: [ADSM-L] Normal # of failures on tape libraries

I agree. My 3590's (both B and E1A models) have been through major pounding, for many, many years, and like the Energizer Bunny, keep going and going. Yes, they do need some repairs/maintenance, but considering the amount of data/mounts/tapes they go through on a daily basis, they are like tanks. I have never had a whole drive replaced. Usually it is things like cleaning brushes, sometimes R/W heads, 2-3 card-packs, stuff like that.

This is in contrast to my [EMAIL PROTECTED]&* IBM 3583/3580 LTO2 drives: over the 1.5 years I have been using them, all 8 drives have been replaced at least once, some more than once.
I haven't kept strict tabs on them, but considering I just had 3 replaced over the past 2 weeks, from my experience, LTO2 drives are garbage. They require weekly, if not daily, attention. The 2 LTO libraries have 300 tapes between them; the 3494 library with the 3590 drives has over 3700, with 400+ mounts a day!

FWIW, when I went to a "storage" show-and-tell-and-try-to-sell, the ADIC folks told me they OEM their drives from IBM!

Richard Sims <[EMAIL PROTECTED]>
Sent by: "ADSM: Dist Stor Manager" <[email protected]>
12/13/2005 02:01 PM
Please respond to "ADSM: Dist Stor Manager" <[email protected]>
To [email protected]
cc
Subject Re: [ADSM-L] Normal # of failures on tape libraries

On Dec 13, 2005, at 11:31 AM, Dennis Melburn W IT743 wrote:

> Our sites use ADIC Scalar 1Ks as well as one ADIC 10K. The Scalar 1Ks
> have 4 LTO1 drives in each and the 10K has 34 LTO2 drives. We
> experience occasional failures on these drives and have to replace
> them.
> My question is, is it normal for a site that has a lot of drives to
> experience drive failures about every 1-1.5 months? My manager is
> rather annoyed at the fact that it seems that we are constantly
> replacing drives even though it doesn't cause any downtime for our TSM
> servers while they are being replaced. If this is a normal part of
> having tape libraries then that is fine, but I don't have enough
> experience in this field to say either way, so that is why I am asking
> all of you.

Customers with 359x drives (which are never replaced) would certainly find that replacement frequency alarming; and from any perspective, that's rather extreme.

Your site may have periodic management-level review meetings with the vendor, where a good explanation should be required of the vendor. Your management might then specify that if a resolution to the problem is not forthcoming, then they might abandon that vendor for another. (A complication there is that ADIC has been the OEM for some name-brand drive resellers.)
Make sure they review external factors for cause, such as bad power feeding the drives, excessive contaminants in the local atmosphere, tapes coming back from offsite after rough handling, etc.

In any site where drive replacement occurs with any frequency, I would advise chronicling the serial numbers of all such drives. You would like to believe that you are getting new drives as replacements, where the serial number should be nearby or higher than that being replaced - and that you don't find the same drive coming back sometime later.

Richard Sims
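
For anyone who wants to keep the serial-number chronicle suggested above, here is a minimal sketch in Python. Everything in it is an assumption for illustration: the record layout, the slot names, the sample serial numbers, and especially the idea that serials are plain numeric strings that can be compared as integers. Real vendor serial formats vary, so the comparison would need adapting.

```python
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class Replacement:
    """One drive swap. All field names are hypothetical."""
    when: date
    slot: str               # library position of the drive, site-specific naming
    removed_serial: str     # serial of the drive taken out
    installed_serial: str   # serial of the drive put in


def returning_drives(log):
    """Serials that were installed after having been removed earlier."""
    removed_so_far = set()
    flagged = []
    for rec in sorted(log, key=lambda r: r.when):
        if rec.installed_serial in removed_so_far:
            flagged.append(rec.installed_serial)
        removed_so_far.add(rec.removed_serial)
    return flagged


def lower_serial_replacements(log):
    """Swaps where the new drive's serial is lower than the old one.

    Assumes purely numeric serials; a lower number only hints that the
    replacement may not be a newer unit.
    """
    return [r for r in log if int(r.installed_serial) < int(r.removed_serial)]


# Made-up sample data, not taken from any real site.
log = [
    Replacement(date(2005, 10, 3), "drive07", "1300123", "1300450"),
    Replacement(date(2005, 11, 14), "drive12", "1300200", "1300123"),
]

print(returning_drives(log))
print([r.slot for r in lower_serial_replacements(log)])
```

A flat file or spreadsheet would do the same job; the point is only that both checks are cheap once the swaps are written down, and the output gives you something concrete to put in front of the vendor.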
