Are we looking at the same output? Here's the output of idacontrol show off one of my DL360 servers:
mail# idacontrol show
cmd_show_all()
[Compaq Integrated Array controller]
Controller uptime: 301 hours 54 minutes 22 seconds
Firmware Version: 1.50 (running) 1.50 (ROM)
Revision - Hardware: 2 Marketing: A
SCSI bus count: 2
Max drives per bus: 16
Maximum request: 65535 blocks

Logical drive 0: 17359MB (35553120 sectors), blocksize=512
Status: Logical drive ok
Mode: Mirroring (RAID1)
Drive ID: 00000000
Drive Label:

bus 1 target 0 lun 0: enclosure 0, bay 0, connector 2J
<COMPAQ BB01813467 3BM0G606000071011MHF 3B07> direct-access
17361MB (35556888 512 byte sectors, 1088 reserved)
Sync, Ultra2, Wide - Configured in a logical volume.

bus 1 target 1 lun 0: enclosure 0, bay 1, connector 2J
<COMPAQ BF01864663 3EV0J0V3000072363NRD 3B0B> direct-access
17361MB (35556888 512 byte sectors, 1088 reserved)
Sync, Ultra2, Wide - Configured in a logical volume.

bus 1 target 7 lun 0: enclosure 0, bay 7, connector 2J
<COMPAQ PROLIANT 4L2I JB21> non-disk
Async
mail#

There are two physical disks in the server: bus 1 target 0 and
bus 1 target 1. Those ARE the physical disks. If one of them has
failed, instead of:

Sync, Ultra2, Wide - Configured in a logical volume.

you will see something like:

Sync, Ultra2, Wide - Unconfigured

or nothing at all.

It is normal for idacontrol to generate soft write errors. The
developer knows about this. There's really no easy way to make it not
happen. It doesn't hurt anything, however.

If the RAID card itself is flaky, you can't really tell that from
software. Even the Windows RAID utilities that HP/Compaq supplies won't
tell you this.

The "by the book" way of troubleshooting these servers is that if you
get a disk failure, you immediately swap the disk. Then, if the failure
happens again and you're pretty sure it's not the disk, you down the
server, boot it into Compaq Diagnostics, and let it run for a day or so.
It is not uncommon to end up with several additional hard drives that
you don't need in the process of identifying a bad RAID card in a server.
We have all done it; it is part of the territory. If you cannot afford
it, stay away from these servers. Remember, these servers are designed
for a medium to large corporation that has a lot of resources.

To give you a typical scenario, a couple of weeks ago one of our
mailservers running on a Proliant 1600R started freezing up. I had the
admin pull the entire disk array and put the disks into our backup
server, which went online in place of the original server, and the
original server was pulled and put on a test bench. About a week later,
after much hair-pulling, running of diagnostics, and so on, the admin
finally discovered the processor board had worked its way almost out of
the socket.

Ted

> -----Original Message-----
> From: [EMAIL PROTECTED]
> [mailto:[EMAIL PROTECTED] Behalf Of David Newman
> Sent: Sunday, November 25, 2007 2:58 PM
> To: Ted Mittelstaedt
> Cc: firstname.lastname@example.org
> Subject: Re: dealing with a failing drive
>
>
> -----BEGIN PGP SIGNED MESSAGE-----
> Hash: SHA1
>
> On 11/24/07 12:39 PM, Ted Mittelstaedt wrote:
> > The output of idacontrol show will show if one of the
> > hard disks in the SmartArray has failed. Your choice with
> > a hardware array is to either run it with redundancy or not.
> > (ie: raid5 or mirroring or striping) You have to choose
> > which is more important for you.
> >
> > IMHO it is very foolish to stripe an array that you have
> > critical data on and assume that you can predict a failure
> > of a disk using smart or other monitoring, and replace it
> > in advance of a failure. If your concern is redundancy, then
> > add more disks to the array and create a raid 5 or a mirror.
> > Then ignore all the predictive junk and let the array card
> > concern itself with detecting if a drive has failed. Run
> > idacontrol periodically out of a script that checks for a
> > failure of a disk and e-mails you if there is one.
>
> Thanks, this is good advice, but it doesn't answer the specific
> questions I had:
>
> 1. How to diagnose the health of a *physical* disk that's part of a RAID
> array (RAID1, in this case) in an old Compaq Proliant server?
>
> 2. Is it normal for idacontrol to generate soft write errors?
>
> Backstory here is that Proliant server #1 generated beaucoup hard and
> soft read and write errors and eventually locked up. I thought it was
> one of the disks but replacing one at a time didn't help. So I took both
> disks and put them in identical Proliant server #2. Ergo, I would
> conclude server #1's RAID controller flaked out.
>
> idacontrol is useful for telling the health of the logical disk. What it
> doesn't tell me (or maybe I just don't see it) is whether the physical
> disks are ok, and those "soft write errors" concern me. I had a failure
> situation, and need to figure out whether just the controller was bad or
> whether I need to replace at least one disk too.
>
> Thanks again!
>
> dn
>
> -----BEGIN PGP SIGNATURE-----
> Version: GnuPG v1.4.3 (Darwin)
>
> iD8DBQFHSf39yPxGVjntI4IRAp1yAJ4vMV9FkeaBsHRr/Z5WpCL27wJ3tACfS+pT
> 3UVlscnQUZhe8ulHksKDWsY=
> =Om7/
> -----END PGP SIGNATURE-----
> _______________________________________________
> email@example.com mailing list
> http://lists.freebsd.org/mailman/listinfo/freebsd-questions
> To unsubscribe, send any mail to
> "[EMAIL PROTECTED]"
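
For anyone wanting to act on the "run idacontrol periodically out of a script" suggestion, here is a rough sketch of what such a check might look like. It is not the tool's official interface, just text matching against the output shown at the top of this message. The function names (parse_devices, check, main) and the expected_disks parameter are made up for illustration, and the sketch assumes that healthy member disks carry the "- Configured in a logical volume" text, that non-disk devices are labelled "non-disk", and that cron mails any output of the job to you.

```python
import re
import subprocess

# Text seen on a healthy member disk in the sample output (assumption:
# a failed disk shows "Unconfigured" or no such text at all).
HEALTHY_MARKER = "- Configured in a logical volume"
LOGICAL_OK = "Status: Logical drive ok"

# Every physical device entry starts with "bus N target N lun N:".
DEVICE_RE = re.compile(r"bus (\d+) target (\d+) lun (\d+):")


def parse_devices(show_output):
    """Split `idacontrol show` text into one record per physical device."""
    text = " ".join(show_output.split())  # ignore how lines are wrapped
    matches = list(DEVICE_RE.finditer(text))
    devices = []
    for i, m in enumerate(matches):
        end = matches[i + 1].start() if i + 1 < len(matches) else len(text)
        body = text[m.end():end]
        devices.append({
            "bus": int(m.group(1)),
            "target": int(m.group(2)),
            # The backplane entry is reported as "non-disk"; skip it later.
            "is_disk": "non-disk" not in body,
            "configured": HEALTHY_MARKER in body,
        })
    return devices


def check(show_output, expected_disks):
    """Return a list of human-readable problems; empty means all looks ok."""
    text = " ".join(show_output.split())
    problems = []
    if LOGICAL_OK not in text:
        problems.append("logical drive status is not 'Logical drive ok'")
    disks = [d for d in parse_devices(text) if d["is_disk"]]
    if len(disks) < expected_disks:
        # Covers the "or nothing at all" case where a disk entry vanishes.
        problems.append("expected %d disks, found %d"
                        % (expected_disks, len(disks)))
    for d in disks:
        if not d["configured"]:
            problems.append("bus %d target %d: not configured in a logical volume"
                            % (d["bus"], d["target"]))
    return problems


def main(expected_disks=2):
    """Run `idacontrol show` and print problems (cron mails the output)."""
    result = subprocess.run(["idacontrol", "show"],
                            capture_output=True, text=True)
    problems = check(result.stdout, expected_disks)
    for p in problems:
        print("idacontrol: " + p)
    return 1 if problems else 0
```

A small wrapper that calls main() from a cron job would then stay silent while the array is healthy and produce output (and therefore mail) only when a disk drops out. Note this only reports what the controller itself reports; as said above, it cannot tell you that the RAID card is the part that is flaky.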