On Wednesday 29 October 2008, Hendrik Boom wrote:
> On Wed, 29 Oct 2008 13:00:25 -0400, Hal Vaughan wrote:
> > On Wednesday 29 October 2008, Hendrik Boom wrote:
> >> I got the message (via email)
> >>
> >> This is an automatically generated mail message from mdadm running
> >> on april
> >>
> >> A DegradedArray event had been detected on md device /dev/md0.
> >>
> >> Faithfully yours, etc.
> >>
> >> P.S. The /proc/mdstat file currently contains the following:
> >>
> >> Personalities : [raid1]
> >> md0 : active raid1 hda3[0]
> >>       242219968 blocks [2/1] [U_]
> >>
> >> unused devices: <none>
> >
> > You don't mention that you've checked the array with
> > mdadm --detail /dev/md0. Try that and it will give you some good
> > information.
>
> april:/farhome/hendrik# mdadm --detail /dev/md0
> /dev/md0:
>         Version : 00.90.03
>   Creation Time : Sun Feb 19 10:53:01 2006
>      Raid Level : raid1
>      Array Size : 242219968 (231.00 GiB 248.03 GB)
>     Device Size : 242219968 (231.00 GiB 248.03 GB)
>    Raid Devices : 2
>   Total Devices : 1
> Preferred Minor : 0
>     Persistence : Superblock is persistent
>
>     Update Time : Wed Oct 29 13:23:15 2008
>           State : clean, degraded
>  Active Devices : 1
> Working Devices : 1
>  Failed Devices : 0
>   Spare Devices : 0
>
>            UUID : 4dc189ba:e7a12d38:e6262cdf:db1beda2
>          Events : 0.5130704
>
>     Number   Major   Minor   RaidDevice State
>        0       3        3        0      active sync   /dev/hda3
>        1       0        0        1      removed
> april:/farhome/hendrik#
>
> So from this do I conclude that /dev/hda3 is still working, but that
> it's the other drive (which isn't identified) that has trouble?
>
> I'm a bit surprised that none of the messages identifies the other
> drive, /dev/hdc3. Is this normal? Is that information available
> somewhere besides the sysadmin's memory?
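(One way to answer the question just quoted: each member partition carries the array's UUID in its md superblock, so "mdadm --examine" on a candidate partition tells you whether it belongs to this array, even after the kernel has dropped it. A sketch, assuming sh; the device names and UUID are the ones from this thread, so check them against your own --detail output.)

```shell
# Each md member partition stores the array UUID in its superblock, so
# "mdadm --examine" can identify a partition that the running array no
# longer names. Sketch only; hda3/hdc3 and the UUID are from this thread.
ARRAY_UUID="4dc189ba:e7a12d38:e6262cdf:db1beda2"   # from mdadm --detail above

suggest_examine() {
    # Print the examine command to run (as root) for each candidate
    # partition; compare the UUID in its output against $ARRAY_UUID.
    for part in "$@"; do
        printf 'mdadm --examine %s   # expect UUID : %s\n' "$part" "$ARRAY_UUID"
    done
}

suggest_examine /dev/hda3 /dev/hdc3
```

(A DEVICE/ARRAY listing in /etc/mdadm/mdadm.conf is the other usual place that information lives, if one was written when the array was created.)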
Luckily it's been at least a couple of months since I worked with a
degraded array, but I *thought* it listed the failed devices as well.
It looks like the device has not only failed but been removed -- is
there a chance you removed it after the failure, before running this
command?

> > I've never used /proc/mdstat because the --detail option gives me
> > more data in one shot. From what I remember, this is a raid1,
> > right? It looks like it has 2 devices and one is still working,
> > but I might be wrong. Again, --detail will spell out a lot of this
> > explicitly.
> >
> >> Now I gather from what I've googled that somehow I've got to get
> >> the RAID to reestablish the failed drive by copying from the
> >> non-failed drive. I do believe the hardware is basically OK, and
> >> that what I've got is probably a problem due to a power failure
> >> (we've had a lot of these recently) or something transient.
> >>
> >> (a) How do I do this?
> >
> > If a drive has actually failed, then mdadm --remove /dev/md0
> > /dev/hdxx. If the drive has not failed, then you need to fail it
> > first with --fail as an option/switch for mdadm.
>
> So presumably the thing to do is
>     mdadm --fail /dev/md0 /dev/hdc3
>     mdadm --remove /dev/md0 /dev/hdc3
> and then
>     mdadm --add /dev/md0 /dev/hdc3

I think there's a "--re-add" that you may have to use or something like
that, but I'd try --add first and see if that works. You might find
that hdc3 has already failed and, from the output above, it looks like
it's already been removed.

> Is the --fail really needed in my case? The --detail option seems to
> have given /dev/hdc3 the status of "removed" (although it failed to
> mention it was /dev/hdc3).

I've had trouble with removing drives if I didn't manually fail them.
Someone who knows the inner workings of mdadm might be able to provide
more information on that.

> >> (b) is hda3 the failed drive, or is it the one that's still
> >> working?
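(The sequence above can be put in a small sh script with a dry-run guard, which is a cheap way to double-check the device names before anything destructive runs. A sketch only: /dev/md0 and /dev/hdc3 come from this thread, and the DRY_RUN wrapper is my own convention, not part of mdadm.)

```shell
# Hedged sketch of the recovery sequence discussed above. /dev/md0 and
# /dev/hdc3 are from this thread; verify them against your own
# "mdadm --detail" output first. DRY_RUN=1 (the default here) only
# prints the commands instead of executing them.
DRY_RUN=${DRY_RUN:-1}

run() {
    if [ "$DRY_RUN" = 1 ]; then
        echo "would run: $*"
    else
        "$@"
    fi
}

# --detail already shows the second slot as "removed", so the first two
# steps may refuse with "no such device" -- harmless in that case.
run mdadm --fail   /dev/md0 /dev/hdc3
run mdadm --remove /dev/md0 /dev/hdc3
# --re-add tries to reuse the old superblock; if it refuses, a plain
# --add is the fallback and triggers a full resync:
run mdadm --re-add /dev/md0 /dev/hdc3
```

Once the add succeeds, /proc/mdstat should show the resync running and eventually go back to [UU].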
> > That's one of the things mdadm --detail /dev/md0 will tell you. It
> > will list the active drives and the failed drives.
>
> Well. I'm glad I was paranoid enough to ask. It seems to be the
> drive that's working. Glad I didn't try to remove and add in *that*
> one.

Yes, paranoia is a good thing in system administration. It's kept me
from severe problems previously!

Hal


-- 
To UNSUBSCRIBE, email to [EMAIL PROTECTED]
with a subject of "unsubscribe". Trouble? Contact [EMAIL PROTECTED]
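(P.S. After the re-add, /proc/mdstat is where the rebuild progress shows up. A sketch of pulling the percentage out of that file; the sample text below is invented to match the mdstat format, not copied from april, so treat it as an assumption about what a rebuilding raid1 looks like.)

```shell
# Pull the rebuild percentage out of mdstat-style text. The sample
# below is a hand-written mock of a rebuilding raid1, not real output.
mdstat_progress() {
    # first percentage figure on the recovery/resync line
    grep -E 'recovery|resync' | grep -o '[0-9][0-9.]*%' | head -n 1
}

sample='md0 : active raid1 hdc3[2] hda3[0]
      242219968 blocks [2/1] [U_]
      [=>...................]  recovery =  7.4% (18000000/242219968) finish=120.0min'

printf '%s\n' "$sample" | mdstat_progress
```

In practice just running "cat /proc/mdstat" now and then (or watch(1) around it) does the same job interactively.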

