For me, it wasn't the root drive. It was just a couple of extra drives I
slapped in as slaves for testing.

I also had the same problem with it never rebuilding when it went down.
I believe I was able to recover by adjusting /etc/raidtab and running
mkraid with the force option. 
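For reference, the recovery looked roughly like this. This is only a sketch -- the device and partition names are hypothetical, and mkraid's force option will happily destroy data if the raidtab doesn't match the on-disk layout, so double-check before running it:

```
# /etc/raidtab -- hypothetical two-IDE-disk RAID-1 layout
raiddev /dev/md0
    raid-level            1
    nr-raid-disks         2
    nr-spare-disks        0
    persistent-superblock 1
    device                /dev/hdc1
    raid-disk             0
    device                /dev/hdd1
    raid-disk             1
```

Then rebuild with something like `mkraid --really-force /dev/md0` (depending on the raidtools version the flag is spelled `-f`, `--force`, or `--really-force`; check your mkraid man page).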

My test so far has just been a big tar -cvf to the /dev/md0 device, then
pulling the power on one of the drives. I'd expect it to choke when the
power goes, then recover and start writing again.
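In other words, the test loop is roughly this (hypothetical device names, and obviously destructive to whatever is on the drives):

```
# Stream a large archive straight at the md device...
tar -cvf /dev/md0 /usr

# ...pull the power on one member drive mid-write, then check:
cat /proc/mdstat    # a properly degraded two-disk mirror shows something like [U_]
```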

-- Nathan

David Robinson wrote:
> 
> I had the same problem.
> It was only happening on the root partition that I had mirrored. Once I
> unmirrored the root partition, all the I/O errors stopped. When I pulled the
> power to one drive, the other RAID-1 partitions went straight into degraded
> mode without problems.
> 
> I'm not sure if this is a bug in the current version of Linux RAID but it looks
> like Linux is a little way off from being able to mirror the root device.
> 
> The RAID docs really need updating. I could not find out how to remirror a
> drive once it went into degraded mode! I only found it in the end by playing
> with the programs that come with Linux RAID, e.g. raidadd, which allowed me
> to remake the mirror by specifying the "bad" partition. A reboot didn't
> force a mirror rebuild.
> 
> I would suggest that everyone who sets up RAID do a test by powering down one
> of the drives.
> 
> I currently have a cron job that does a dd of the root partition to another
> drive with exactly the same setup. This appears to work fine and I can boot
> off it. At least if the root drive fails I only lose a few password changes,
> etc. from the last time the cron job ran.
> 
> "Neulinger, Nathan R." wrote:
> 
> > We've been doing some initial testing - looking at using RAID-1 mirroring
> > with md, but have not had much luck so far.
> >
> > We've set up a raid1 device with two separate IDE drives on separate
> > controllers (on-board).
> >
> > To simulate a drive failure, we've cut the power to one drive while the raid
> > set is being used. After a long timeout, the kernel sees the failure on hdd
> > and then md says that the drive has failed and will continue in degraded
> > mode.
> >
> > The problem is, it doesn't continue; it sits there and keeps retrying
> > that drive every few seconds. The I/O operation that was taking place
> > against /dev/md0 never resumes (it stopped as soon as the power was
> > pulled on that one drive).
> >
> > If we put the power back on to that drive, it breaks loose and starts
> > running in degraded mode.
> >
> > This is with the 990128-2.2.0 patch applied to the 2.2.2 kernel w/ fixes.
> >
> > Is this a functionality issue (i.e. does md raid1 not support continuing
> > to run after a drive failure if there are no spares?), or is something
> > wrong?
> >
> > I can possibly upgrade to a new kernel release if absolutely necessary.
> >
> > -- Nathan
> >
> > ------------------------------------------------------------
> > Nathan Neulinger                       EMail:  [EMAIL PROTECTED]
> > University of Missouri - Rolla         Phone: (573) 341-4841
> > Computing Services                       Fax: (573) 341-4216
