Just to make certain I'm being clear: this was not with the root drive.
I have:
  Primary:
    Master: hda: root
    Slave:  hdb: raid
  Secondary:
    Master: hdc: cdrom
    Slave:  hdd: raid
I was not accessing the cdrom; I pulled the power on the slave drive on the
secondary channel (hdd) while doing heavy access to md0.
Even with the lockup, I'm better off: the worst case without raid is total
data loss plus a server outage, while the worst case with raid is a server
outage plus only as much data loss as you'd get from pushing reset. Still,
it'd be a whole lot nicer if it recovered.
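For what it's worth, a degraded set does at least show up in /proc/mdstat (the
status brackets go from [UU] to [U_]), so it can be watched for even if it
won't recover on its own. A minimal sketch of parsing that out, assuming the
0.90-style two-line-per-device format (the function name and the exact format
assumption are mine, not from any raidtools utility):

```python
# Sketch: list md devices that look degraded in /proc/mdstat text.
# Assumes the 0.90-style layout where the status line ends in e.g.
# "[2/1] [U_]" (an underscore marks a missing mirror half).
def degraded_devices(mdstat_text):
    degraded = []
    current = None
    for line in mdstat_text.splitlines():
        if line.startswith("md"):
            current = line.split()[0]          # e.g. "md0"
        elif current is not None and "[" in line:
            status = line[line.rfind("["):]    # last bracket group, e.g. "[U_]"
            if "_" in status:
                degraded.append(current)
            current = None
    return degraded

# On a live box you'd feed it open("/proc/mdstat").read() from a cron
# job and mail yourself whenever the returned list is non-empty.
```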
-- Nathan
David Robinson wrote:
>
> I had the same problem when I mirrored my root drive. I thought it was just a
> bug in RAIDing the root drive. All my other raid partitions went into degraded
> mode fine using the same drives.
>
> I have now just got a cronjob that does a dd each night to the second drive, so
> if the drive fails I only lose the day's password changes.
>
> As soon as I get my two new servers I will try to RAID the root drive again
> and see what happens. I personally think it's a kernel disk driver bug with the
> RAID drivers.
>
> "Neulinger, Nathan R." wrote:
>
> > Well, I just went and set up the system with kernel 2.2.2 (without any extra
> > patches) and raidtools-0.42, and it does the exact same thing.
> >
> > It just keeps on accessing the dead drive, with increasing sector numbers in
> > the I/O error message - sort of like it is trying to continue to flush the
> > buffer cache or something.
> >
> > With the clean 2.2.2 and 0.42, it looks like the raid1 module only detects
> > the failure once; with 0.90 and the current patches for 2.2.0, it keeps on
> > detecting the failure.
> >
> > Has _ANYONE_ had any success with this with a 2.2.x kernel?!?!?
> >
> > It looks to me like if you lose an IDE bus that a drive is hooked up to, it
> > will never recover, but if you just lose a drive, it will. This seems to be
> > a major shortcoming, but might be livable. It looks like if I re-plug the
> > drive back in, it stays degraded, but it does recover and keep on accessing
> > the raid set.
> >
> > Is there any chance of getting this thing to recover from a bus failure?
> >
> > -- Nathan
> >
> > ------------------------------------------------------------
> > Nathan Neulinger EMail: [EMAIL PROTECTED]
> > University of Missouri - Rolla Phone: (573) 341-4841
> > Computing Services Fax: (573) 341-4216
> >
> > > -----Original Message-----
> > > From: Michael [mailto:[EMAIL PROTECTED]]
> > > Sent: Sunday, June 06, 1999 2:14 PM
> > > To: Nathan Neulinger
> > > Cc: [EMAIL PROTECTED]
> > > Subject: Re: Problems with raid1 - system unusable after drive failure
> > >
> > >
> > > On Sun, 6 Jun 1999, Nathan Neulinger wrote:
> > > > Are you referring just to problems with root raid, or all of the
> > > > problems in general?
> > >
> > > Problems reported to the list in general. Primarily with drive failure
> > > (usually in test) and not being able to recover, or kernel crash.
> > > >
> > > > I'm not running raid on the root drive, just on a pair of
> > > other drives
> > > > on the system.
> > > >
> > > > Is it possible to get 0.42 built with 2.2.2?
> > > >
> > > Don't know, last raid system I have is kernel 2.0.33. I haven't migrated
> > > them because changes to raid in the kernel began at 2.0.34 or 5 as I
> > > recall, and I can't afford to crash systems that work. I will wait until
> > > the raid software is a little more stable.
> > >
> > > Take a look at the diffs and give it a try.
> > >
> > > Michael
> > >
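Re David's nightly dd above: that sort of poor-man's mirror is easy to hang
off cron. A rough sketch, assuming hda is the live disk and hdb the spare;
the script path and schedule are made up, and conv=noerror,sync is there so a
bad sector doesn't abort the whole copy:

```shell
# Sketch of a nightly disk-to-disk copy for cron (hypothetical setup).
# mirror_disk SRC DST: raw-copy SRC onto DST, continuing past read errors.
mirror_disk() {
    src=$1
    dst=$2
    # noerror: keep going after read errors; sync: pad short reads so
    # the output stays block-aligned with the input.
    dd if="$src" of="$dst" bs=64k conv=noerror,sync 2>/dev/null
}

# Hypothetical crontab entry (02:00 nightly):
#   0 2 * * * /usr/local/sbin/mirror-root.sh
# where mirror-root.sh just runs: mirror_disk /dev/hda /dev/hdb
```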
--
------------------------------------------------------------
Nathan Neulinger EMail: [EMAIL PROTECTED]
University of Missouri - Rolla Phone: (573) 341-4841
Computing Services Fax: (573) 341-4216