Re: Disk failure->Error message indicates bug

Neil Brown Fri, 19 May 2000 03:19:13 -0700
On  May 19, [EMAIL PROTECTED] wrote:
> Today one of my four SCSI disks in a RAID 0/1 setup broke. The kernel
> is 2.2.15 with Ingo's raid-2.2.15-A0. The error message says
> 
>      md: bug in file md.c, line 485 
> 
> Here are some details:
(deleted)


Yep.  This occurred to me only yesterday as I was mucking around in
the 2.3.99 raid code, but RAID1 over RAID0 is unsafe in both Ingo's
2.2 patches and in the current 2.3.

The scenario goes like this (for the technically minded):
md2 is a RAID-1 array of md0 and md1.
md0 is a RAID-0 array of sda12 and sdb13
md1 is a RAID-0 array of sdc7 and sdb7

- a read request comes in for md2
- to service this we create a read request on md0 (say).
- This request gets mapped to sda12 (say). Note that it is the same
  request, not a new one.  b_blocknr and b_dev are not changed;
  they still refer to md0 (or actually md2).  But b_rsector and
  b_rdev now refer to sda12.
- We get an error on sda12 and the request gets returned with an
  error flag.
- md2 checks b_rdev to see which device was in error. It gets confused
  because sda12 is not part of md2.
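
The steps above can be modelled in user space. What follows is only a
sketch, not kernel code: the enum values, the struct (which keeps just the
four buffer_head fields named above) and every function name are made up
for illustration, and the real md error path is more involved.

```c
#include <assert.h>

/* Hypothetical device ids for this model only; not real kdev_t values. */
enum dev { MD0 = 1, MD1, MD2, SDA12, SDB13 };

/* Only the four buffer_head fields discussed in the scenario. */
struct buf {
    enum dev b_dev;      /* device the request was addressed to */
    long     b_blocknr;
    enum dev b_rdev;     /* "real" device, rewritten by each layer */
    long     b_rsector;
};

/* RAID-1 layer: pick a mirror and pass the SAME buffer down. */
static void raid1_map(struct buf *bh, enum dev mirror)
{
    bh->b_rdev = mirror;
}

/* RAID-0 layer: overwrite b_rdev/b_rsector again, in place. */
static void raid0_map(struct buf *bh, enum dev disk, long offset)
{
    bh->b_rdev = disk;
    bh->b_rsector += offset;
}

/* md2's error path: look for the failed mirror by b_rdev.
 * Returns the mirror index, or -1 if b_rdev is not a member. */
static int raid1_find_failed(const struct buf *bh,
                             const enum dev *mirrors, int n)
{
    assert(n > 0);
    for (int i = 0; i < n; i++)
        if (mirrors[i] == bh->b_rdev)
            return i;
    return -1;
}

/* Walk one read through md2 -> md0 -> sda12, then fail it. */
static int model_read_error(void)
{
    const enum dev md2_mirrors[] = { MD0, MD1 };
    struct buf bh = { MD2, 100, MD2, 200 };

    raid1_map(&bh, MD0);        /* md2 sends the read to md0     */
    raid0_map(&bh, SDA12, 0);   /* md0 remaps it onto sda12      */
    /* ... sda12 reports an error, request comes back to md2 ... */
    return raid1_find_failed(&bh, md2_mirrors, 2);
}
```

In this model model_read_error() returns -1: by the time the error reaches
md2, b_rdev names sda12, which is not one of md2's members.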

The fix probably involves making sure that b_dev really does refer to
md0 (a quick look at the code suggests it actually refers to md2!)
and then using b_dev instead of b_rdev.

Basically, b_rdev and b_rsector cannot be trusted after a call to
make_request, but they are being trusted.
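
The fix suggested above can be sketched in the same kind of user-space
model. This is an assumption-laden illustration, not a patch: all names are
hypothetical, and the model ignores anything else in the kernel that may
depend on b_dev. The RAID-1 layer records the chosen mirror in b_dev, the
lower layer touches only b_rdev, and the error path looks at b_dev.

```c
#include <assert.h>

/* Hypothetical device ids for this model only; not real kdev_t values. */
enum dev { MD0 = 1, MD1, MD2, SDA12, SDB13 };

struct buf {
    enum dev b_dev;    /* device this layer addressed the request to */
    enum dev b_rdev;   /* may be rewritten by any lower layer        */
};

/* RAID-1 layer, modified: remember the chosen mirror in b_dev too. */
static void raid1_map(struct buf *bh, enum dev mirror)
{
    bh->b_dev  = mirror;
    bh->b_rdev = mirror;
}

/* RAID-0 layer: rewrites b_rdev only, leaves b_dev alone. */
static void raid0_map(struct buf *bh, enum dev disk)
{
    bh->b_rdev = disk;
}

/* Error path, modified: identify the failed mirror by b_dev.
 * Returns the mirror index, or -1 if b_dev is not a member. */
static int raid1_find_failed(const struct buf *bh,
                             const enum dev *mirrors, int n)
{
    assert(n > 0);
    for (int i = 0; i < n; i++)
        if (mirrors[i] == bh->b_dev)
            return i;
    return -1;
}

/* Same walk as the failing scenario: md2 -> md0 -> sda12, then error. */
static int model_fixed_read_error(void)
{
    const enum dev md2_mirrors[] = { MD0, MD1 };
    struct buf bh = { MD2, MD2 };

    raid1_map(&bh, MD0);
    raid0_map(&bh, SDA12);
    return raid1_find_failed(&bh, md2_mirrors, 2);
}
```

Here model_fixed_read_error() returns 0, the index of md0, even though
b_rdev has been rewritten to sda12 underneath.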

NeilBrown