Michael Tokarev <[EMAIL PROTECTED]> wrote:
> > mdadm -Gb internal --bitmap-chunk=1024 /dev/md4
> > mdadm /dev/md4 -r /dev/sdh1
> > mdadm /dev/md4 -f /dev/sde1 -r /dev/sde1
> > mdadm --build /dev/md5 -ayes --level=1 --raid-devices=2 /dev/sde1 missing
> > mdadm /dev/md4 --re-add /dev/md5
> > mdadm /dev/md5 -a /dev/sdh1
> >
> > ... wait a few hours for md5 resync...
>
> And here's the problem. While new disk, sdh1, are resynced from
> old, probably failing disk sde1, chances are high that there will
> be an unreadable block on sde1.
So we need a way to feed the redundancy of the raid5 back into the raid1.
Here is a short five-minute brainstorm I did to check whether it's possible
to manage this, and I think it is:
Requirements:
Any raid with parity of any kind needs to provide so-called "virtual
block devices", which carry the same data as the underlying block
devices the array is composed of. If the underlying block device
can't read a block, that block is calculated from the other raid
disks and hence is still readable through the virtual block device.
e.g. having the disks sda1 .. sde1 in a raid5 means the raid
provides not one new block device (e.g. /dev/md4 as in the example
above) but six: the one just mentioned plus five virtual ones, which
we might call /dev/vsda1 .. /dev/vsde1 or /dev/mapper/vsda1 ..
/dev/mapper/vsde1 or even /dev/mapper/virtual/sda1 ..
/dev/mapper/virtual/sde1. For ease, I'll just call them vsdx1 here.
Reading any block from vsda1 yields the same data as reading from
sda1 at any time (except when reading from sda1 fails, in which case
vsda1 still delivers that data).
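
To make the vsdx1 idea a bit more concrete, here is a minimal sketch in
Python. It is only a toy model, not md code: Disk, VirtualMember and
make_raid5 are names I made up for illustration, and parity is kept on the
last disk instead of being rotated as a real raid5 would do. It relies only
on the fact that the blocks of one stripe XOR to zero, so any member's block
equals the XOR of the other members' blocks.

```python
from functools import reduce


class Disk:
    """Toy block device; block numbers listed in `bad` fail on read."""

    def __init__(self, blocks, bad=()):
        self.blocks = list(blocks)
        self.bad = set(bad)

    def read(self, n):
        if n in self.bad:
            raise IOError(f"unreadable block {n}")
        return self.blocks[n]


def xor_blocks(blocks):
    """Byte-wise XOR of equally sized blocks."""
    return bytes(reduce(lambda a, b: a ^ b, col) for col in zip(*blocks))


class VirtualMember:
    """Hypothetical 'vsdx1': same data as one member, rebuilt on read errors."""

    def __init__(self, members, index):
        self.members = members
        self.index = index

    def read(self, n):
        try:
            # Normal case: identical to reading the underlying device.
            return self.members[self.index].read(n)
        except IOError:
            # Fallback: reconstruct the block from all other members.
            others = [d for i, d in enumerate(self.members) if i != self.index]
            return xor_blocks([d.read(n) for d in others])


def make_raid5(data_disks):
    """Toy array: data disks plus one parity disk (assumption: no rotation)."""
    parity = [xor_blocks(stripe) for stripe in zip(*data_disks)]
    return [Disk(d) for d in data_disks] + [Disk(parity)]


members = make_raid5([[b"aa", b"bb"], [b"cc", b"dd"], [b"ee", b"ff"]])
members[0].bad.add(1)            # block 1 of the first member goes bad
vsda1 = VirtualMember(members, 0)
```

Reading block 1 through vsda1 still succeeds although the underlying disk
refuses it; reading block 0 just passes through.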
Now, construct the following nested raid structure:
sda1 + vsda1 + missing = /dev/md10 RAID1 w/o super block
sdb1 + vsdb1 + missing = /dev/md11 RAID1 w/o super block
sdc1 + vsdc1 + missing = /dev/md12 RAID1 w/o super block
sdd1 + vsdd1 + missing = /dev/md13 RAID1 w/o super block
sde1 + vsde1 + missing = /dev/md14 RAID1 w/o super block
md10 + md11 + md12 + md13 + md14 = /dev/md4 RAID5 optionally with sb
Problem:
As long as md4 is not active, vsdx1 is not available. So the md1x
arrays need to be created with 1 disk out of 3. After md4 has been
assembled, vsdx1 needs to be added. This leads to another problem:
there must be no sync between sdx1 and vsdx1 (they are more or less
the same device). So mdadm would need an option like --assume-sync
for hot-add.
What we get:
As soon as we decide to replace a disk (like sde1 above) we just
hot-add sdh1 to the sde1 raid1 array (md14). That array will start
resyncing. If a block can't be read from sde1, it is simply taken
from vsde1 (where that block is reconstructed from the raid5).
Once the sync to sdh1 has completed, sde1 may be removed from the
array.
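
The replacement step could be sketched like this in Python. Again this is
only a toy model under my own assumptions: resync, read_sde1 and read_vsde1
are made-up names, the healthy members are plain lists, and read_vsde1
reconstructs directly from them instead of trying sde1 first.

```python
from functools import reduce


def xor_blocks(blocks):
    """Byte-wise XOR of equally sized blocks."""
    return bytes(reduce(lambda a, b: a ^ b, col) for col in zip(*blocks))


def resync(read_primary, read_fallback, n_blocks):
    """Copy n_blocks to a new disk, using the fallback for unreadable blocks."""
    new_disk, recovered = [], []
    for n in range(n_blocks):
        try:
            block = read_primary(n)
        except IOError:
            # The old disk failed here; take the block from the other mirror.
            block = read_fallback(n)
            recovered.append(n)
        new_disk.append(block)
    return new_disk, recovered


# Toy raid5: three healthy members; sde1 holds the XOR of each stripe.
healthy = [
    [b"\x01\x02", b"\x03\x04"],
    [b"\x10\x20", b"\x30\x40"],
    [b"\x0f\x0f", b"\xf0\xf0"],
]
sde1 = [xor_blocks(stripe) for stripe in zip(*healthy)]
BAD_BLOCKS = {1}  # assumption: block 1 of sde1 is unreadable


def read_sde1(n):
    if n in BAD_BLOCKS:
        raise IOError(f"unreadable block {n}")
    return sde1[n]


def read_vsde1(n):
    # Reconstruct sde1's block from the other raid5 members.
    return xor_blocks([member[n] for member in healthy])


sdh1, recovered = resync(read_sde1, read_vsde1, len(sde1))
```

After the loop sdh1 holds a complete copy of what sde1 should contain, and
recovered lists the blocks that had to come from vsde1.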
We would lose redundancy at no time - the only lost redundancy is that of
the already failed sde1, which we can't work around anyway (except by using
raid6 etc.).
This is only a brainstorm, and I don't know what internal effects could
cause problems. For example: the resync of the raid1 array reads a bad
block from sde1 and triggers a reconstruction via vsde1, while in parallel
the raid5 itself detects (e.g. because of a user-space read) that sde1 has
failed and tries to write that block back to the raid1 array for sde1,
although the same rewrite is already pending in that raid1 ... problems
over problems, but the devil is in the details, as ever ...
Regards, Bodo