Re: [PATCH] show an error instead of segfaulting on grub-probe -t partmap on a unsynced raid

Felix Zielcke Sat, 02 Aug 2008 12:01:26 -0700

Now with trying if raid5 makes problem I looked further into this whole
raid stuff in grub.


Am Freitag, den 01.08.2008, 17:22 +0200 schrieb Felix Zielcke:
> Am Freitag, den 01.08.2008, 15:36 +0200 schrieb Robert Millan:
> 
> > The existing code is confusing.  What is total_devs for?  If we need to
> > iterate up to 32, sounds like this variable is pointless?
> > 
> > If it's useless, it should be removed (but maybe it isn't!).
> > 
> 
> total_devs is a fast check if the array is readable.

total_devs is the number of disks according to the superblock
nr_devs is the number grub found.

The value from the superblock is needed.
nr_devs could be removed but as I already said above it's faster with
it.
If it doestn't exist you have to iterate over all 32 devices and then
check the found number of disks with total_devs to find out if the array
is readable
(grub_is_array_readable in disk/raid.c)

And the raid5 has a check, if one disk in the array is missing and then
an error occurs while reading from a different one then stops reading
further from the whole array.

So I would suggest to just keep both, as long as you don't need the
sizeof(unsigned int) this would save.
But with the super 0.90 format both could be just an unsigned char, it
should never happen that we have more then 28 devices which 0.90
supports and the current code just assumes 32 so there'll be problems
anyway.

> ...

To make that a bit more clear:
mdadm allocates the disk number stored in superblock begining with 0
If you remove the disk with number 0 then the others still keep there
number.
If you add it back, then it gets while it's syncing the first highest
disk number which is free.
So if you make a raid1 with 2 disks then the numbers are 0 and 1
if you remove the one with has 0 and then add it back then it becomes 2 
If it's fully synced then it get's changed back to 0

And the problem in grub is that the disk number in superblock is used
for the array->device[] so there can be holes with missing disks.



I thought about changing grub_raid_scan_device so the disks get added to
a temporary array with the superblock number just as it's currently done
and then copy from that one to the used array->devices[] skipping all
invalid entrys.

This would be fine for Raid 0 and 1
But there's Raid 5 :(

On Raid 5 one disk can be missing and everything can be still read it's
just slower.
The problem is we need every disk which is in the array and not just the
working ones to calculate from which disk shall be read.

So the Raid 1 case can be fixed fine with my previous patch which
shouldn't slow down that much
All 32 devices are just checked on grub-probe
real grub stops when it found a working one which should be for the most
users always a very low number

Maybe I can find a solution for this Raid 5 problem, but I doubt.
So we should probable warn users if their /boot is on a raid 5

Raid 5 worked for me fine with qemu just using 2 of the 3 disk images,
when it was fully synced, the order of them didn't even matter.
But if I removed a disk with mdadm then grub failed with the other 2
used.



_______________________________________________
Grub-devel mailing list
Grub-devel@gnu.org
http://lists.gnu.org/mailman/listinfo/grub-devel

Re: [PATCH] show an error instead of segfaulting on grub-probe -t partmap on a unsynced raid

Reply via email to