> Basically I've got an ASUS A7V133 RAID m/b with 1G RAM, Athlon 1500XP and
 > two 40G 7200rpm disks. One disk on the main controller, the other on the
 > onboard Promise controller.
 >
 > Booted mandrake 9.0 disk #1, partitioned as follows:
 >
 > /dev/hda1    256M    -    /boot [ext2]
 > /dev/hda2    24G    -    added to md0
 > /dev/hda3    1G        swap
 > /dev/hda4    rest    -    added to md1
 >
 > /dev/hde1    256M    -    /tmp [reiserfs]
 > /dev/hde2    24G    -    added to md0
 > /dev/hde3    1G        swap
 > /dev/hde4    rest    -    added to md1
 >
 > md0 is mode 0    /
 > md1 is mode 1    /home
 >
 > Installed mandrake, went fine, rebooted, failed !

I've been using Mandrake for a while (since the 7.x versions), and I am 
a huge admirer. However, Mandrake's implementation of software-RAID is, 
and has been as long as I have used it, complete pants! It is, by a long 
chalk, the worst thing about the distro.

Software-RAID has been fantastically simple since Ingo Molnar wrote the 
new raidtools several years ago. Auto-recognition makes everything very 
easy. But as far as I can tell, Mandrake have yet to get to grips with 
auto-recognition. It seems that they are still trying to fire up the 
arrays from rc.sysinit. This is presumably because they have always left 
RAID support as modules, rather than building it into the default 
kernels. This is OK, so long as none of the files involved in booting 
are on any of your arrays. As soon as you try something like root-RAID, 
it all falls down. Or at least, this is my interpretation of why every 
time I upgrade my root-RAID Mandrake-based server, I have to remember to 
leave behind a purpose-built kernel and add an extra entry to lilo.conf 
to point to it. If you forget to do this, you are stuffed. Every version 
of Mandrake I have used has kernel-panicked on bootup when using the 
default kernels and initrds on my root-RAID system. As soon as you use a 
kernel with RAID-support built in, it boots fine. This says to me that 
Mandrake does not support root-RAID.

If I were setting up a root-RAID system based on Mandrake, I would do 
the following (I am assuming that this is on RAID-1 or -5; it's not a 
good plan to put / on RAID-0 anyway):

Install / originally to one of the partitions that will be in the 
root-RAID. Leave the other partitions in the array unassigned for now.

Having rebooted after installation, rebuild the kernel with whichever 
RAID levels you need built in (and anything else that would otherwise 
have to be in the initrd). Do the usual make menuconfig; make dep; make 
clean; make bzImage; make modules; make modules_install; cp 
arch/i386/boot/bzImage /boot etc.
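
Before building, it may be worth confirming that the RAID options really are set to built-in rather than modular. The following is only a sketch: the helper name is made up for illustration, and the option names are the ones used in 2.4-series kernel configs as far as I know, so check them against your own .config.

```shell
# Hypothetical helper: succeeds only if every named option is "=y" (built in)
# in the given kernel .config; "=m" (module) or unset counts as a failure.
raid_builtin() {   # usage: raid_builtin CONFIG_FILE OPTION...
    config=$1; shift
    for opt in "$@"; do
        grep -q "^${opt}=y\$" "$config" || {
            echo "$opt is not built in" >&2
            return 1
        }
    done
}

# Example, run from the top of the kernel source tree after make menuconfig
# (option names assumed from 2.4 kernels):
# raid_builtin .config CONFIG_BLK_DEV_MD CONFIG_MD_RAID1
```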

Edit lilo.conf to add an extra option to boot from this kernel. Lose the 
initrd line for this option, as you shouldn't need it. Run lilo and then 
reboot, selecting this option, to check that the new kernel boots OK.
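
As an illustration of what such an entry could look like, here is a small sketch that prints a lilo.conf stanza without an initrd line. The helper name, image name, label and root device are all placeholders, not anything Mandrake ships.

```shell
# Hypothetical helper: print a lilo.conf stanza that has no initrd= line.
lilo_stanza() {   # usage: lilo_stanza IMAGE LABEL ROOT_DEVICE
    printf 'image=%s\n\tlabel=%s\n\troot=%s\n\tread-only\n' "$1" "$2" "$3"
}

# Example (all three values are placeholders). Review the output, append it
# to /etc/lilo.conf by hand, then run /sbin/lilo to install it:
# lilo_stanza /boot/bzImage-raid linux-raid /dev/hda2
```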

Set up the array that will hold /, with the partition that is currently / 
set as "failed-disk" in your raidtab. This should set up the array in 
degraded mode. You will want to make sure the partitions in the array 
other than the "failed-disk" partition are set to type "fd" for RAID 
auto-recognition before building the array.
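
A raidtab for a two-disk RAID-1 in this state might look roughly like the output of the sketch below. The helper name is invented, the device names in the example are placeholders, and the chunk-size value is just a conventional filler; adjust everything to your own layout.

```shell
# Hypothetical helper: print a raidtab entry for a degraded two-disk RAID-1,
# with the partition that currently holds / marked as failed-disk.
degraded_raidtab() {   # usage: degraded_raidtab MD_DEVICE NEW_PARTITION CURRENT_ROOT
    cat <<EOF
raiddev $1
    raid-level            1
    nr-raid-disks         2
    nr-spare-disks        0
    persistent-superblock 1
    chunk-size            4
    device                $2
    raid-disk             0
    device                $3
    failed-disk           1
EOF
}

# Example (device names are placeholders):
# degraded_raidtab /dev/md0 /dev/hde2 /dev/hda2 > /etc/raidtab
# mkraid /dev/md0
```

The persistent superblock is what auto-recognition relies on, so it needs to stay enabled.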

Having built the array, check /proc/mdstat to see that everything worked 
out. If so, reboot again, to test whether the new kernel will 
automatically recognise and start the array.

Build the filesystem on the array, mount the array at a temporary 
mount-point (e.g. /mnt/disk), and copy all the files on / to the array, 
making sure not to include files from other partitions (tar is good for 
the job).
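
One way to do the copy with tar, sketched under the assumption that GNU tar is installed (the --one-file-system option is what keeps it from descending into other mounted partitions). The helper name and the device and mount-point names in the example are placeholders.

```shell
# Hypothetical helper: copy a directory tree with tar, preserving
# permissions and staying on the source filesystem.
copy_one_fs() {   # usage: copy_one_fs SOURCE_DIR TARGET_DIR
    (cd "$1" && tar --one-file-system -cf - .) | (cd "$2" && tar -xpf -)
}

# Example (names are placeholders; use the mkfs that matches your filesystem):
# mke2fs /dev/md0
# mount /dev/md0 /mnt/disk
# copy_one_fs / /mnt/disk
```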

Edit the copy of lilo.conf on the array to add an extra option that has 
root on /dev/mdX (where X is the number of this array), rather than on 
/dev/hd??. Run e.g. "lilo -r /mnt/disk" to install the new boot option.

Edit /mnt/disk/etc/fstab to change the root device from /dev/hd?? to 
/dev/mdX.
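
If you would rather not edit fstab by hand, something along these lines could do it. This is a sketch only: the helper name is made up, and /dev/md0 in the example stands in for whatever your /dev/mdX is.

```shell
# Hypothetical helper: print FSTAB with the device of the "/" entry replaced.
# awk rebuilds the edited line with single spaces between fields; other
# lines are printed unchanged.
set_root_device() {   # usage: set_root_device FSTAB NEW_DEVICE
    awk -v dev="$2" '$1 !~ /^#/ && $2 == "/" { $1 = dev } { print }' "$1"
}

# Example: write to a scratch file, review the difference, then copy into place.
# set_root_device /mnt/disk/etc/fstab /dev/md0 > /tmp/fstab.new
# diff /mnt/disk/etc/fstab /tmp/fstab.new
```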

Reboot and select the option to boot with the RAID as root from the lilo 
prompt.

If this boots successfully, you have a working root-RAID system, and 
don't need the partition that you originally installed to (the 
failed-disk). Make sure this partition is not mounted anywhere. Change 
its partition type to "fd". Edit /etc/raidtab so that this partition is 
now a normal "raid-disk", rather than a "failed-disk". raidhotadd the 
partition to the array.
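
The raidtab change and the hot-add could be sketched like this. The helper name is invented and the device names are placeholders; note that the sed expression rewrites every failed-disk line in the file, so review the result if you have more than one array described there.

```shell
# Hypothetical helper: print RAIDTAB with each failed-disk keyword turned
# back into raid-disk (the disk number after it is left alone).
promote_failed_disk() {   # usage: promote_failed_disk RAIDTAB
    sed 's/^\([[:space:]]*\)failed-disk/\1raid-disk/' "$1"
}

# Example (device names are placeholders):
# promote_failed_disk /etc/raidtab > /tmp/raidtab.new   # review, then install
# raidhotadd /dev/md0 /dev/hda2
```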

Watch the resync progress in /proc/mdstat. When it has completed, reboot 
again. You probably want to make the root-RAID option in lilo the 
default option, so you don't have to remember to select it when you 
boot. You will also want to get rid of the lilo options that point to 
the original partition for /, as the data on this partition has been 
destroyed. If you are using RAID-1 (which is much the better option for 
root-RAID), you can actually leave these options, as the individual 
mirrors in a RAID-1 can be mounted in their own right, so these boot 
options should still work, and could prove useful for recovering if 
something goes wrong when rebooting now. With RAID-5, you can no longer 
boot from the original partition.
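
For the resync wait at the start of this step, a small loop saves you from re-reading /proc/mdstat by hand. This is a sketch: the helper name is made up, and the words it matches are an assumption based on typical /proc/mdstat output, so check what your kernel actually prints while it rebuilds.

```shell
# Hypothetical helper: succeeds while the given mdstat file mentions a
# resync or recovery (matched words are an assumption, see above).
resync_running() {   # usage: resync_running [MDSTAT_FILE]
    grep -Eq 'resync|recovery' "${1:-/proc/mdstat}"
}

# Example: poll until the rebuild is no longer reported, then show the result.
# while resync_running; do sleep 30; done; cat /proc/mdstat
```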

Although the arrays should be started before rc.sysinit even kicks in, 
so the RAID part of that script should have no impact, I strongly 
suggest commenting out the RAID section, as it can still have an effect 
if you are, say, testing an array. You might want to build an array (in 
which case you will have described it in /etc/raidtab), but not want to 
start it automatically at boot. With auto-recognition, all you do is 
leave the partition types set to "83" or whatever is appropriate. But 
the Mandrake rc.sysinit will try to start any array in /etc/raidtab, 
whether you want it or not, so get rid of it.

All of which is a very tedious process, which could be avoided if 
Mandrake had half a clue about software-RAID. A few further gripes:

1. Every recent Mandrake upgrade has wiped the old raidtab. At best this 
is a nuisance, and at worst could be disastrous.

2. Likewise, every recent upgrade has ignored any old options in lilo 
and defaulted to kernels and initrds that will fail to boot from RAID. 
If you forget to add your own lilo options, you are stuffed.

3. There appears to be a problem with ext3 and software-RAID on 
2.4.18/19 that leads to filesystem corruption. ext3 is now the 
Mandrake default, so anyone not aware of this (and it is an obscure 
problem) could be doing more harm than good by putting vital system 
files on an ext3 filesystem on RAID. Remember, RAID provides no 
protection from filesystem corruption.

4. diskdrake makes it easy to set up root on RAID. The very least that 
Mandrake could do, if they are going to stick with this system, is to 
prevent you putting / on RAID. Likewise, diskdrake does nothing to stop 
you putting /boot on any flavour of RAID, when the only flavour that 
will work (even with a purpose-built kernel) is RAID-1.

But rather than try to work with a broken system (which is what the 
Perl script on the errata page is trying to do - piling layer on layer 
of ugliness), the best solution would be to provide proper support for / 
on RAID. The simple way to do this would be to either build RAID (at 
least RAID-1) support into the default kernels, or provide alternative 
kernels with it built in. After all, this sort of thing is already done 
for SMP, enterprise, security etc. RAID is just as fundamental, and it 
is time Mandrake sorted it out.

Yours sincerely,

Bruno Prior


