> Basically I've got an ASUS A7V133 RAID m/b with 1G RAM, Athlon 1500XP and
> two 40G 7200rpm disks. One disk on the main controller, the other on the
> onboard Promise controller.
>
> Booted Mandrake 9.0 disk #1, partitioned as follows:
>
> /dev/hda1 256M - /boot [ext2]
> /dev/hda2 24G - added to md0
> /dev/hda3 1G swap
> /dev/hda4 rest - added to md1
>
> /dev/hde1 256M - /tmp [reiserfs]
> /dev/hde2 24G - added to md0
> /dev/hde3 1G swap
> /dev/hde4 rest - added to md1
>
> md0 is mode 0 /
> md1 is mode 1 /home
>
> Installed Mandrake, went fine, rebooted, failed!

I've been using Mandrake for a while (since the 7.x versions), and I am a huge admirer. However, Mandrake's implementation of software-RAID is, and has been for as long as I have used it, complete pants! It is, by a long chalk, the worst thing about the distro.

Software-RAID has been fantastically simple since Ingo Molnar wrote the new raidtools several years ago. Auto-recognition makes everything very easy. But as far as I can tell, Mandrake have yet to get to grips with auto-recognition. It seems that they are still trying to fire up the arrays from rc.sysinit. This is presumably because they have always left RAID support as modules, rather than building it into the default kernels. This is OK, so long as none of the files involved in booting are on any of your arrays. As soon as you try something like root-RAID, it all falls down.

Or at least, this is my interpretation of why, every time I upgrade my root-RAID Mandrake-based server, I have to remember to leave behind a purpose-built kernel and add an extra entry to lilo.conf pointing to it. If you forget to do this, you are stuffed. Every version of Mandrake I have used has kernel-panicked at boot when using the default kernels and initrds on my root-RAID system. As soon as you use a kernel with RAID support built in, it boots fine. This says to me that Mandrake does not support root-RAID.

If I were setting up a root-RAID system based on Mandrake, I would do the following (I am assuming that this is on RAID-1 or RAID-5; it's not a good plan to put / on RAID-0 anyway):

Initially, install / to one of the partitions that will be in the root-RAID. Leave the other partitions in the array unassigned for now.

Having rebooted after installation, rebuild the kernel with whichever RAID levels you need built in (and anything else that would otherwise have to be in the initrd). Do the usual make menuconfig; make dep; make clean; make bzImage; make modules; make modules_install; cp arch/i386/boot/bzImage /boot, etc.
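
Since the whole point of the rebuild is that RAID support ends up compiled in rather than left as modules, it may be worth checking the kernel configuration before running make bzImage. The following is a minimal sketch, not part of any distribution's tooling: raid_builtin is a hypothetical shell function, and it assumes the 2.4-series option names CONFIG_BLK_DEV_MD and CONFIG_MD_RAID<level> in the kernel's .config file.

```shell
#!/bin/sh
# Hypothetical helper: check that the md driver and the requested RAID
# personalities are compiled in (=y) rather than built as modules (=m).
# Assumes 2.4-style option names (CONFIG_BLK_DEV_MD, CONFIG_MD_RAID1, ...).
#   $1        = path to the kernel .config
#   remaining = RAID levels to check, e.g. 1 5
# Returns 0 if everything is built in, 1 if anything is not, 2 if the
# config file cannot be read.
raid_builtin() {
    config=$1
    shift
    [ -r "$config" ] || return 2

    # Always require the md driver itself, plus one option per RAID level.
    opts="CONFIG_BLK_DEV_MD"
    for level in "$@"; do
        opts="$opts CONFIG_MD_RAID$level"
    done

    missing=0
    for opt in $opts; do
        # Only an exact "=y" line counts; "=m" or an absent option fails.
        if ! grep -q "^$opt=y\$" "$config"; then
            echo "not built in: $opt" >&2
            missing=1
        fi
    done
    return "$missing"
}
```

For example, "raid_builtin /usr/src/linux/.config 1" should succeed silently if the md driver and RAID-1 are both set to "y"; any option still set to "m" (or missing) is reported on stderr. The /usr/src/linux path is only the conventional location of the kernel source, so adjust it to suit.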

Edit lilo.conf to add an extra option to boot from this kernel. Lose the initrd line for this option, as you shouldn't need it. Run lilo and then reboot, selecting this option, to check that the new kernel boots OK.

Set up the array that will hold /, with the partition that is currently / listed as "failed-disk" in your raidtab. This should set up the array in degraded mode. Before building the array, make sure the partitions in it other than the "failed-disk" partition are set to type "fd", for RAID auto-recognition. Having built the array, check /proc/mdstat to see that everything worked out. If so, reboot again, to test whether the new kernel will automatically recognise and start the array.

Build the filesystem on the array, mount the array at a temporary mount-point (e.g. /mnt/disk), and copy all the files on / to the array, making sure not to include files from other partitions (tar is good for the job).

Edit the copy of lilo.conf on the array to add an extra option that has root on /dev/mdX (where X is the number of this array), rather than on /dev/hd??. Run e.g. "lilo -r /mnt/disk" to install the new boot option. Edit /mnt/disk/etc/fstab to change the root device from /dev/hd?? to /dev/mdX. Reboot and select the option to boot with the RAID as root from the lilo prompt.

If this boots successfully, you have a working root-RAID system, and no longer need the partition that you originally installed to (the failed-disk). Make sure this partition is not mounted anywhere. Change its partition type to "fd". Edit /etc/raidtab so that this partition is now a normal "raid-disk", rather than a "failed-disk". raidhotadd the partition to the array, and watch the resync progress in /proc/mdstat. When it has completed, reboot again. You probably want to make the root-RAID option the default in lilo, so you don't have to remember to select it when you boot.
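
The raidtab, lilo, copy and resync steps above can be sketched as a few shell functions. This is a minimal sketch under stated assumptions, not a tested migration script: the function names are hypothetical, and it assumes raidtools-style /etc/raidtab syntax, a two-disk RAID-1, GNU tar (for --one-file-system) and the "resync"/"recovery" progress lines that 2.4 kernels print in /proc/mdstat. The current root partition goes in the second slot as "failed-disk", since the Software-RAID HOWTO warns that listing the failed disk first can cause trouble starting the array.

```shell
#!/bin/sh
# Hypothetical helpers for the root-RAID migration; nothing runs on load.

# Print a raidtab stanza for a two-disk RAID-1 that starts out degraded.
#   $1 = md device, e.g. /dev/md0
#   $2 = new, empty partition (type "fd") that the array is built from
#   $3 = partition currently holding /, kept out of the array as "failed-disk"
degraded_raid1_raidtab() {
    cat <<EOF
raiddev $1
    raid-level              1
    nr-raid-disks           2
    nr-spare-disks          0
    persistent-superblock   1
    chunk-size              4
    device                  $2
    raid-disk               0
    device                  $3
    failed-disk             1
EOF
}

# Print a lilo.conf image section with no initrd line, on the assumption
# that the RAID drivers are compiled into this kernel.
#   $1 = kernel image, $2 = label, $3 = root device (/dev/hd?? or /dev/mdX)
lilo_stanza() {
    cat <<EOF
image=$1
    label=$2
    root=$3
    read-only
EOF
}

# Copy / into the array mounted at $1 without descending into other mounted
# filesystems such as /boot or /proc (GNU tar's --one-file-system).
# $1 must already be a mounted array, otherwise this copies / into itself.
copy_root_to() {
    [ -d "$1" ] || return 2
    (cd / && tar cf - --one-file-system .) | (cd "$1" && tar xpf -)
}

# Succeed once /proc/mdstat (or the file given as $1) shows no resync or
# recovery in progress; return 2 if the file cannot be read.
resync_finished() {
    mdstat=${1:-/proc/mdstat}
    [ -r "$mdstat" ] || return 2
    ! grep -Eq 'resync|recovery' "$mdstat"
}
```

Hypothetically, with /dev/hda2 currently holding / and /dev/hde2 empty, "degraded_raid1_raidtab /dev/md0 /dev/hde2 /dev/hda2 >> /etc/raidtab" followed by "mkraid /dev/md0" matches the degraded set-up described above. Later, "raidhotadd /dev/md0 /dev/hda2" adds the old partition, and "while ! resync_finished; do sleep 60; done" waits for the rebuild. Because "lilo -r /mnt/disk" chroots into /mnt/disk before reading its lilo.conf, the kernel image named there must be visible under /mnt/disk (if /boot is a separate partition, mount it at /mnt/disk/boot first). Empty mount-point directories such as /mnt/disk/proc may need to be created by hand after the tar copy. A "default=<label>" line in the global section of lilo.conf makes the root-RAID entry the default.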

You will also want to get rid of the lilo options that point to the original partition for /, as its original contents have now been overwritten by the resync. If you are using RAID-1 (which is much the better option for root-RAID), you can actually leave these options: the individual mirrors in a RAID-1 can be mounted in their own right, so these boot options should still work, and could prove useful for recovery if something goes wrong when rebooting now. With RAID-5, you can no longer boot from the original partition.

Although the arrays should be started before rc.sysinit even kicks in, so the RAID part of that script should have no impact, I strongly suggest commenting out the RAID section, as it can still have an effect if you are, say, testing an array. You might want to build an array (in which case you will have described it in /etc/raidtab), but not want to start it automatically at boot. With auto-recognition, all you do is leave the partition types set to "83" or whatever is appropriate. But the Mandrake rc.sysinit will try to start any array in /etc/raidtab, whether you want it to or not, so get rid of it.

All of which is a very tedious process, which could be avoided if Mandrake had half a clue about software-RAID.

1. Every recent Mandrake upgrade has wiped the old raidtab. At best this is a nuisance, and at worst it could be disastrous.

2. Likewise, every recent upgrade has ignored any old options in lilo and defaulted to kernels and initrds that will fail to boot from RAID. If you forget to add your own lilo options, you are stuffed.

3. There appears to be a problem with ext3 and software-RAID on 2.4.18/19 that leads to filesystem corruption. ext3 is now the Mandrake default, so anyone not aware of this (and it is an obscure problem) could be doing more harm than good by putting vital system files on an ext3 filesystem on RAID. Remember, RAID provides no protection from filesystem corruption.

4. diskdrake makes it easy to set up root on RAID.
The very least that Mandrake could do, if they are going to stick with this system, is to prevent you putting / on RAID. Likewise, diskdrake does nothing to stop you putting /boot on any flavour of RAID, when the only flavour that will work (even with a purpose-built kernel) is RAID-1.

But rather than try to work with a broken system (which is what the perl-script on the errata page is trying to do, piling layer on layer of ugliness), the best solution would be to provide proper support for / on RAID. The simple way to do this would be either to build RAID (at least RAID-1) support into the default kernels, or to provide alternative kernels with it built in. After all, this sort of thing is already done for SMP, enterprise, security etc. RAID is just as fundamental, and it is time Mandrake sorted it out.

Yours sincerely,
Bruno Prior
