Thomas Backlund wrote:
> Funny, I have no problem whatsoever with the soft raid, and I have 3
> systems set up this way, 2 with scsi disks, and one with ide disks...
>
> my setup: RAID-1
>
> /dev/md0 -> /boot (sda1,sdb1 or hda1,hdc1)
> /dev/md1 -> / (sda6,sdb6 or hda6,hdc6)
> /dev/md2 -> /usr (sda7,sdb7 or hda7,hdc7)
> /dev/md3 -> /var (sda8,sdb8 or hda8,hdc8)
> /dev/md4 -> /home (sda9,sdb9 or hda9,hdc9)
>
> the swaps are on /dev/sda5 and /dev/sdb5 (or /dev/hda5 and /dev/hdc5)
> (the swaps could also be on raid, but I haven't felt the need to
> place them there...)
Interesting. I wonder what the difference is. My setup is:
/dev/md0 -> /boot (hda1 + hdc1, RAID-1)
/dev/md1 -> / (hda2 + hde2, RAID-1)
/dev/md2 -> /home (hda3 + hdc3 + hde3 + hdg3, RAID-5)
/dev/md3 -> /usr (hda4 + hdc4 + hde5 + hdg4, RAID-5)
I too intend to go with swap on RAID-1 (which is why there are a few
missing partitions above), but for the time being swap is on hdg5. I
personally believe the logic is the same for swap on RAID as for root on
RAID. If you are using RAID for High-Availability, you want to prevent
your machine crashing if a disk fails. It will crash if the disk that
carries either / or swap dies. Putting them both on RAID should prevent
this happening, which is why I personally believe there is a need to put
swap on RAID.
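
To make the single-disk-failure argument concrete, here is a rough Python
sketch. It is not part of Mandrake or the raidtools; the names LAYOUT,
disk_of and lost_after_failure are made up for illustration. It takes the
layout listed above, with swap still on hdg5, and reports which mount
points become unavailable when one disk dies.

```python
# Layout copied from the message above; swap on hdg5 is the current,
# non-RAID state. All names here are hypothetical, for illustration only.
LAYOUT = {
    "/boot": ("raid1", ["hda1", "hdc1"]),
    "/":     ("raid1", ["hda2", "hde2"]),
    "/home": ("raid5", ["hda3", "hdc3", "hde3", "hdg3"]),
    "/usr":  ("raid5", ["hda4", "hdc4", "hde5", "hdg4"]),
    "swap":  ("plain", ["hdg5"]),
}


def disk_of(partition):
    """Strip the trailing partition number: 'hda2' -> 'hda'."""
    return partition.rstrip("0123456789")


def lost_after_failure(layout, failed_disk):
    """Return the mount points that become unavailable when one disk dies."""
    lost = []
    for mount, (level, parts) in layout.items():
        failed = sum(1 for p in parts if disk_of(p) == failed_disk)
        if level == "raid1":
            survives = failed < len(parts)   # any one surviving mirror is enough
        elif level == "raid5":
            survives = failed <= 1           # tolerates a single missing member
        else:
            survives = failed == 0           # plain partition: no redundancy
        if not survives:
            lost.append(mount)
    return lost


if __name__ == "__main__":
    for disk in ("hda", "hdc", "hde", "hdg"):
        print(disk, lost_after_failure(LAYOUT, disk))
```

With this layout, losing hdg takes out swap and nothing else, which is
exactly the gap that swap on RAID-1 would close.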
All in all our setups look quite similar, which makes this all the more
confusing. I don't think it should make much difference that /home and
/usr are on RAID-5, it's / and /boot that count. Nor should it make any
difference that my / partitions are primary, while yours are extended.
One thought. Was yours a completely fresh install? Mine was an install
over the top of 9.0 RC1, and I was trying in the first instance to
retain the old arrays. The errata indicates that this is what goes wrong
- diskdrake identifies the old arrays correctly, but does not
successfully recreate the raidtab. Because Mandrake's raid startup
depends on the raidtab, everything then falls down. The perl-script
attached to the errata rebuilds the raidtab, which is a kludge to get
round the problem (but probably too late by the time you encounter the
problem and find the solution). But the main point of my previous
message was that this is the wrong way to do it. It is _much_ better for
your arrays to fire up under auto-recognition, than to do it via raidtab
and rc.sysinit. (a) because you want your arrays fired up as early as
possible, and (b) because you do not necessarily want the system to try
to follow your raidtab - it might have been corrupted (as in this case),
or you might be experimenting with arrays, failed-disks etc. in which
case you may want some of the arrays in raidtab to start when you
raidstart them, and not necessarily on every boot.
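
For anyone wanting to check which partitions the kernel will try to
auto-recognise: as far as I know, the kernel only considers partitions
whose type is 0xfd ("Linux raid autodetect"). Below is a minimal Python
sketch that reads the type bytes from the primary partition table of an
MBR sector. The function names are my own invention, and it deliberately
ignores logical partitions (such as hde5), which live in extended boot
records rather than in the MBR itself.

```python
RAID_AUTODETECT = 0xFD  # "Linux raid autodetect" partition type


def primary_partition_types(mbr):
    """Return the type byte of each of the four primary entries in an MBR."""
    if len(mbr) < 512 or mbr[510:512] != b"\x55\xaa":
        raise ValueError("not a valid MBR sector")
    # The partition table starts at byte 446; each entry is 16 bytes long
    # and the partition type is the byte at offset 4 within the entry.
    return [mbr[446 + 16 * i + 4] for i in range(4)]


def autodetect_candidates(mbr):
    """1-based numbers of primary partitions marked for auto-recognition."""
    return [i + 1
            for i, ptype in enumerate(primary_partition_types(mbr))
            if ptype == RAID_AUTODETECT]
```

To try it on a real disk you would need to be root and read the first
sector yourself, e.g. open("/dev/hda", "rb").read(512), then pass the
result to autodetect_candidates(). The device path is only an example.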
And another thought. What type of filesystem are you using for / and
/boot? I have a suspicion that the move to ext3 has made things worse.
Could there be some conflict between the ext3 and RAID superblocks,
and/or could the presence of 2 superblocks confuse lilo?
> and btw. take a look at lilo docs
> (/usr/share/doc/lilo-22.3.2/README.raid1.bz2)
>
> at the beginning it states:
>
> RESTRICTIONS
> ============
>
> Only RAID1 is supported. LILO may be used to boot a system
> containing other RAID level partitions, but it may not be installed
> on any RAID partition other than RAID 1.
Exactly. Which is why I said:
> you putting / on RAID. Likewise, diskdrake does nothing to stop
> you putting /boot on any flavour of RAID, when the only flavour that
> will work (even with a purpose-built kernel) is RAID-1.
Maybe I wasn't explicit, but I was assuming the boot image and initrd
are in /boot. This is a fairly obscure restriction, although obvious when
one thinks about it (lilo doesn't remotely have space to handle disk
striping). So, it would be a good idea for diskdrake either to warn
about this or prevent people from putting /boot (or / if they do not
have a separate /boot partition) on RAID-0 or RAID-5.
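
The sort of check I have in mind is tiny. Here is a sketch in Python
(diskdrake itself is Perl, and this is not its code; the function name
and the shape of the input are assumptions of mine). It assumes, as
above, that the boot image and initrd live in /boot, or in / when there
is no separate /boot.

```python
def boot_raid_warnings(mounts):
    """Warn if the filesystem holding the boot files is on non-RAID-1 RAID.

    mounts maps a mount point to a RAID level string ('raid0', 'raid1',
    'raid5') or to None for a plain partition. Hypothetical input format.
    """
    # The boot files are in /boot if it is a separate filesystem, else in /.
    boot_fs = "/boot" if "/boot" in mounts else "/"
    level = mounts.get(boot_fs)
    if level is not None and level != "raid1":
        return [f"{boot_fs} is on {level}: lilo can only be installed on RAID-1"]
    return []


if __name__ == "__main__":
    print(boot_raid_warnings({"/boot": "raid1", "/": "raid1", "/usr": "raid5"}))
    print(boot_raid_warnings({"/": "raid5"}))
```

The first call should produce no warnings, since only /usr is on RAID-5;
the second should complain, because / holds the boot files and is on
RAID-5.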
> Thomas
One interesting aspect of the problem I had with 9.0 was that I could
not even boot using one of the partitions in the root array as /. I
booted with the rescue disk and mounted / on /mnt. I edited
/mnt/etc/lilo.conf to use hda2 as root (and ran "lilo -r /mnt"). I
edited /mnt/etc/fstab so that / was on hda2 and /boot was on hda1, both
as ext2 filesystems. I changed the type of the partitions in md0 and md1
so that they would not be auto-recognised. Even then, with only /usr and
/home on RAID, the system failed to boot, with the same error message
("Kernel panic: No init found"), which I believe indicates that / failed
to mount. This would have worked previously. My strong suspicion is that
this now fails because of the move to ext3 (the panic was preceded by
the messages "Mounting root filesystem\n EXT3-fs: unable to read
superblock").
The question is, is it premature to move to ext3 as the default for /?
Given the filesystem corruption issues with recent kernels, RAID and
ext3, plus the problems mentioned above, I believe it is premature, and
ext2 should still be the default for /. That's the way I have now
configured my system, and it is now behaving itself again.
Cheers,
Bruno