On Sat, Apr 17, 2010 at 3:01 PM, Neil Bothwick <[email protected]> wrote:
> On Sat, 17 Apr 2010 14:36:39 -0700, Mark Knecht wrote:
>
>> Empirically any way there doesn't seem to be a problem. I built the
>> new kernel and it booted normally so I think I'm misinterpreting what
>> was written in the Wiki or the Wiki is wrong.
>
> As long as /boot is not on RAID, or is on RAID1, you don't need an
> initrd. I've been booting this system for years with / on RAID1 and
> everything else on RAID5.
>
>
> --
> Neil Bothwick
Neil,
Completely agreed, and in fact it's the way I built my new system.
/boot is just a partition, / is RAID1 on three partitions marked with
the 0xfd partition type, using metadata=0.90 and assembled by the kernel.
I'm using WD RAID Edition drives and an Asus Rampage II Extreme
motherboard.
It works, however I keep running into the sort of thing I ran into
this morning when booting - both md5 and md6 have problems. Random
partitions get dropped out. It's never the same ones, and it's
sometimes only 1 partition out of three on the same drive. This
morning sdc5 and sdc6 aren't found until I reboot, but sda3, sdb3 &
sdc3 were.
Flakey hardware? What? The motherboard? The drives?
I've noticed that entering the BIOS setup screens before allowing
grub to take over seems to eliminate the problem. Timing?
m...@c2stable ~ $ cat /proc/mdstat
Personalities : [raid0] [raid1]
md6 : active raid1 sda6[0] sdb6[1]
247416933 blocks super 1.1 [3/2] [UU_]
md11 : active raid0 sdd1[0] sde1[1]
104871936 blocks super 1.1 512k chunks
md3 : active raid1 sdc3[2] sdb3[1] sda3[0]
52436096 blocks [3/3] [UUU]
md5 : active raid1 sdb5[1] sda5[0]
52436032 blocks [3/2] [UU_]
unused devices: <none>
m...@c2stable ~ $
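A minimal sketch of how the degraded arrays in output like the above
could be picked out by a script rather than by eye. It assumes the
/proc/mdstat layout shown here, where a missing member appears as "_"
in the status field (e.g. [UU_]). The function name degraded_arrays is
made up for illustration.

```shell
#!/bin/sh
# List md arrays whose member-status field (e.g. [UU_]) shows a missing member.
# Reads /proc/mdstat-formatted text on stdin, so it also works on saved output.
degraded_arrays() {
    awk '
        # Remember which array the following lines describe.
        /^md[0-9]+ :/ { name = $1 }
        # The status field looks like [UUU] or [UU_]; "_" marks a missing member.
        name != "" && match($0, /\[[U_]+\]/) {
            if (index(substr($0, RSTART, RLENGTH), "_") > 0) print name
            name = ""
        }
    '
}

# Usage: degraded_arrays < /proc/mdstat
```

RAID0 arrays such as md11 have no status field and are simply skipped.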
For clarity, md3 is the only one needed to boot the system. The
other three RAIDs aren't required until I start running apps. However
they are all being assembled by the kernel at boot time and I would
prefer not to do that, or at least learn how not to do it.
Now, as to why they are being assembled I suspect it's because I
marked them all with partition type 0xfd when possibly it's not the
best thing to have done. The kernel won't bother with non-0xfd
partitions and then mdadm could have done it later:
c2stable ~ # fdisk -l /dev/sda
Disk /dev/sda: 500.1 GB, 500107862016 bytes
255 heads, 63 sectors/track, 60801 cylinders
Units = cylinders of 16065 * 512 = 8225280 bytes
Disk identifier: 0x8b45be24
Device Boot Start End Blocks Id System
/dev/sda1 * 1 7 56196 83 Linux
/dev/sda2 8 530 4200997+ 82 Linux swap / Solaris
/dev/sda3 536 7063 52436160 fd Linux raid autodetect
/dev/sda4 7064 60801 431650485 5 Extended
/dev/sda5 7064 13591 52436128+ fd Linux raid autodetect
/dev/sda6 30000 60801 247417065 fd Linux raid autodetect
c2stable ~ #
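To see at a glance which partitions are candidates for kernel
autodetection, the listing above can be filtered for type fd. This is
only a sketch: it assumes the column layout of the fdisk output shown
here (with an optional "*" boot flag in the second column), and the
function name autodetect_partitions is made up for illustration.

```shell
#!/bin/sh
# Print the partitions whose Id column is "fd" (Linux raid autodetect).
# Reads `fdisk -l` output on stdin in the column layout shown above.
autodetect_partitions() {
    awk '
        /^\/dev\// {
            # A "*" boot flag shifts the Id column from field 5 to field 6.
            id = ($2 == "*") ? $6 : $5
            if (id == "fd") print $1
        }
    '
}

# Usage (as root): fdisk -l /dev/sda | autodetect_partitions
```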
However the Gentoo Wiki says we are supposed to mark everything 0xfd:
http://en.gentoo-wiki.com/wiki/RAID/Software#Setup_Partitions
I'm not sure whether that was good advice or not for RAIDs that could
be assembled later, but that's what I did, and it leads to the kernel
trying to do everything before the system is totally up and mdadm is
really running.
Anyway, the failures happen, so I can step through and fail, remove
and add the partition back to the array. (In this case fail and remove
aren't necessary.)
c2stable ~ # mdadm /dev/md5 -f /dev/sdc5
mdadm: set device faulty failed for /dev/sdc5: No such device
c2stable ~ # mdadm /dev/md5 -r /dev/sdc5
mdadm: hot remove failed for /dev/sdc5: No such device or address
c2stable ~ # mdadm /dev/md5 -a /dev/sdc5
mdadm: re-added /dev/sdc5
c2stable ~ # mdadm /dev/md6 -a /dev/sdc6
mdadm: re-added /dev/sdc6
c2stable ~ #
At this point md5 is repaired and I'm waiting for md6 to finish
recovering:
c2stable ~ # cat /proc/mdstat
Personalities : [raid0] [raid1]
md6 : active raid1 sdc6[2] sda6[0] sdb6[1]
247416933 blocks super 1.1 [3/2] [UU_]
[====>................] recovery = 22.0% (54525440/247416933)
finish=38.1min speed=84230K/sec
md11 : active raid0 sdd1[0] sde1[1]
104871936 blocks super 1.1 512k chunks
md3 : active raid1 sdc3[2] sdb3[1] sda3[0]
52436096 blocks [3/3] [UUU]
md5 : active raid1 sdc5[2] sdb5[1] sda5[0]
52436032 blocks [3/3] [UUU]
unused devices: <none>
c2stable ~ #
How do I get past this? It's happening 2-3 times a week! I'm
figuring that if the kernel doesn't auto-assemble the RAIDs that I
don't need at boot, then I can somehow check that all the partitions
are ready to go before I start them up. This morning's exercise will
have taken an hour before I can start using the machine.
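The "check that all the partitions are ready, then start the array"
idea could look roughly like the sketch below. It is not a tested
fix: it assumes kernel autodetection is no longer assembling these
arrays (for example because they are not marked 0xfd), and the
function names, the MDADM override and the 30 second timeout in the
usage line are all made up for illustration. The only mdadm call used
is the ordinary --assemble with an explicit member list.

```shell
#!/bin/sh
# Sketch: wait until every member partition exists, then assemble the array.
# MDADM may be overridden (MDADM=echo gives a dry run); default is the real tool.
MDADM=${MDADM:-mdadm}

# wait_for_members TIMEOUT_SECONDS DEVICE...
# Returns 0 once all devices exist, 1 if the timeout expires first.
# -e is used so the function can be tried on ordinary files;
# -b would be the stricter test for real block devices.
wait_for_members() {
    timeout=$1
    shift
    waited=0
    while :; do
        missing=""
        for dev in "$@"; do
            [ -e "$dev" ] || missing="$missing $dev"
        done
        [ -z "$missing" ] && return 0
        if [ "$waited" -ge "$timeout" ]; then
            echo "still missing after ${timeout}s:$missing" >&2
            return 1
        fi
        sleep 1
        waited=$((waited + 1))
    done
}

# assemble_when_ready ARRAY TIMEOUT_SECONDS DEVICE...
# Assembles ARRAY from the listed members only if all of them showed up.
assemble_when_ready() {
    array=$1
    timeout=$2
    shift 2
    wait_for_members "$timeout" "$@" || return 1
    "$MDADM" --assemble "$array" "$@"
}

# Usage (as root):
#   assemble_when_ready /dev/md5 30 /dev/sda5 /dev/sdb5 /dev/sdc5
```

If a partition never appears, the array is left alone instead of being
started degraded, which would avoid the hour-long resync afterwards.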
- Mark