Dale,

        I've messed with this quite a bit as well (Mandrake 8.0, 2.4.3 kernel,
RAID1 using two or three disks), and I cannot get LILO boot loaders to
reliably boot off an md device if the first disk in the array is
unavailable.  I'm sure there are people on this list who have it
working, and I know the developers of this capability are here and can
comment, but it has not worked reliably for me.

        So...   I went for something simpler (IMHO) than asking LILO and md
to take care of this for me: avoid booting off the md device by writing
slightly different boot loaders to the MBR of each sd device in the
array, as follows...

- Make one copy of /etc/lilo.conf for each disk, e.g., lilo-sda.conf,
  lilo-sdb.conf, etc.

- Edit each copy as follows:
  - Make the "boot=" line match the device, e.g., boot=/dev/sda,
    boot=/dev/sdb, etc.
  - Keep a separate map file for each boot loader by editing the
    "map=" entry in each file, e.g., map=map-sda, map=map-sdb, etc.
    Remember that map files live in /boot.

- Write a boot loader to each disk, e.g., "lilo -C /etc/lilo-sda.conf",
  "lilo -C /etc/lilo-sdb.conf", etc.

        Now, the box can correctly boot off *any* disk in the array,
without relying on the md driver, and then proceed to mount the raid
partitions (possibly in degraded mode, if one or more disks are gone).
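
After such a boot you can check /proc/mdstat to see whether the array
came up degraded.  A sketch, assuming the usual mdstat status markers
([UU] = both mirrors up, [U_] or [_U] = degraded) and the raidtools-era
raidhotadd command for re-adding a replaced mirror:

```shell
# A degraded RAID-1 shows an underscore inside the [..] status field.
if grep -q '\[[U_]*_' /proc/mdstat; then
    echo "array degraded -- re-add the replaced disk, e.g.:"
    echo "  raidhotadd /dev/md0 /dev/sda1"
else
    echo "all mirrors healthy"
fi
```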

        Caveat: I've only done all this with a RAID-1 set, and have no
first-hand knowledge of other RAID configs.

        Hope that helps!

-Al



Dale LaFountain wrote:
> 
> I've read the FAQ, the Howtos, and innumerable other online
> resources, but I haven't been able to nail down this problem, so I
> hope someone out there isn't on holiday vacation and can lend a hand.
> I'm trying to finish configuring this box so I can drop it into a
> colocation facility on Thursday (I know, cutting it close).
> 
> I have a raid1 that was configured by the Redhat GUI installer on two
> 9GB scsi drives.  I'm able to boot from this raid (md0-2) without
> problems, under normal conditions.
> 
> However, I'm only able to boot if both members of md0 are online
> (sda1 and sdb1).  If either member is offline (they are SCA drives in
> removable sleds, thus easily taken offline), I get stuck at "LI" on
> boot.
> 
> This doesn't do me much good, because in this configuration I'm
> toasted if either member of my boot raid goes offline.  Twice the
> risk for only twice the price, great... :/
> 
> I could really use some help with this one.  The solution seems very
> close to what I have now, but I sure can't find it...  I've been
> beating on this for about 12 hours straight and am getting nowhere
> fast.
> 
> Redhat 7.0, kernel 2.2.16-22enterprise, dual P3-550, onboard Adaptec
> AIC-7896 U2W, SCA backplane with 4 sled bays.
> 
> --- raidtab: ---
> raiddev             /dev/md0
> raid-level                  1
> nr-raid-disks               2
> chunk-size                  64k
> persistent-superblock       1
> #nr-spare-disks     0
>      device          /dev/sda1
>      raid-disk     0
>      device          /dev/sdb1
>      raid-disk     1
> 
> raiddev             /dev/md1
> raid-level                  1
> nr-raid-disks               2
> chunk-size                  64k
> persistent-superblock       1
> #nr-spare-disks     0
>      device          /dev/sdc1
>      raid-disk     0
>      device          /dev/sdd1
>      raid-disk     1
> 
> raiddev             /dev/md2
> raid-level                  1
> nr-raid-disks               2
> chunk-size                  64k
> persistent-superblock       1
> #nr-spare-disks     0
>      device          /dev/sda5
>      raid-disk     0
>      device          /dev/sdb5
>      raid-disk     1
> 
> I started with a much simpler lilo file, but after reading tons of
> other posts, howtos, and mini-howtos, I added a few more commands
> that, while they didn't seem to hurt, also don't appear to solve my
> problem.
> 
> How can I tell what the proper "start=" parameter should be?  Most
> examples I saw showed it being the same as sectors, but then I saw
> one where it was one less than the cylinder count...  The
> sector/head/cylinder values were pulled from "fdisk -ul /dev/sda" as
> described in the boot+raid howto.
> 
> Some of the docs suggest that partition is required, but must be a
> dummy value.  Is this still the case with lilo 21.4-4 (the version
> I'm using)?
> 
> Is an "other" image still required?  I'm guessing not, since I don't
> see any complaints about it's absence.
> 
> I read about setting a default value solving the "LI" problem, but it
> didn't work for me.
> 
> --- lilo.conf.sda: ---
> disk=/dev/md0
> bios=0x80
> sectors=63
> heads=255
> cylinders=254
> partition=/dev/md4
> start=63
> boot=/dev/sda
> lba32
> map=/boot/map
> install=/boot/boot.b
> timeout=50
> message=/boot/message
> delay=30
> default=linux
> 
> image=/boot/vmlinuz-2.2.16-22enterprise
>          label=linux
>          initrd=/boot/initrd-2.2.16-22enterprise.img
>          read-only
>          root=/dev/md0
> 
> --- lilo.conf.sdb: ---
> disk=/dev/md0
> bios=0x80       (should this be 0x81 for modern hardware, less than 1 yr old?)
> sectors=63
> heads=255
> cylinders=254
> partition=/dev/md4
> start=63
> boot=/dev/sdb
> lba32
> map=/boot/map
> install=/boot/boot.b
> prompt
> timeout=50
> message=/boot/message
> delay=30
> default=linux
> 
> image=/boot/vmlinuz-2.2.16-22enterprise
>          label=linux
>          initrd=/boot/initrd-2.2.16-22enterprise.img
>          read-only
>          root=/dev/md0
> 
> Both lilo's were properly committed using 'lilo -C lilo.conf.sd[a,b]'
> 
> --- cat /proc/mdstat ---
> Personalities : [raid1]
> read_ahead 1024 sectors
> md0 : active raid1 sdb1[0] sda1[1] 2048192 blocks [2/2] [UU]
> md2 : active raid1 sdb5[1] sda5[0] 4787264 blocks [2/2] [UU]
> md1 : active raid1 sdd1[1] sdc1[0] 17775808 blocks [2/2] [UU]
> unused devices: <none>
> 
> --- fstab (truncated) ---
> /dev/md0 /     ext2  defaults  1 1
> /dev/md1 /home ext2  defaults  1 2
> /dev/md2 /var  ext2  defaults  1 2
> 
> Based on the above, what do I need to change in order to gracefully
> recover from a disk failure?
> 
> Any assistance would be greatly appreciated.
> 
> Thanks,
> 
> Dale
> -
> To unsubscribe from this list: send the line "unsubscribe linux-raid" in
> the body of a message to [EMAIL PROTECTED]