Hi Bruno!
Just a few thoughts off the top of my head:
1. You should use kernel 2.2.19; some security holes have been fixed in
that release.
2. Do you use the IDE patch? I need it for my Promise 66 card in a
VIA/Duron system that is otherwise the same as yours. DMA (enabled via
hdparm) also works fine; a rough example is sketched right below this list.
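For example, roughly what I run (the device name is only an example;
adjust it to wherever your drives sit):

   hdparm -d1 -c1 /dev/hde    # turn on DMA and 32-bit I/O support
   hdparm -tT /dev/hde        # quick timing test to confirm it actually helps
   hdparm /dev/hde            # show the current flags
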
I use Debian 2.2r3 with a hand-built kernel.
The only pain with Debian is that you cannot install directly onto a
raid; you first have to get the patches, apply them, and compile a kernel.
By the way: I always install to a partition which will later become one
member of a software raid used for a non-root mountpoint, say /home.
When everything is done, I make a raid1 (which is bootable; raid5 at
least didn't use to be), test it (including the autodetect function),
mount it, move root onto it, change lilo, and boot from the raid. I have
a raid1 for /boot and a raid1 for root. The rest is a big raid5 which is
mounted somewhere else.
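From memory, the core of that sequence looks something like this (device
names are only examples):

   mkraid /dev/md0                               # create the raid1 that will hold root, per /etc/raidtab
   mke2fs /dev/md0
   mount /dev/md0 /mnt/newroot
   cd / && find . -xdev | cpio -pm /mnt/newroot  # copy the running root filesystem across
   # then point root= in lilo.conf at /dev/md0, rerun lilo, and reboot onto the array
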
Bruno Prior wrote:
>
> I have been running software RAID for several years without trouble.
> Recently, one of my hard disks in my home server died. As the others
> were getting pretty noisy too (they were 4 X 17Gb Maxtors, which I learn
> have a bad reputation), I decided to play safe and replace them all. I
> also decided to make the most of the opportunity to go for larger
> capacity disks. I purchased 4 X 40Gb Fujitsu MPG3409AT disks, which are
> wonderfully quiet compared to the Maxtors.
>
> Unfortunately, the BIOS on my old motherboard was stuck
> on the 33.8Gb limit, and I could not persuade the system to boot. So I
> decided to upgrade the motherboard too. My existing setup had the first
> 2 disks on the 2 IDE channels on the motherboard, and the other 2 disks
> on a Promise ATA-33 controller. As the Fujitsu disks are capable of
> ATA-100, I decided to look for a system that would allow me to run them
> at a faster speed (although I appreciate that this probably would not
> make much difference with 1 disk per channel). I went for a Gigabyte
> GA-7ZXR, which has a Promise controller allowing RAID (under Windows) or
> ATA-100 on 2 extra IDE channels, as well as the standard IDE channels.
>
> With a new motherboard and disks, I effectively had a new system, so I
> decided to install from scratch rather than try to transfer the old
> system across. I went for Mandrake 7.2, because that distro has had the
> most success detecting my hardware in the past. Unfortunately, it did
> not successfully detect the Promise controller, so only the first 2
> disks showed up. I tried the advice for older Promise controllers in the
> Ultra-DMA mini-HOWTO, passing parameters for ide2 and ide3 to the kernel
> at bootup. This seemed to work, as all disks were then detected.
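
For anyone else trying this, the mini-HOWTO trick boils down to telling the
kernel where the extra interfaces live, e.g. with an append line in lilo.conf
along these lines (the I/O addresses and the IRQ here are only placeholders;
the real values come from the card's PCI resources, e.g. the boot messages or
/proc/pci):

   append="ide2=0xd800,0xd402,11 ide3=0xd000,0xcc02,11"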
>
> Initially, I thought I would take the easy option and use Diskdrake to
> install straight to RAID. However, this resulted in frustration, because
> of Mandrake's dependence on the brain-dead Red Hat scheme, where
> software RAIDs are started from rc.sysinit and stopped in halt. This
> does not work for root RAID, and why anyone would choose to do this with
> scripts when autodetect works so well is beyond me. The result was a
> failure to correctly stop the root RAID at shutdown, and therefore a
> resync every time I rebooted.
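
Just to spell out what that scheme amounts to: rc.sysinit effectively runs
something like

   raidstart /dev/md0     # assemble the array according to /etc/raidtab

and halt tries the matching

   raidstop /dev/md0

which can never work cleanly for an array the root filesystem is still
mounted on. With persistent superblocks and partition type 0xfd the kernel
assembles the arrays itself before root is even mounted, and as far as I
know the md driver also marks them clean at reboot, so no init-script
involvement is needed and no resync should occur.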
>
> So I decided to do things manually. I installed the system to the first
> disk, and defined that as a failed-disk in my raidtab when creating the
> 4-disk RAID-5. The raid device was created fine. I created a filesystem
> on it with "mke2fs -b 4096 -R stride=16 /dev/md0". No complaints.
> Mounted it to /mnt/disk. Copied my root filesystem onto it with "cd /;
> find . -xdev | cpio -pm /mnt/disk". No problems. "ls /mnt/disk" seemed
> happy. But if I then rebooted, the raid started fine, but e2fsck found
> so many errors on /dev/md0 that the boot sequence failed. I could force
> the same effect by creating the raid, mounting it, copying the files,
> unmounting, running e2fsck (reports clean), then mounting it again,
> where, even though e2fsck had reported the system clean immediately
> beforehand, I would receive an endless stream of errors while it was
> trying to mount the device.
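
For reference, the whole sequence that triggers it, as I read the
description above (reconstructed, so treat it as a sketch rather than a
literal transcript):

   mkraid /dev/md0                        # degraded 4-disk RAID-5, hda3 declared as failed-disk
   mke2fs -b 4096 -R stride=16 /dev/md0   # stride 16 = 64k chunk size / 4k block size
   mount /dev/md0 /mnt/disk
   cd / && find . -xdev | cpio -pm /mnt/disk
   umount /mnt/disk
   e2fsck -f /dev/md0                     # reports clean
   mount /dev/md0 /mnt/disk               # and now the ext2 errors start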
>
> If I only copy a small filesystem (say /root) to the device, I do not
> get these problems. I have not identified the threshold size of
> filesystem that provokes this corruption.
>
> I thought the problems might be due to the onboard Promise controller.
> So I disabled it, and reinstalled my old Promise ATA-33 card, which had
> worked perfectly for 2 years with the 17Gb disks. This seemed to work
> fine (apart from the card's BIOS thinking that the disks were only 8Gb
> in capacity, which had no effect on linux's ability to correctly
> identify their capacity) and was detected from scratch. However, I
> experience the same corruption in the same circumstances with this card
> as with the onboard controller.
>
> The error messages are endless, but a few examples are:
>
> When mounting:
>
> EXT2-fs error (device md(9,0)): ext2_check_blocks_bitmap: Wrong free
> blocks count for group20, stored=32217, counted=32220
>
> EXT2-fs error (device md(9,0)): ext2_check_blocks_bitmap: Wrong free
> blocks count in super block, stored=29189883, counted=29191024
>
> EXT2-fs error (device md(9,0)): ext2_check_inodes_bitmap: Wrong free
> inodes count in group20 stored=16359, counted=16362
>
> When running e2fsck:
>
> /dev/md0 contains a file system with errors, check forced.
> Pass1: Checking inodes, blocks, and sizes
> Inode 901168 has illegal block(s). Clear <y>?
> [I press y]
> Illegal block #12 (3067540912) in inodes 901168. CLEARED
> [repeated many times for different blocks]
> Too many illegal blocks in inode 901168
> Clear inode <y>?
>
> Inode 950308 is in use, but has dtime set. Fix <y>?
>
> Special [device/socket/fifo] inode 950332 has non-zero size. Fix <y>?
>
> Inode 950336 has imagic flag set. Clear <y>?
>
> Inode 950384, i_blocks is 8519744, should be 144. Fix <y>?
>
> And many other similar messages. I have never had the patience to wade
> through them all to the end.
>
> A few other details:
>
> Mandrake 7.2 uses kernel 2.2.17 with the Mandrake patch set. This
> includes the software-RAID patch. I have recompiled the kernel to build
> in RAID-1 and -5 support, which are modular in the default installation.
> Mandrake also comes with the new raidtools (v0.90). I don't think there
> is any issue of the raid support being out of date or incompatible with
> the raidtools. I have commented out the mad raid sections in rc.sysinit
> and halt.
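
For comparison, the RAID-related options in a 2.2 kernel with the 0.90
patch look roughly like this (option names quoted from memory, so check
them against your own .config):

   CONFIG_BLK_DEV_MD=y
   CONFIG_MD_RAID1=y
   CONFIG_MD_RAID5=y
   CONFIG_AUTODETECT_RAID=y

The last one is what lets the kernel assemble type-0xfd partitions at boot
without any help from rc.sysinit.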
>
> My raidtab for md0 is:
>
> raiddev /dev/md0
>     raid-level              5
>     nr-raid-disks           4
>     nr-spare-disks          0
>     persistent-superblock   1
>     parity-algorithm        left-symmetric
>     chunk-size              64
>
>     device                  /dev/hdc3
>     raid-disk               0
>     device                  /dev/hde2
>     raid-disk               1
>     device                  /dev/hdg2
>     raid-disk               2
>     device                  /dev/hda3
>     failed-disk             3
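
As an aside: once root has been moved off /dev/hda5 and hda repartitioned,
the missing member can presumably be pulled in with something like

   raidhotadd /dev/md0 /dev/hda3

after which the RAID-5 reconstructs onto it in the background.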
>
> The output of fdisk -l is:
>
> Disk /dev/hda: 255 heads, 63 sectors, 4983 cylinders
> Units = cylinders of 16065 * 512 bytes
>
>    Device Boot    Start       End    Blocks   Id  System
> /dev/hda1   *         1         2     16033+  83  Linux
> /dev/hda2             3        25    184747+  82  Linux swap
> /dev/hda3            26      4983  39825135   85  Linux extended
> /dev/hda5            26      4983  39825103+  83  Linux
>
> Disk /dev/hdc: 255 heads, 63 sectors, 4983 cylinders
> Units = cylinders of 16065 * 512 bytes
>
>    Device Boot    Start       End    Blocks   Id  System
> /dev/hdc1   *         1         2     16033+  83  Linux
> /dev/hdc2             3        25    184747+  82  Linux swap
> /dev/hdc3            26      4983  39825135   83  Linux
>
> Disk /dev/hde: 16 heads, 63 sectors, 79428 cylinders
> Units = cylinders of 1008 * 512 bytes
>
>    Device Boot    Start       End    Blocks   Id  System
> /dev/hde1             1       410    206608+  83  Linux
> /dev/hde2           411     79428  39825072   83  Linux
>
> Disk /dev/hdg: 16 heads, 63 sectors, 79428 cylinders
> Units = cylinders of 1008 * 512 bytes
>
>    Device Boot    Start       End    Blocks   Id  System
> /dev/hdg1             1       410    206608+  83  Linux
> /dev/hdg2           411     79428  39825072   83  Linux
>
> (I will set the type to fd to autostart, but for the sake of testing I
> was starting and stopping the raid manually, so a failure at boot time
> doesn't hang the system)
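
For the record, switching a partition to autostart later is just fdisk's
't' command, e.g. for hdc:

   fdisk /dev/hdc
     t        (change a partition's type)
     3        (partition number)
     fd       (Linux raid autodetect)
     w        (write the table and exit)

and likewise for hde2, hdg2 and eventually hda3; the kernel then assembles
the array at boot from the persistent superblocks.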
>
> Output of df:
>
> Filesystem           1k-blocks      Used Available Use% Mounted on
> /dev/hda5             39199884  32040384   5168248  86% /
> /dev/hdc1                15522         1     14720   0% /bakboot
> /dev/hda1                15522      2489     12232  17% /boot
>
> The reason I want to combine most of each disk into one large (~120Gb)
> RAID-5 is that I mostly use my home server for storing audio, and
> potentially in future, video files, and I would like them all to exist
> within one filesystem structure. I don't want to make arbitrary
> judgements about how much space I will need in different areas of my
> filesystem in the future.
>
> Does anyone have any ideas what could be causing this corruption? My
> first instinct was to blame the controller, but I think that theory is
> out of the window as I have the same problem with a controller that has
> worked fine for 2 years. Does linux have problems with filesystems this
> size? Could it be related to the fact that the raid is in degraded mode
> (because hda is defined as a failed-disk)? Or could it be anything to do
> with the fact that linux assigns different drive geometries to the first
> two and the last two disks, or that the partition sizes are not exactly
> equal (I don't think this should matter)?
>
> Sorry for the long message, but I thought the background might provide
> useful clues.
>
> Cheers,
>
> Bruno Prior
--
Norman Schmidt          Universitaet Erlangen-Nuernberg
cand.chem.              Sysadmin Wohnheimnetzwerk RatNET
mailto:[EMAIL PROTECTED]
-
To unsubscribe from this list: send the line "unsubscribe linux-raid" in
the body of a message to [EMAIL PROTECTED]