Hi,

        This one's a bit long-winded, so skip if you are not into a good
read...

        I am having some problems with my raid5 system:

        Kernel 2.2.14 with raid-2.2.14-b1 from
                people.redhat.com/mingo/raid-patches

        Tekram 390UW with 3 Seagate Medalist ST39140W drives.

        Each disk is sliced into 4 partitions: the sdX1 partitions make up
/dev/md0, the sdX2 partitions /dev/md1, the sdX3 partitions /dev/md2, and
the sdX4 partitions /dev/md3.
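
        For completeness, the raidtab stanza for /dev/md2 would look roughly
like this (the chunk size, parity algorithm and disk ordering match the
mdstat output further down; the persistent-superblock line is an assumption
on my part, so don't take it as a literal copy of the file):

        raiddev /dev/md2
                raid-level              5
                nr-raid-disks           3
                nr-spare-disks          0
                persistent-superblock   1
                chunk-size              128
                parity-algorithm        left-symmetric
                device                  /dev/sdb3
                raid-disk               0
                device                  /dev/sdc3
                raid-disk               1
                device                  /dev/sdd3
                raid-disk               2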


        Background:

        This system has been very stable for 8 months. At xmas, we
upgraded to 2.2.14 and the newest raid patches.

        Problem:

        On the /dev/md2 raid volume, the /dev/sdb3 partition keeps getting
marked as failed. This happens consistently after about 1.5-2 days of use.
/dev/md2 is where I mount /usr/local on this system, which houses Apache
and OpenLDAP, so I don't think it is a cron job or a burst of activity that
marks it bad.

        When /dev/sdb3 is marked bad, I unmount /usr/local, raidhotremove
/dev/sdb3 from /dev/md2, and then fsck /dev/sdb3. The partition fscks fine,
so I raidhotadd /dev/sdb3 back into /dev/md2 and watch /proc/mdstat to make
sure it is rebuilding (the exact commands are sketched below). /dev/md2
rebuilds, marks everything clean (i.e. [UUU]), runs for a couple of days,
and then /dev/sdb3 gets marked as failed again:

        md2 : active raid5 sdb3[0](F) sdd3[2] sdc3[1]
              1435392 blocks level 5, 128k chunk, algorithm 2 [3/2] [_UU]
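
        In case it helps, the recovery sequence above boils down to
something like this (note that raidhotremove/raidhotadd take the md device
first and the partition second; the fsck is just my sanity check on the
partition):

        umount /usr/local                 # /dev/md2 is mounted here
        raidhotremove /dev/md2 /dev/sdb3  # pull the failed partition out
        fsck /dev/sdb3                    # comes back clean every time
        raidhotadd /dev/md2 /dev/sdb3     # add it back, array resyncs
        cat /proc/mdstat                  # watch until it shows [UUU]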


        I don't know what the issue is. My suspicion is that I have a
hardware issue with /dev/sdb, but before I go out and buy new drives to
replace these existing ones I wanted to know if anyone had the
same/similar issues.

        Thanks for reading so much detail...

        Cheers.

        Chris  
