James Manning <[EMAIL PROTECTED]>:
> First start with some added information.
Ok. I use a Debian potato system. The kernel is 2.2.13 (compiled by
myself) without any patches. The raid tools are 0.42 (old-style raid).
The conf files for the raid5 arrays are appended below. I use three SCSI
disks with an almost identical layout (one of them is slightly bigger).
Across them I have one raid0 array and two raid5 arrays. I have a swap
file on the striped array (/var on /dev/md0). Here is the raid status:
================== /proc/mdstat ===========================
Personalities : [1 linear] [2 raid0] [3 raid1] [4 raid5]
read_ahead 128 sectors
md0 : inactive
md1 : active raid5 sda3 sdb3 sdc3 1542016 blocks level 5, 32k chunk, algorithm 2 [3/3]
[UUU]
md2 : active raid5 sda5 sdb5 sdc5 4706816 blocks level 5, 32k chunk, algorithm 2 [3/3]
[UUU]
md3 : active raid0 sda6 sdb6 sdc6 3405664 blocks 32k chunks
===========================================================
Now the problem: I occasionally get things like this:
Jan 12 11:14:39 pot kernel: raid5: bug: stripe->bh_new[2], sector 2622708 exists
Jan 12 11:14:39 pot kernel: raid5: bh ccc82440, bh_new c69107e0
Jan 24 11:21:49 pot kernel: raid5: bug: stripe->bh_new[0], sector 2622732 exists
Jan 24 11:21:49 pot kernel: raid5: bh cad26860, bh_new c1ec7e40
If I run ckraid on the raid5 devices, I get on average ten messages
like this (more or less): "array xxx corrupted, cannot reconstruct as all
devices are working". When I run ckraid again on the same devices, the
xxx changes, so these errors are not reproducible in the same place.
This made me suspect a hardware failure, but if I dd the disk
partitions to /dev/null I get no errors, nor do I get any errors using
badblocks, so the only thing left that I can think of is a software
failure.
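For what it is worth, the ckraid message is consistent with how RAID-5
parity works in general: the chunks of one stripe (data plus parity) must
XOR to zero, so a mismatch can be detected, but with all devices present
nothing tells you which chunk is the wrong one. The Python sketch below
only illustrates that idea. It is not ckraid's code; the function names
and the 4-byte chunk values are made up.

```python
from functools import reduce


def xor_blocks(blocks):
    """XOR equal-length byte strings together, byte by byte."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))


def stripe_is_consistent(chunks):
    """A RAID-5 stripe (data chunks plus parity) must XOR to all zeros."""
    return not any(xor_blocks(chunks))


# Hypothetical 4-byte chunks standing in for one stripe on three disks.
d0 = bytes([0x11, 0x22, 0x33, 0x44])
d1 = bytes([0xA0, 0xB0, 0xC0, 0xD0])
parity = xor_blocks([d0, d1])

good_stripe = [d0, d1, parity]
# Flip one bit in one chunk: the check fails, but the XOR alone
# cannot say whether d0, d1 or the parity chunk is the damaged one.
bad_stripe = [d0, bytes([0xA1, 0xB0, 0xC0, 0xD0]), parity]

print(stripe_is_consistent(good_stripe), stripe_is_consistent(bad_stripe))
```

A missing chunk can be rebuilt as the XOR of the others
(xor_blocks([d0, parity]) gives back d1), which is why reconstruction
needs a device that is known to be bad.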
I also thought about switching to new-style raid arrays, but:
- is the new style more reliable than the old one? I think not, it's
just that it can reconstruct in the background, right?
- is it possible to convert an old-style raid5 array to new style in
place? I think not, but I may be wrong.
Thank you for your help.
I also had a complete raid failure like this:
================================================================
Jan 20 04:10:25 pot kernel: RAID5: Disk failure on 08:23, disabling device.Operation continuing on 2 devices
Jan 20 04:10:25 pot kernel: raid5: restarting stripe 3912183324
Jan 20 04:10:25 pot kernel: attempt to access beyond end of device
Jan 20 04:10:25 pot kernel: 08:03: rw=0, want=1956091663, limit=771120
Jan 20 04:10:25 pot kernel: dev 09:01 blksize=1024 blocknr=1956091662 sector=-382783972 size=1024 count=1
Jan 20 04:10:25 pot kernel: RAID5: Disk failure on 08:03, disabling device.Operation continuing on 1 devices
Jan 20 04:10:25 pot kernel: attempt to access beyond end of device
Jan 20 04:10:25 pot kernel: 08:13: rw=0, want=1956091663, limit=771120
Jan 20 04:10:25 pot kernel: dev 09:01 blksize=1024 blocknr=1956091662 sector=-382783972 size=1024 count=1
Jan 20 04:10:25 pot kernel: RAID5: Disk failure on 08:13, disabling device.Operation continuing on 0 devices
Jan 20 04:10:25 pot kernel: raid5: restarting stripe 3912183324
Jan 20 04:10:25 pot kernel: raid5: 09:01: unrecoverable I/O error for block 1764699694
Jan 20 04:10:25 pot kernel: md: updating raid superblock on device 08:03, sb_offset == 771008
Jan 20 04:10:25 pot kernel: md: updating raid superblock on device 08:13, sb_offset == 771008
Jan 20 04:10:25 pot kernel: md: updating raid superblock on device 08:23, sb_offset == 771008
Jan 20 04:10:25 pot kernel: raid5: 09:01: unrecoverable I/O error for block 761869
Jan 20 04:10:25 pot kernel: EXT2-fs error (device md(9,1)): ext2_read_inode: unable to read inode block - inode=189745, block=761869
Jan 20 04:10:25 pot kernel: raid5: 09:01: unrecoverable I/O error for block 974896
Jan 20 04:10:25 pot kernel: EXT2-fs error (device md(9,1)): ext2_read_inode: unable to read inode block - inode=243068, block=974896
Jan 20 04:10:25 pot kernel: raid5: 09:01: unrecoverable I/O error for block 925716
Jan 20 04:10:25 pot kernel: EXT2-fs error (device md(9,1)): ext2_read_inode: unable to read inode block - inode=230602, block=925716
Jan 20 04:10:25 pot kernel: raid5: 09:01: unrecoverable I/O error for block 589854
Jan 20 04:10:25 pot kernel: EXT2-fs error (device md(9,1)): ext2_read_inode: unable to read inode block - inode=147048, block=589854
Jan 20 04:10:25 pot kernel: raid5: 09:01: unrecoverable I/O error for block 630796
Jan 20 04:10:25 pot kernel: EXT2-fs error (device md(9,1)): ext2_read_inode: unable to read inode block - inode=157101, block=630796
Jan 20 04:10:25 pot kernel: raid5: 09:01: unrecoverable I/O error for block 950292
Jan 20 04:10:25 pot kernel: EXT2-fs error (device md(9,1)): ext2_read_inode: unable to read inode block - inode=236722, block=950292
Jan 20 04:10:25 pot kernel: raid5: 09:01: unrecoverable I/O error for block 327715
Jan 20 04:10:25 pot kernel: EXT2-fs error (device md(9,1)): ext2_read_inode: unable to read inode block - inode=81803, block=327715
Jan 20 04:10:25 pot kernel: raid5: 09:01: unrecoverable I/O error for block 265
Jan 20 04:10:25 pot kernel: raid5: 09:01: unrecoverable I/O error for block 265
Jan 20 04:10:25 pot kernel: raid5: 09:01: unrecoverable I/O error for block 1
Jan 20 04:10:26 pot kernel: raid5: 09:01: unrecoverable I/O error for block 1
Jan 20 04:10:27 pot kernel: raid5: 09:01: unrecoverable I/O error for block 265
Jan 20 04:10:27 pot kernel: EXT2-fs error (device md(9,1)): ext2_readdir: directory #2 contains a hole at offset 0
Jan 20 04:10:28 pot kernel: raid5: 09:01: unrecoverable I/O error for block 952424
Jan 20 04:10:28 pot kernel: raid5: 09:01: unrecoverable I/O error for block 1410074
Jan 20 04:10:28 pot kernel: raid5: 09:01: unrecoverable I/O error for block 1361517
Jan 20 04:10:28 pot kernel: raid5: 09:01: unrecoverable I/O error for block 747692
Jan 20 04:10:28 pot kernel: raid5: 09:01: unrecoverable I/O error for block 749844
Jan 20 04:10:28 pot kernel: raid5: 09:01: unrecoverable I/O error for block 752094
Jan 20 04:10:28 pot kernel: raid5: 09:01: unrecoverable I/O error for block 746753
Jan 20 04:10:28 pot kernel: raid5: 09:01: unrecoverable I/O error for block 753634
Jan 20 04:10:28 pot kernel: raid5: 09:01: unrecoverable I/O error for block 753638
Jan 20 04:10:28 pot kernel: raid5: 09:01: unrecoverable I/O error for block 753652
Jan 20 04:10:28 pot kernel: raid5: 09:01: unrecoverable I/O error for block 1452430
Jan 20 04:10:28 pot kernel: raid5: 09:01: unrecoverable I/O error for block 1464764
Jan 20 04:10:28 pot kernel: raid5: 09:01: unrecoverable I/O error for block 1450667
Jan 20 04:10:28 pot kernel: raid5: 09:01: unrecoverable I/O error for block 1461026
Jan 20 04:10:28 pot kernel: raid5: 09:01: unrecoverable I/O error for block 1463258
Jan 20 04:10:28 pot kernel: raid5: 09:01: unrecoverable I/O error for block 1471771
Jan 20 04:10:28 pot kernel: raid5: 09:01: unrecoverable I/O error for block 1459792
Jan 20 04:10:28 pot kernel: raid5: 09:01: unrecoverable I/O error for block 1476464
Jan 20 04:10:28 pot kernel: raid5: 09:01: unrecoverable I/O error for block 1452430
Jan 20 04:10:28 pot kernel: raid5: 09:01: unrecoverable I/O error for block 1464764
Jan 20 04:10:28 pot kernel: raid5: 09:01: unrecoverable I/O error for block 1450667
Jan 20 04:10:28 pot kernel: raid5: 09:01: unrecoverable I/O error for block 1461026
Jan 20 04:10:28 pot kernel: raid5: 09:01: unrecoverable I/O error for block 1463258
Jan 20 04:10:28 pot kernel: raid5: 09:01: unrecoverable I/O error for block 1471771
Jan 20 04:10:28 pot kernel: raid5: 09:01: unrecoverable I/O error for block 1459792
Jan 20 04:10:28 pot kernel: raid5: 09:01: unrecoverable I/O error for block 1476464
Jan 20 04:10:28 pot kernel: raid5: 09:01: unrecoverable I/O error for block 1452430
Jan 20 04:10:28 pot kernel: raid5: 09:01: unrecoverable I/O error for block 1464764
Jan 20 04:10:28 pot kernel: raid5: 09:01: unrecoverable I/O error for block 1450667
Jan 20 04:10:28 pot kernel: raid5: 09:01: unrecoverable I/O error for block 1461026
Jan 20 04:10:28 pot kernel: raid5: 09:01: unrecoverable I/O error for block 1463258
Jan 20 04:10:28 pot kernel: raid5: 09:01: unrecoverable I/O error for block 1471771
Jan 20 04:10:28 pot kernel: raid5: 09:01: unrecoverable I/O error for block 1459792
Jan 20 04:10:28 pot kernel: raid5: 09:01: unrecoverable I/O error for block 1476464
Jan 20 04:10:28 pot kernel: raid5: 09:01: unrecoverable I/O error for block 265
Jan 20 04:10:28 pot kernel: EXT2-fs error (device md(9,1)): ext2_readdir: directory #2 contains a hole at offset 0
Jan 20 04:10:28 pot kernel: raid5: 09:01: unrecoverable I/O error for block 265
...
(the last two lines repeated forever until the system completely froze)
================================================================
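A note on reading the numbers in this log. The device names are printed
as major:minor in hex (major 8 is a SCSI disk, major 9 is md), and the
stripe, blocknr and sector values all appear to be one and the same bogus
number seen through different types. The Python sketch below just redoes
that arithmetic. The helper names are made up, and the 16-minors-per-disk
rule is the usual SCSI disk numbering, assumed to apply here. It does not
explain where the bad value came from.

```python
def decode_dev(dev):
    """Split a 'major:minor' string printed in hex into two integers."""
    major, minor = (int(part, 16) for part in dev.split(":"))
    return major, minor


def scsi_partition_name(minor):
    """Name a SCSI disk or partition, assuming 16 minors per disk."""
    disk = "sd" + chr(ord("a") + minor // 16)
    partition = minor % 16
    return disk if partition == 0 else disk + str(partition)


def as_signed32(value):
    """Reinterpret the low 32 bits of value as a signed 32-bit integer."""
    value &= 0xFFFFFFFF
    return value - (1 << 32) if value & 0x80000000 else value


stripe = 3912183324   # "restarting stripe" value from the log
blocknr = 1956091662  # blocknr reported for dev 09:01 (1024-byte blocks)
sector = -382783972   # sector reported on the same line (512-byte sectors)

for dev in ("08:03", "08:13", "08:23"):
    major, minor = decode_dev(dev)
    print(dev, "->", scsi_partition_name(minor))

# The same 32-bit pattern, printed once unsigned and once signed.
print(as_signed32(stripe) == sector)
# Two 512-byte sectors per 1024-byte block.
print(blocknr * 2 == stripe)
```

So 08:03, 08:13 and 08:23 are sda3, sdb3 and sdc3, the three members of
md1, and the requested block (about 1.9 billion) is far beyond both the
partition limit of 771120 blocks and the array size of 1542016 blocks.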
======================= scsi layout =======================
Disk /dev/sda: 255 heads, 63 sectors, 527 cylinders
Units = cylinders of 16065 * 512 bytes
Device     Boot Start    End   Blocks   Id  System
/dev/sda1         103    527  3413812+   5  Extended
/dev/sda2  *        1      6    48163+  83  Linux
/dev/sda3           7    102   771120   83  Linux
/dev/sda5         103    395  2353491   83  Linux
/dev/sda6         396    527  1060258+  83  Linux
Disk /dev/sdb: 255 heads, 63 sectors, 527 cylinders
Units = cylinders of 16065 * 512 bytes
Device     Boot Start    End   Blocks   Id  System
/dev/sdb1         103    527  3413812+   5  Extended
/dev/sdb2  *        1      6    48163+  83  Linux
/dev/sdb3           7    102   771120   83  Linux
/dev/sdb5         103    395  2353491   83  Linux
/dev/sdb6         396    527  1060258+  83  Linux
Disk /dev/sdc: 255 heads, 63 sectors, 555 cylinders
Units = cylinders of 16065 * 512 bytes
Device     Boot Start    End   Blocks   Id  System
/dev/sdc1         103    555  3638722+   5  Extended
/dev/sdc2           1      6    48163+  82  Linux swap
/dev/sdc3           7    102   771120   83  Linux
/dev/sdc5         103    395  2353491   83  Linux
/dev/sdc6         396    555  1285168+  83  Linux
============================================================
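As a sanity check, the array sizes in /proc/mdstat can be reproduced from
this partition table. The Python sketch below assumes that each raid5
member is rounded down to a 64 KB boundary with a further 64 KB reserved
for the superblock, and that raid0 members are only rounded down to the
32 KB chunk size. These rules are inferred because they match the numbers
in this post (including sb_offset == 771008); they are not taken from the
md source, and the function names are made up.

```python
RESERVED_BLOCKS = 64  # assumed: 64 KB superblock area, in 1 KB blocks
CHUNK_BLOCKS = 32     # chunk size from mdstat and the conf files, in 1 KB blocks


def usable_blocks(partition_blocks):
    """Round down to a 64 KB boundary, then reserve one 64 KB unit at the end."""
    return (partition_blocks // RESERVED_BLOCKS) * RESERVED_BLOCKS - RESERVED_BLOCKS


def raid5_blocks(member_blocks):
    """RAID-5 capacity: (n - 1) times the smallest usable member size."""
    usable = min(usable_blocks(b) for b in member_blocks)
    return (len(member_blocks) - 1) * usable


def raid0_blocks(member_blocks):
    """RAID-0 capacity: sum of the members, each rounded down to a whole chunk."""
    return sum((b // CHUNK_BLOCKS) * CHUNK_BLOCKS for b in member_blocks)


md1 = raid5_blocks([771120, 771120, 771120])     # sda3, sdb3, sdc3
md2 = raid5_blocks([2353491, 2353491, 2353491])  # sda5, sdb5, sdc5
md3 = raid0_blocks([1060258, 1060258, 1285168])  # sda6, sdb6, sdc6

print(usable_blocks(771120), md1, md2, md3)
```

Under these assumptions the results agree with the 1542016, 4706816 and
3405664 blocks reported above, so the arrays at least have the sizes one
would expect for this layout.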
===File /etc/raid/home.conf=================================
# raid-5 for /home on /dev/md2
raiddev /dev/md2
raid-level 5
nr-raid-disks 3
nr-spare-disks 0
chunk-size 32
parity-algorithm left-symmetric
device /dev/scsi/id2p5
raid-disk 0
device /dev/scsi/id3p5
raid-disk 1
device /dev/scsi/id9p5
raid-disk 2
============================================================
===File /etc/raid/usr.conf==================================
# raid-5 for /usr on /dev/md1
raiddev /dev/md1
raid-level 5
nr-raid-disks 3
nr-spare-disks 0
chunk-size 32
parity-algorithm left-symmetric
device /dev/scsi/id2p3
raid-disk 0
device /dev/scsi/id3p3
raid-disk 1
device /dev/scsi/id9p3
raid-disk 2
============================================================
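For reference, "parity-algorithm left-symmetric" in these conf files is
what /proc/mdstat reports as "algorithm 2". The Python sketch below shows
where parity and data chunks land on a three-disk array under that
layout, following the usual definition of left-symmetric. The function
name is made up, and disk numbers are the raid-disk indexes from the conf
files.

```python
def left_symmetric(stripe, raid_disks):
    """Return (parity_disk, data_disks_in_logical_order) for one stripe."""
    data_disks = raid_disks - 1
    # Parity starts on the last disk and moves one disk to the left per stripe.
    parity_disk = data_disks - stripe % raid_disks
    # Data chunks start on the disk right after parity and wrap around.
    data = [(parity_disk + 1 + i) % raid_disks for i in range(data_disks)]
    return parity_disk, data


layout = [left_symmetric(s, 3) for s in range(3)]
for stripe, (parity_disk, data) in enumerate(layout):
    print("stripe", stripe, "parity on disk", parity_disk, "data on disks", data)
```

With three disks the pattern repeats every three stripes, so each disk
holds parity for one stripe in three.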