Hi there

I've been running a raid5 set very reliably on a moderately loaded server
for a couple of months, set up like this:

* Red Hat 6.0, kernel 2.2.12 with the 19990824 raid patch
* 1 Adaptec 2940 UW with the system disks and a mix of linear-raided disks
* 1 Adaptec 2940 U2W with four 9 GB U2W IBM disks, a 27 GB raid5 set in
total.

Yesterday I installed devfs 99.5 (I chose 99.5 because I wanted the one
specific to 2.2.12). I had some trouble applying that patch on top of the
raid patch, but it wasn't too much work, and it seemed to pay off:
everything worked fine at first. I built the new kernel from an exact copy
of my previous kernel tree (the one without devfs) and didn't change a
thing in the config except for enabling devfs.

When I checked my logs this morning, all drives of the raid5 set had been
kicked out ([____] in /proc/mdstat), and I traced it to these events in
syslog:

---------------8<----------------
Feb 12 21:23:36 opi kernel: attempt to access beyond end of device
Feb 12 21:23:36 opi kernel: 08:61: rw=0, want=519151696, limit=8956206
Feb 12 21:23:36 opi kernel: dev 09:01 blksize=1024 blocknr=129787923
sector=1038303384 size=4096 count=1
Feb 12 21:23:36 opi kernel: raid5: Disk failure on sd/c1b0t9u0p1,
disabling device. Operation continuing on 3 devices
Feb 12 21:23:36 opi kernel: raid5: restarting stripe 1038303384
Feb 12 21:23:36 opi kernel: attempt to access beyond end of device
Feb 12 21:23:36 opi kernel: 08:51: rw=0, want=519151696, limit=8956206
Feb 12 21:23:36 opi kernel: dev 09:01 blksize=1024 blocknr=129787923
sector=1038303384 size=4096 count=1
Feb 12 21:23:36 opi kernel: raid5: Disk failure on sd/c1b0t8u0p1,
disabling device. Operation continuing on 2 devices
Feb 12 21:23:36 opi kernel: attempt to access beyond end of device
Feb 12 21:23:36 opi kernel: 08:71: rw=0, want=519151696, limit=8956206
Feb 12 21:23:36 opi kernel: dev 09:01 blksize=1024 blocknr=129787923
sector=1038303384 size=4096 count=1
Feb 12 21:23:36 opi kernel: raid5: Disk failure on sd/c1b0t10u0p1,
disabling device. Operation continuing on 1 devices
Feb 12 21:23:36 opi kernel: attempt to access beyond end of device
Feb 12 21:23:36 opi kernel: 08:81: rw=0, want=519151696, limit=8956206
Feb 12 21:23:36 opi kernel: dev 09:01 blksize=1024 blocknr=129787923
sector=1038303384 size=4096 count=1
Feb 12 21:23:36 opi kernel: raid5: Disk failure on sd/c1b0t11u0p1,
disabling device. Operation continuing on 0 devices
Feb 12 21:23:36 opi kernel: raid5: restarting stripe 1038303384
Feb 12 21:23:36 opi kernel: raid5: md1: unrecoverable I/O error for block
926234691
Feb 12 21:23:36 opi kernel: raid5: md1: unrecoverable I/O error for block
4571216
Feb 12 21:23:36 opi kernel: raid5: md1: unrecoverable I/O error for block
4571224
Feb 12 21:23:36 opi kernel: raid5: md1: unrecoverable I/O error for block
4571217
------------------8<---------------------

This last message repeated forever until I stopped the raid set. Nothing
relevant showed up in syslog before the first "attempt to access..."
event. At the time of the crash, I was running an rsync from this machine
to a backup server.
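
For what it's worth, here is a quick sanity check of the numbers in the log
above, as a small Python sketch. It assumes that "limit" and "want" are in
1 KiB blocks, that "sector" is in 512-byte units, and that a 4-disk raid5
set can hold at most three members' worth of data; the helper names are my
own. If I read it right, the request is already far beyond the end of the
whole array, not just beyond one member disk.

```python
# Values copied from the syslog excerpt; the units are assumptions.
SECTOR = 1038303384   # "sector" reported for dev 09:01, assumed 512-byte units
SIZE = 4096           # "size" of the request in bytes
WANT = 519151696      # "want" on each member, assumed 1 KiB blocks
LIMIT = 8956206       # "limit" of each member, assumed 1 KiB blocks
MEMBERS = 4           # disks in the raid5 set


def array_capacity_kib(limit_kib, members):
    """Upper bound on raid5 capacity: one member's worth is parity."""
    return limit_kib * (members - 1)


def want_from_sector(sector, size_bytes):
    """End of the request in 1 KiB blocks: (start + length in sectors) / 2."""
    return (sector + size_bytes // 512) // 2


capacity = array_capacity_kib(LIMIT, MEMBERS)
print("array capacity (KiB):", capacity)
print("request offset (KiB):", SECTOR // 2)
print("beyond whole array:  ", SECTOR // 2 > capacity)
print("want matches sector: ", want_from_sector(SECTOR, SIZE) == WANT)
```

Under these assumptions, "want" is simply the raw sector number halved plus
the request length, which makes me suspect the bad number comes from
somewhere above the member disks rather than from the disks themselves.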

Now, I've seen that error ("attempt to access...") before, but that was
with an earlier kernel version, and that particular bug was said to be
fixed in 2.2.12. When I hit it with the previous kernel, I suspected it was
related to SCSI errors. Could that be the case here? I don't see a SCSI
reset in the logs, though.

Is anyone reliably running raid5 with devfs? If so, with what kernel,
devfs, and raid versions?

Can anybody tell me what's wrong here? A simple remapping of device names
in /dev shouldn't cause an "attempt to access beyond end of device",
should it?
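
To line up the major:minor pairs in the log with the devfs names, I decoded
them with the little Python sketch below. It assumes the kernel prints the
pair in hex, that major 8 is the SCSI disk driver with 16 minors per disk,
and that major 9 is md; the function name is my own. The results look
consistent with the targets in the devfs names (t8 to t11), so the naming
itself doesn't look mixed up to me.

```python
def old_style_name(dev):
    """Translate a hex 'major:minor' pair into an old-style /dev name.

    Only majors 8 (sd) and 9 (md) are handled in this sketch.
    """
    major, minor = (int(part, 16) for part in dev.split(":"))
    if major == 9:
        return "md%d" % minor
    if major != 8:
        raise ValueError("only majors 8 (sd) and 9 (md) are handled here")
    # Each SCSI disk owns 16 minors: 0 is the whole disk, 1-15 are partitions.
    disk, part = divmod(minor, 16)
    name = "sd" + chr(ord("a") + disk)
    return name + (str(part) if part else "")


# Pairs in the order they were kicked out, with the devfs name from the log.
for dev, devfs in [("08:61", "sd/c1b0t9u0p1"), ("08:51", "sd/c1b0t8u0p1"),
                   ("08:71", "sd/c1b0t10u0p1"), ("08:81", "sd/c1b0t11u0p1")]:
    print(dev, old_style_name(dev), devfs)
```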

Thanks for any help.

//Jocke
