Hi guys,
So recently I had a hard drive go down with some unusual behaviour I
thought I'd report. Since this is a production machine and I can't
really replicate it, I've tried to be as detailed about the situation as
I can.
In a nutshell, after some odd errors (which I think originated in the
SATA code) the array (md4) degraded. I let it rebuild to a hot spare, but
upon reboot it started rebuilding again even though the spare checked
out as OK. I let it rebuild again, but upon the next reboot it rebuilt
yet again. This behaviour occurred under both 2.6.12-ck3s and 2.6.15.1
(after the first weird rebuild, I upgraded the kernel thinking the bug
may have been fixed). I ended up having to move the hot spare into the
old drive's position on the SATA controller (i.e. put sdo where sdj used
to be). Everything was groovy from then on.
This array is normally dormant (very light duty). I was doing some heavy
I/O when the errors came up, but I think the errors may have been caused
by a bug in the SATA stack, because after the kernel upgrade the drives
were a *LOT* quieter. A kernel upgrade shouldn't really do that.
In any case, I have included the relevant dmesg output plus --examine and
--detail output for all drives, captured as soon as the second (2.6.15.1)
weird rebuild started. If you need any more info I'll do my best to
provide it, but I thought I should at least report this.
Neil
P.S. I realise md3 is also degraded in the files below. sdg died after
the first rebuild but before the reboot, when I ran my program again
(it md5sums every file on the array) and tripped the alleged SATA bug.
---
dmesg below
--detail and --examine files attached along with an unhappy mdstat
Random Info:
CPU - AMD Athlon(TM) XP 2500+
Memory: 512M
SATA cards: 3 x SATA 3114 (md3 and md4)
Drives: 6 x Maxtor 6Y200M0 (md3) and 6 x Maxtor 7L300S0 (md4)
In this dmesg, sdo should *not* have been booted out of the array.
md: autorun ...
md: considering sdo1 ...
md: adding sdo1 ...
md: adding sdn1 ...
md: adding sdm1 ...
md: adding sdl1 ...
md: adding sdk1 ...
md: adding sdj1 ...
md: adding sdi1 ...
md: sdh1 has different UUID to sdo1
md: sdg1 has different UUID to sdo1
md: sdf1 has different UUID to sdo1
md: sde1 has different UUID to sdo1
md: sdd1 has different UUID to sdo1
md: sdc1 has different UUID to sdo1
md: sdb3 has different UUID to sdo1
md: sdb2 has different UUID to sdo1
md: sdb1 has different UUID to sdo1
md: sda3 has different UUID to sdo1
md: sda2 has different UUID to sdo1
md: sda1 has different UUID to sdo1
devfs_mk_dev: could not append to parent for md/4
md: created md4
md: bind<sdi1>
md: bind<sdj1>
md: bind<sdk1>
md: bind<sdl1>
md: bind<sdm1>
md: bind<sdn1>
md: export_rdev(sdo1)
md: running: <sdn1><sdm1><sdl1><sdk1><sdj1><sdi1>
md: kicking non-fresh sdj1 from array!
md: unbind<sdj1>
md: export_rdev(sdj1)
raid5: device sdn1 operational as raid disk 5
raid5: device sdm1 operational as raid disk 0
raid5: device sdl1 operational as raid disk 1
raid5: device sdk1 operational as raid disk 2
raid5: device sdi1 operational as raid disk 4
raid5: allocated 6290kB for md4
raid5: raid level 5 set md4 active with 5 out of 6 devices, algorithm 2
--- rd:6 wd:5 fd:1
disk 0, o:1, dev:sdm1
disk 1, o:1, dev:sdl1
disk 2, o:1, dev:sdk1
disk 4, o:1, dev:sdi1
disk 5, o:1, dev:sdn1
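The "kicking non-fresh sdj1" line above comes from md comparing the event counters stored in each member's superblock (the "Events" field in the --examine output below). sdj1's superblock was last updated at 02:13:42 with events 0.3226853, while every other md4 member is at 0.3230512. The Python sketch below is a simplified model of that comparison, not the kernel's actual code. It assumes the "hi.lo" value mdadm prints for 0.90 superblocks is the upper and lower 32 bits of one counter, and parse_events / stale_members are hypothetical helper names.

```python
import re

# "Events : hi.lo" as printed by mdadm --examine for 0.90 superblocks.
# Assumption: hi and lo are the upper and lower 32 bits of one counter.
EVENTS_RE = re.compile(r"^\s*Events\s*:\s*(\d+)\.(\d+)\s*$", re.MULTILINE)


def parse_events(examine_text: str) -> int:
    """Return the 64-bit event counter from one device's --examine output."""
    m = EVENTS_RE.search(examine_text)
    if m is None:
        raise ValueError("no Events line found")
    hi, lo = int(m.group(1)), int(m.group(2))
    return (hi << 32) | lo


def stale_members(events: dict[str, int], slack: int = 0) -> list[str]:
    """Members more than `slack` events behind the freshest one.

    Simplified model: the kernel applies its own rules when assembling,
    which may allow a small lag; `slack` only loosely stands in for that.
    """
    freshest = max(events.values())
    return sorted(dev for dev, ev in events.items() if ev + slack < freshest)


# Event counts copied from the md4 --examine output.
md4 = {
    dev: parse_events(f"Events : {value}")
    for dev, value in [
        ("sdi1", "0.3230512"), ("sdj1", "0.3226853"), ("sdk1", "0.3230512"),
        ("sdl1", "0.3230512"), ("sdm1", "0.3230512"), ("sdn1", "0.3230512"),
        ("sdo1", "0.3230512"),
    ]
}
print(stale_members(md4))  # ['sdj1']
```

Fed the md3 counts instead, the same check singles out sdg1 (0.40600053 against 0.40600763 on the other members), which matches it having dropped out after the first rebuild.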
/dev/md0:
Version : 00.90.01
Creation Time : Sun Jul 3 01:07:32 2005
Raid Level : raid1
Array Size : 4883648 (4.66 GiB 5.00 GB)
Device Size : 4883648 (4.66 GiB 5.00 GB)
Raid Devices : 2
Total Devices : 2
Preferred Minor : 0
Persistence : Superblock is persistent
Update Time : Tue Jan 24 12:45:59 2006
State : clean
Active Devices : 2
Working Devices : 2
Failed Devices : 0
Spare Devices : 0
UUID : d7e4fbc0:fb23ed40:4cb15b40:45319463
Events : 0.1120516
Number Major Minor RaidDevice State
0 8 1 0 active sync /dev/sda1
1 8 17 1 active sync /dev/sdb1
/dev/md1:
Version : 00.90.01
Creation Time : Sun Jul 3 01:07:39 2005
Raid Level : raid1
Array Size : 1951808 (1.86 GiB 2.00 GB)
Device Size : 1951808 (1.86 GiB 2.00 GB)
Raid Devices : 2
Total Devices : 2
Preferred Minor : 1
Persistence : Superblock is persistent
Update Time : Tue Jan 24 13:38:45 2006
State : clean
Active Devices : 2
Working Devices : 2
Failed Devices : 0
Spare Devices : 0
UUID : e2e9d49f:a777210d:015eeea0:c259ad00
Events : 0.855
Number Major Minor RaidDevice State
0 8 2 0 active sync /dev/sda2
1 8 18 1 active sync /dev/sdb2
/dev/md2:
Version : 00.90.01
Creation Time : Sun Jul 3 01:07:46 2005
Raid Level : raid1
Array Size : 73200064 (69.81 GiB 74.96 GB)
Device Size : 73200064 (69.81 GiB 74.96 GB)
Raid Devices : 2
Total Devices : 2
Preferred Minor : 2
Persistence : Superblock is persistent
Update Time : Tue Jan 24 12:45:59 2006
State : clean
Active Devices : 2
Working Devices : 2
Failed Devices : 0
Spare Devices : 0
UUID : 51ad8be1:d645a8b9:4e62dc45:468e23f5
Events : 0.8172300
Number Major Minor RaidDevice State
0 8 3 0 active sync /dev/sda3
1 8 19 1 active sync /dev/sdb3
/dev/md3:
Version : 00.90.01
Creation Time : Sat Apr 24 13:28:57 2004
Raid Level : raid5
Array Size : 995708160 (949.58 GiB 1019.61 GB)
Device Size : 199141632 (189.92 GiB 203.92 GB)
Raid Devices : 6
Total Devices : 5
Preferred Minor : 3
Persistence : Superblock is persistent
Update Time : Tue Jan 24 13:35:21 2006
State : clean, degraded
Active Devices : 5
Working Devices : 5
Failed Devices : 0
Spare Devices : 0
Layout : left-symmetric
Chunk Size : 256K
UUID : 090b8145:76a7193c:060bb39c:1e874db7
Events : 0.40600763
Number Major Minor RaidDevice State
0 8 33 0 active sync /dev/sdc1
1 8 81 1 active sync /dev/sdf1
2 0 0 - removed
3 8 65 3 active sync /dev/sde1
4 8 113 4 active sync /dev/sdh1
5 8 49 5 active sync /dev/sdd1
/dev/md4:
Version : 00.90.01
Creation Time : Mon Jul 4 19:43:00 2005
Raid Level : raid5
Array Size : 1465248000 (1397.37 GiB 1500.41 GB)
Device Size : 293049600 (279.47 GiB 300.08 GB)
Raid Devices : 6
Total Devices : 5
Preferred Minor : 4
Persistence : Superblock is persistent
Update Time : Tue Jan 24 13:35:21 2006
State : clean, degraded
Active Devices : 5
Working Devices : 5
Failed Devices : 0
Spare Devices : 0
Layout : left-symmetric
Chunk Size : 256K
UUID : 55c1c152:76de5d6a:5d5399ef:2fa00728
Events : 0.3230512
Number Major Minor RaidDevice State
0 8 193 0 active sync /dev/sdm1
1 8 177 1 active sync /dev/sdl1
2 8 161 2 active sync /dev/sdk1
3 0 0 - removed
4 8 129 4 active sync /dev/sdi1
5 8 209 5 active sync /dev/sdn1
/dev/sda1:
Magic : a92b4efc
Version : 00.90.00
UUID : d7e4fbc0:fb23ed40:4cb15b40:45319463
Creation Time : Sun Jul 3 01:07:32 2005
Raid Level : raid1
Raid Devices : 2
Total Devices : 2
Preferred Minor : 0
Update Time : Tue Jan 24 12:45:44 2006
State : clean
Active Devices : 2
Working Devices : 2
Failed Devices : 0
Spare Devices : 0
Checksum : 95203db1 - correct
Events : 0.1120512
Number Major Minor RaidDevice State
this 0 8 1 0 active sync /dev/sda1
0 0 8 1 0 active sync /dev/sda1
1 1 8 17 1 active sync /dev/sdb1
/dev/sdb1:
Magic : a92b4efc
Version : 00.90.00
UUID : d7e4fbc0:fb23ed40:4cb15b40:45319463
Creation Time : Sun Jul 3 01:07:32 2005
Raid Level : raid1
Raid Devices : 2
Total Devices : 2
Preferred Minor : 0
Update Time : Tue Jan 24 12:45:44 2006
State : clean
Active Devices : 2
Working Devices : 2
Failed Devices : 0
Spare Devices : 0
Checksum : 95203dc3 - correct
Events : 0.1120512
Number Major Minor RaidDevice State
this 1 8 17 1 active sync /dev/sdb1
0 0 8 1 0 active sync /dev/sda1
1 1 8 17 1 active sync /dev/sdb1
/dev/sdc1:
Magic : a92b4efc
Version : 00.90.00
UUID : 090b8145:76a7193c:060bb39c:1e874db7
Creation Time : Sat Apr 24 13:28:57 2004
Raid Level : raid5
Raid Devices : 6
Total Devices : 6
Preferred Minor : 3
Update Time : Tue Jan 24 13:35:21 2006
State : clean
Active Devices : 5
Working Devices : 5
Failed Devices : 2
Spare Devices : 0
Checksum : e28a3bbd - correct
Events : 0.40600763
Layout : left-symmetric
Chunk Size : 256K
Number Major Minor RaidDevice State
this 0 8 33 0 active sync /dev/sdc1
0 0 8 33 0 active sync /dev/sdc1
1 1 8 81 1 active sync /dev/sdf1
2 2 0 0 2 faulty removed
3 3 8 65 3 active sync /dev/sde1
4 4 8 113 4 active sync /dev/sdh1
5 5 8 49 5 active sync /dev/sdd1
/dev/sdd1:
Magic : a92b4efc
Version : 00.90.00
UUID : 090b8145:76a7193c:060bb39c:1e874db7
Creation Time : Sat Apr 24 13:28:57 2004
Raid Level : raid5
Raid Devices : 6
Total Devices : 6
Preferred Minor : 3
Update Time : Tue Jan 24 13:35:21 2006
State : clean
Active Devices : 5
Working Devices : 5
Failed Devices : 2
Spare Devices : 0
Checksum : e28a3bd7 - correct
Events : 0.40600763
Layout : left-symmetric
Chunk Size : 256K
Number Major Minor RaidDevice State
this 5 8 49 5 active sync /dev/sdd1
0 0 8 33 0 active sync /dev/sdc1
1 1 8 81 1 active sync /dev/sdf1
2 2 0 0 2 faulty removed
3 3 8 65 3 active sync /dev/sde1
4 4 8 113 4 active sync /dev/sdh1
5 5 8 49 5 active sync /dev/sdd1
/dev/sde1:
Magic : a92b4efc
Version : 00.90.00
UUID : 090b8145:76a7193c:060bb39c:1e874db7
Creation Time : Sat Apr 24 13:28:57 2004
Raid Level : raid5
Raid Devices : 6
Total Devices : 6
Preferred Minor : 3
Update Time : Tue Jan 24 13:35:21 2006
State : clean
Active Devices : 5
Working Devices : 5
Failed Devices : 2
Spare Devices : 0
Checksum : e28a3be3 - correct
Events : 0.40600763
Layout : left-symmetric
Chunk Size : 256K
Number Major Minor RaidDevice State
this 3 8 65 3 active sync /dev/sde1
0 0 8 33 0 active sync /dev/sdc1
1 1 8 81 1 active sync /dev/sdf1
2 2 0 0 2 faulty removed
3 3 8 65 3 active sync /dev/sde1
4 4 8 113 4 active sync /dev/sdh1
5 5 8 49 5 active sync /dev/sdd1
/dev/sdf1:
Magic : a92b4efc
Version : 00.90.00
UUID : 090b8145:76a7193c:060bb39c:1e874db7
Creation Time : Sat Apr 24 13:28:57 2004
Raid Level : raid5
Raid Devices : 6
Total Devices : 6
Preferred Minor : 3
Update Time : Tue Jan 24 13:35:21 2006
State : clean
Active Devices : 5
Working Devices : 5
Failed Devices : 2
Spare Devices : 0
Checksum : e28a3bef - correct
Events : 0.40600763
Layout : left-symmetric
Chunk Size : 256K
Number Major Minor RaidDevice State
this 1 8 81 1 active sync /dev/sdf1
0 0 8 33 0 active sync /dev/sdc1
1 1 8 81 1 active sync /dev/sdf1
2 2 0 0 2 faulty removed
3 3 8 65 3 active sync /dev/sde1
4 4 8 113 4 active sync /dev/sdh1
5 5 8 49 5 active sync /dev/sdd1
/dev/sdg1:
Magic : a92b4efc
Version : 00.90.00
UUID : 090b8145:76a7193c:060bb39c:1e874db7
Creation Time : Sat Apr 24 13:28:57 2004
Raid Level : raid5
Raid Devices : 6
Total Devices : 6
Preferred Minor : 3
Update Time : Tue Jan 24 12:45:45 2006
State : active
Active Devices : 6
Working Devices : 6
Failed Devices : 0
Spare Devices : 0
Checksum : e01ea8cf - correct
Events : 0.40600053
Layout : left-symmetric
Chunk Size : 256K
Number Major Minor RaidDevice State
this 2 8 97 2 active sync /dev/sdg1
0 0 8 33 0 active sync /dev/sdc1
1 1 8 81 1 active sync /dev/sdf1
2 2 8 97 2 active sync /dev/sdg1
3 3 8 65 3 active sync /dev/sde1
4 4 8 113 4 active sync /dev/sdh1
5 5 8 49 5 active sync /dev/sdd1
/dev/sdh1:
Magic : a92b4efc
Version : 00.90.00
UUID : 090b8145:76a7193c:060bb39c:1e874db7
Creation Time : Sat Apr 24 13:28:57 2004
Raid Level : raid5
Raid Devices : 6
Total Devices : 6
Preferred Minor : 3
Update Time : Tue Jan 24 13:35:21 2006
State : clean
Active Devices : 5
Working Devices : 5
Failed Devices : 2
Spare Devices : 0
Checksum : e28a3c15 - correct
Events : 0.40600763
Layout : left-symmetric
Chunk Size : 256K
Number Major Minor RaidDevice State
this 4 8 113 4 active sync /dev/sdh1
0 0 8 33 0 active sync /dev/sdc1
1 1 8 81 1 active sync /dev/sdf1
2 2 0 0 2 faulty removed
3 3 8 65 3 active sync /dev/sde1
4 4 8 113 4 active sync /dev/sdh1
5 5 8 49 5 active sync /dev/sdd1
/dev/sdi1:
Magic : a92b4efc
Version : 00.90.00
UUID : 55c1c152:76de5d6a:5d5399ef:2fa00728
Creation Time : Mon Jul 4 19:43:00 2005
Raid Level : raid5
Raid Devices : 6
Total Devices : 7
Preferred Minor : 4
Update Time : Tue Jan 24 13:35:21 2006
State : clean
Active Devices : 6
Working Devices : 6
Failed Devices : 1
Spare Devices : 0
Checksum : 9b3c01e3 - correct
Events : 0.3230512
Layout : left-symmetric
Chunk Size : 256K
Number Major Minor RaidDevice State
this 4 8 129 4 active sync /dev/sdi1
0 0 8 193 0 active sync /dev/sdm1
1 1 8 177 1 active sync /dev/sdl1
2 2 8 161 2 active sync /dev/sdk1
3 3 8 225 3 active sync /dev/sdo1
4 4 8 129 4 active sync /dev/sdi1
5 5 8 209 5 active sync /dev/sdn1
/dev/sdj1:
Magic : a92b4efc
Version : 00.90.00
UUID : 55c1c152:76de5d6a:5d5399ef:2fa00728
Creation Time : Mon Jul 4 19:43:00 2005
Raid Level : raid5
Raid Devices : 6
Total Devices : 7
Preferred Minor : 4
Update Time : Tue Jan 24 02:13:42 2006
State : active
Active Devices : 6
Working Devices : 7
Failed Devices : 0
Spare Devices : 1
Checksum : 9b0a08b2 - correct
Events : 0.3226853
Layout : left-symmetric
Chunk Size : 256K
Number Major Minor RaidDevice State
this 3 8 145 3 active sync /dev/sdj1
0 0 8 193 0 active sync /dev/sdm1
1 1 8 177 1 active sync /dev/sdl1
2 2 8 161 2 active sync /dev/sdk1
3 3 8 145 3 active sync /dev/sdj1
4 4 8 129 4 active sync /dev/sdi1
5 5 8 209 5 active sync /dev/sdn1
6 6 8 225 6 spare /dev/sdo1
/dev/sdk1:
Magic : a92b4efc
Version : 00.90.00
UUID : 55c1c152:76de5d6a:5d5399ef:2fa00728
Creation Time : Mon Jul 4 19:43:00 2005
Raid Level : raid5
Raid Devices : 6
Total Devices : 7
Preferred Minor : 4
Update Time : Tue Jan 24 13:35:21 2006
State : clean
Active Devices : 6
Working Devices : 6
Failed Devices : 1
Spare Devices : 0
Checksum : 9b3c01ff - correct
Events : 0.3230512
Layout : left-symmetric
Chunk Size : 256K
Number Major Minor RaidDevice State
this 2 8 161 2 active sync /dev/sdk1
0 0 8 193 0 active sync /dev/sdm1
1 1 8 177 1 active sync /dev/sdl1
2 2 8 161 2 active sync /dev/sdk1
3 3 8 225 3 active sync /dev/sdo1
4 4 8 129 4 active sync /dev/sdi1
5 5 8 209 5 active sync /dev/sdn1
/dev/sdl1:
Magic : a92b4efc
Version : 00.90.00
UUID : 55c1c152:76de5d6a:5d5399ef:2fa00728
Creation Time : Mon Jul 4 19:43:00 2005
Raid Level : raid5
Raid Devices : 6
Total Devices : 7
Preferred Minor : 4
Update Time : Tue Jan 24 13:35:21 2006
State : clean
Active Devices : 6
Working Devices : 6
Failed Devices : 1
Spare Devices : 0
Checksum : 9b3c020d - correct
Events : 0.3230512
Layout : left-symmetric
Chunk Size : 256K
Number Major Minor RaidDevice State
this 1 8 177 1 active sync /dev/sdl1
0 0 8 193 0 active sync /dev/sdm1
1 1 8 177 1 active sync /dev/sdl1
2 2 8 161 2 active sync /dev/sdk1
3 3 8 225 3 active sync /dev/sdo1
4 4 8 129 4 active sync /dev/sdi1
5 5 8 209 5 active sync /dev/sdn1
/dev/sdm1:
Magic : a92b4efc
Version : 00.90.00
UUID : 55c1c152:76de5d6a:5d5399ef:2fa00728
Creation Time : Mon Jul 4 19:43:00 2005
Raid Level : raid5
Raid Devices : 6
Total Devices : 7
Preferred Minor : 4
Update Time : Tue Jan 24 13:35:21 2006
State : clean
Active Devices : 6
Working Devices : 6
Failed Devices : 1
Spare Devices : 0
Checksum : 9b3c021b - correct
Events : 0.3230512
Layout : left-symmetric
Chunk Size : 256K
Number Major Minor RaidDevice State
this 0 8 193 0 active sync /dev/sdm1
0 0 8 193 0 active sync /dev/sdm1
1 1 8 177 1 active sync /dev/sdl1
2 2 8 161 2 active sync /dev/sdk1
3 3 8 225 3 active sync /dev/sdo1
4 4 8 129 4 active sync /dev/sdi1
5 5 8 209 5 active sync /dev/sdn1
/dev/sdn1:
Magic : a92b4efc
Version : 00.90.00
UUID : 55c1c152:76de5d6a:5d5399ef:2fa00728
Creation Time : Mon Jul 4 19:43:00 2005
Raid Level : raid5
Raid Devices : 6
Total Devices : 7
Preferred Minor : 4
Update Time : Tue Jan 24 13:35:21 2006
State : clean
Active Devices : 6
Working Devices : 6
Failed Devices : 1
Spare Devices : 0
Checksum : 9b3c0235 - correct
Events : 0.3230512
Layout : left-symmetric
Chunk Size : 256K
Number Major Minor RaidDevice State
this 5 8 209 5 active sync /dev/sdn1
0 0 8 193 0 active sync /dev/sdm1
1 1 8 177 1 active sync /dev/sdl1
2 2 8 161 2 active sync /dev/sdk1
3 3 8 225 3 active sync /dev/sdo1
4 4 8 129 4 active sync /dev/sdi1
5 5 8 209 5 active sync /dev/sdn1
/dev/sdo1:
Magic : a92b4efc
Version : 00.90.00
UUID : 55c1c152:76de5d6a:5d5399ef:2fa00728
Creation Time : Mon Jul 4 19:43:00 2005
Raid Level : raid5
Raid Devices : 6
Total Devices : 7
Preferred Minor : 4
Update Time : Tue Jan 24 13:35:21 2006
State : clean
Active Devices : 6
Working Devices : 6
Failed Devices : 1
Spare Devices : 0
Checksum : 9b3c0241 - correct
Events : 0.3230512
Layout : left-symmetric
Chunk Size : 256K
Number Major Minor RaidDevice State
this 3 8 225 3 active sync /dev/sdo1
0 0 8 193 0 active sync /dev/sdm1
1 1 8 177 1 active sync /dev/sdl1
2 2 8 161 2 active sync /dev/sdk1
3 3 8 225 3 active sync /dev/sdo1
4 4 8 129 4 active sync /dev/sdi1
5 5 8 209 5 active sync /dev/sdn1
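The Major/Minor columns in the tables above identify members by device number rather than by name. Under block major 8 (the sd driver's first 16 disks, sda to sdp), each disk gets 16 minors: the first is the whole disk and the rest are its partitions, so minor 225 is sdo1 and minor 145 is sdj1. A minimal Python sketch of that mapping, using a hypothetical sd_name helper that only handles major 8:

```python
import string

SCSI_DISK_MAJOR = 8   # block major of the sd driver's first 16 disks (sda..sdp)
MINORS_PER_DISK = 16  # minor 0 of each group is the whole disk, 1..15 are partitions


def sd_name(major: int, minor: int) -> str:
    """Map a (major, minor) pair under major 8 to its sdX / sdXN name."""
    if major != SCSI_DISK_MAJOR:
        raise ValueError(f"major {major} is outside this sketch (0/0 means removed)")
    if not 0 <= minor <= 255:
        raise ValueError(f"minor {minor} out of range for major 8")
    disk, part = divmod(minor, MINORS_PER_DISK)
    letter = string.ascii_lowercase[disk]
    return f"sd{letter}{part}" if part else f"sd{letter}"


# Slot 3 of md4 as recorded by sdj1's stale superblock vs. the other members.
print(sd_name(8, 145), sd_name(8, 225))  # sdj1 sdo1
```

Applied to the slot-3 entries, it shows the disagreement: sdj1's own superblock (last updated at 02:13:42) still records slot 3 as 8/145 (sdj1) with 8/225 (sdo1) as spare 6, while every other md4 member records slot 3 as 8/225 (sdo1).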
Personalities : [raid1] [raid5] [raid6]
md1 : active raid1 sdb2[1] sda2[0]
1951808 blocks [2/2] [UU]
md2 : active raid1 sdb3[1] sda3[0]
73200064 blocks [2/2] [UU]
md3 : active (read-only) raid5 sdh1[4] sdf1[1] sde1[3] sdd1[5] sdc1[0]
995708160 blocks level 5, 256k chunk, algorithm 2 [6/5] [UU_UUU]
md4 : active (read-only) raid5 sdn1[5] sdm1[0] sdl1[1] sdk1[2] sdi1[4]
1465248000 blocks level 5, 256k chunk, algorithm 2 [6/5] [UUU_UU]
md0 : active raid1 sdb1[1] sda1[0]
4883648 blocks [2/2] [UU]
unused devices: <none>
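In the mdstat output, "[6/5]" gives the number of member slots and the number currently working, and the U/_ string marks each slot, in slot order, as up or missing. A small Python sketch (hypothetical missing_slots helper) that extracts the missing slot numbers from a status line:

```python
import re

# Matches the "[total/working] [UU_U...]" tail of an md status line.
STATUS_RE = re.compile(r"\[(\d+)/(\d+)\]\s+\[([U_]+)\]")


def missing_slots(status_line: str) -> list[int]:
    """Return the slot numbers marked '_' (missing) in a /proc/mdstat status line."""
    m = STATUS_RE.search(status_line)
    if m is None:
        raise ValueError("no [n/m] [U_] status found")
    flags = m.group(3)
    return [slot for slot, flag in enumerate(flags) if flag == "_"]


md3 = "995708160 blocks level 5, 256k chunk, algorithm 2 [6/5] [UU_UUU]"
md4 = "1465248000 blocks level 5, 256k chunk, algorithm 2 [6/5] [UUU_UU]"
print(missing_slots(md3), missing_slots(md4))  # [2] [3]
```

This gives slot 2 for md3 and slot 3 for md4, matching the "removed" entries in the --detail output.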