I rebooted to upgrade to kernel 3.4.1. I accidentally had uvesafb, nouveau KMS, and nvidia-drivers enabled at the same time, which caused the display to go blank after rebooting. I could not SSH into the machine, so I used the magic SysRq REISUB sequence (sketched below) to reboot into my previous kernel (3.3.5). During that boot I saw a whole bunch of "I/O error" messages scrolling by, for every disk in my RAID array. I had never seen these errors before and hoped they were just module confusion from booting a different kernel. I was able to boot into my root filesystem, but the RAID did not assemble.
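For reference, the same REISUB sequence can also be injected through /proc/sysrq-trigger when a local keyboard combo is unavailable. This is a minimal sketch, assuming root and that magic SysRq is compiled in; note that the "e" and "i" steps kill userspace (including the shell issuing them), so this illustrates the key order rather than a script to run verbatim:

    # Enable all magic SysRq functions (1 = everything allowed)
    echo 1 > /proc/sys/kernel/sysrq

    # REISUB: unRaw keyboard, tErminate (SIGTERM) all tasks,
    # kIll (SIGKILL) all tasks, Sync disks, Unmount (remount
    # read-only), reBoot
    echo r > /proc/sysrq-trigger
    echo e > /proc/sysrq-trigger
    echo i > /proc/sysrq-trigger
    echo s > /proc/sysrq-trigger
    echo u > /proc/sysrq-trigger
    echo b > /proc/sysrq-trigger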
After blacklisting nouveau and rebooting into 3.4.1, the I/O errors were gone, but mdraid failed with this message:

     * Starting up RAID devices ...
     * mdadm main: failed to get exclusive lock on mapfile
    mdadm: /dev/md2 is already in use.
    mdadm: /dev/md1 is already in use.
     [ !! ]

Oh no! Heart beating quickly... terabytes of data... and Google finds nothing useful for these messages. My mdadm.conf has not changed, and no physical disks have been added or removed in over a year. I have of course updated hundreds of packages since my last reboot, including mdadm itself.

/proc/mdstat showed that not all of the member disks/partitions were being detected:

    Personalities : [linear] [raid0] [raid1] [raid10] [raid6] [raid5] [raid4] [multipath] [faulty]
    md1 : inactive sdb1[0](S)
          1048575868 blocks super 1.1

    md2 : inactive sdf2[5](S)
          904938415 blocks super 1.1

    unused devices: <none>

Normally those arrays include all disks sdb through sdf, partitions 1 and 2 from each disk. My mdadm.conf has always had only two ARRAY lines (for /dev/md1 and /dev/md2), identifying the arrays by UUID, and previously the member disks were always detected and assembled automatically when I booted and started mdadm.

Running mdadm --query --examine on the partitions (sketched below) showed that they still contained valid RAID metadata, so I felt confident trying to reassemble the arrays.

To fix it, I first stopped the arrays:

    /etc/init.d/mdraid stop

(I could also have run "mdadm -Ss", which is what the stop script does.) Then I edited mdadm.conf and added a DEVICE line, telling mdadm explicitly where to look:

    DEVICE /dev/sd[bcdef][12]

I then restarted mdraid:

    /etc/init.d/mdraid start

Et voilà! My RAID was back and functioning. I don't know whether this was the result of a change in kernel or mdadm behavior, or simply my REISUB leaving the RAID in a strange state.
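For completeness, the superblock check mentioned above looked roughly like this (a sketch, not verbatim from my shell history; the glob matches my member partitions):

    # Confirm each member partition still carries a valid md superblock
    for part in /dev/sd[bcdef][12]; do
        echo "=== $part ==="
        mdadm --query --examine "$part"
    done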
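And for anyone hitting the same symptom, the relevant part of my mdadm.conf now looks roughly like this (UUIDs replaced with placeholders, not my real values):

    # Explicitly list the partitions mdadm should scan for superblocks
    DEVICE /dev/sd[bcdef][12]

    # The two arrays, identified by UUID (placeholders here)
    ARRAY /dev/md1 UUID=<uuid-of-md1>
    ARRAY /dev/md2 UUID=<uuid-of-md2>

Without a DEVICE line, mdadm normally behaves as if "DEVICE partitions" were given and scans everything in /proc/partitions, so pinning the list is a workaround for whatever stopped the automatic scan from seeing all the members, not a root-cause fix.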