I have set up 20 compute nodes as OSTs, each set to fail over to the next one, like compute-0-0 -> 0-1, 0-2 -> 0-3, and so on. However, this morning one of the drives in an OST failed. The node didn't reboot; it just remounted its Lustre OST device read-only. This caused our normal storage scripts to fail.
I had to reboot the node anyway to replace the drive, and that's when the failover to the next node happened. I can see on the metadata server that Lustre did indeed switch to the failover node; however, the files that were associated with that node are visible but not readable. Shouldn't the failover node have prevented this? The drive that failed is completely dead. I can't even mount it to try a dd to restore the filesystem, so it looks like I'm going to have to rebuild the filesystem.

--
Jeremy Mann
[EMAIL PROTECTED]
University of Texas Health Science Center
Bioinformatics Core Facility
(210) 567-2672

_______________________________________________
Lustre-discuss mailing list
[email protected]
https://mail.clusterfs.com/mailman/listinfo/lustre-discuss
