Jeremy Mann wrote:

> I have set up 20 compute nodes as OSTs, each failing over to the next,
> e.g. compute-0-0 -> 0-1, 0-2 -> 0-3, and so on. This morning, however,
> one of the drives in an OST failed. The node didn't reboot; it just
> remounted its Lustre OST device read-only. This caused our normal
> storage scripts to fail.

You could mount your devices with errors=panic to panic the node instead of remounting read-only, giving your HA scripts something more useful to work with (example below).

> I had to reboot the node anyway to replace the drive, so that's when
> the failover to the next node happened. I can see on the metadata
> server that Lustre did indeed switch to the failover node; however,
> the files that were associated with that node are visible but not
> readable. Shouldn't the failover node have prevented this?

The files are visible because the namespace is held on the MDT, not on the individual OSTs. All files will remain visible; files with objects on the affected OST will be inaccessible.

> The drive that failed is completely dead; I can't even mount it to
> try a dd to restore the filesystem, so it looks like I'm going to
> have to rebuild the filesystem.

A disk failure is an unrecoverable error as far as Lustre is concerned. Your back-end storage must be reliable for Lustre to function -- that's what RAID is for. Dual-ported standalone RAID boxes allow failover Lustre servers to take over from each other in case of _node_ failure, not _disk_ failure.
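
For the errors=panic suggestion above, a minimal sketch of what that might look like, assuming a mount-based setup with an ldiskfs-backed OST; the device path /dev/sdb1 and mount point /mnt/ost0 are hypothetical:

  # Pass errors=panic at mount time so a back-end I/O error panics the
  # node instead of remounting the OST read-only (hypothetical paths):
  mount -t lustre -o errors=panic /dev/sdb1 /mnt/ost0

  # Or set the error behavior persistently in the ldiskfs superblock:
  tune2fs -e panic /dev/sdb1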
In the meantime, you can deactivate the affected OST using lctl on the clients and the MDT; this will allow access functions to complete without errors (files on the affected OST will appear 0-length, but the rest of your files will be fine).
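
A sketch of the deactivation, assuming the dead OST is lustre-OST0007 (a hypothetical name; substitute your own):

  # On the MDT and on each client, find the OSC device that points at
  # the failed OST:
  lctl dl | grep OST0007

  # Deactivate it by the device number printed in the first column of
  # that output (11 here is just an example):
  lctl --device 11 deactivate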
