In case some of the folks on this list haven't seen this particular horror story yet :)
https://csc.fi/web/blog/post/-/blogs/the-largest-unplanned-outage-in-years-and-how-we-survived-it

"The DDN controller replacement went quite smoothly and around 10 a.m. we were ready to bring the system back online. However, when restarting the Lustre filesystem, the metadata server reported anomalies in its filesystem and requested to do a filesystem check (fsck). Typically these are fairly routine operations, especially when the filesystem has been up for a long time. Any problems that the check finds are typically fixed automatically with no impact. In this case, however, the tool could not fix all the problems it identified. A faulty inode persisted. Trying to bring the Lustre up resulted in a system crash (kernel panic) with this inode a very likely cause."

-Adam
_______________________________________________
Beowulf mailing list, [email protected] sponsored by Penguin Computing
To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf
