In case some of the folks on this list haven't seen this particular
horror story yet :)

https://csc.fi/web/blog/post/-/blogs/the-largest-unplanned-outage-in-years-and-how-we-survived-it

"The DDN controller replacement went quite smoothly and around 10 a.m.
we were ready to bring the system back online. However, when
restarting the Lustre filesystem, the metadata server reported
anomalies in its filesystem and requested to do a filesystem check
(fsck). Typically these are fairly routine operations, especially when
the filesystem has been up for a long time. Any problems that the
check finds are typically fixed automatically with no impact.

In this case, however, the tool could not fix all the problems it
identified. A faulty inode persisted. Trying to bring the Lustre up
resulted in a system crash (kernel panic) with this inode a very
likely cause."

-Adam
_______________________________________________
Beowulf mailing list, [email protected] sponsored by Penguin Computing
To change your subscription (digest mode or unsubscribe) visit 
http://www.beowulf.org/mailman/listinfo/beowulf
