Hi all,

our MDT gets stuck and becomes unresponsive, with a very high load (Lustre 1.6.7.1, kernel 2.6.22, 8 cores, 32 GB RAM). The only thing that stands out is a single ll_mt_?? process running at 100% CPU. Nothing unusual was happening on the cluster beforehand. The behavior reappears both after a reboot and after moving the service to another server. The initial stages (mounting the MGS, mounting the MDT, recovery) work fine, but then the load climbs and the system becomes unusable.
At the moment I don't know what to do, other than shutting down all servers and possibly doing a writeconf everywhere. I see that a similar problem was reported by Mag in March this year, but no clues or solutions came up. Any ideas?

Yours,
Thomas

_______________________________________________
Lustre-discuss mailing list
http://lists.lustre.org/mailman/listinfo/lustre-discuss
