We've been experimenting with using Lustre as the root fs for our x86_64-based cluster. We've run into quite a few stability problems, with arbitrary processes on the nodes disappearing: sshd, gmond, the Myrinet mapper, or whatever.
We're running vanilla 2.6.18 + Lustre 1.6.1, with the filesystem mounted read-only. MGS/MDS/OST are all on one server node.

I have trouble understanding most of what Lustre writes to the logs; any pointers to additional docs would be appreciated. One consistently recurring problem on the client is:

LustreError: 11169:0:(mdc_locks.c:420:mdc_enqueue()) ldlm_cli_enqueue: -2

In addition, last night clients seemed to be evicted regularly (and then reconnecting) even though they were up, which may be when the random processes died. Currently we're running with only one client, which seems to be stable except for the error above repeating itself.

I'll be happy to provide any additional info needed.

--Kai

_______________________________________________
Lustre-discuss mailing list
[email protected]
https://mail.clusterfs.com/mailman/listinfo/lustre-discuss
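[For reference, the -2 in that ldlm_cli_enqueue message is a kernel-style negative errno, i.e. -ENOENT ("No such file or directory"), which would suggest the client tried to enqueue a lock on an entry the MDS no longer knows about. A quick sketch for decoding such codes:

```python
# Decode a negative kernel return code (as printed in LustreError lines)
# into its symbolic errno name and message.
import errno
import os

code = -2                           # value from "ldlm_cli_enqueue: -2"
name = errno.errorcode[-code]       # 'ENOENT'
msg = os.strerror(-code)            # 'No such file or directory'
print(f"{code} = -{name}: {msg}")
```

This doesn't explain *why* the MDS returns ENOENT here, of course; just a hint for reading the logs.]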
