We've been experimenting with Lustre as the root fs for our x86_64-based 
cluster. We've run into quite a few stability problems, with arbitrary 
processes on the nodes disappearing: sshd, gmond, the Myrinet mapper, 
or whatever.

We're running vanilla 2.6.18 + Lustre 1.6.1, with the filesystem mounted 
read-only. MGS, MDS and OST are all on one server node. I have trouble 
understanding most of what Lustre writes to the logs, so any pointers to 
additional docs would be appreciated.
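
For reference, the targets and the client mount are set up roughly like 
this (the fsname, device names and the nid below are placeholders, not 
our actual values):

  # on the server node (combined MGS/MDS, plus the OST)
  mkfs.lustre --fsname=rootfs --mgs --mdt /dev/sda1
  mkfs.lustre --fsname=rootfs --ost --mgsnode=server@tcp0 /dev/sdb1
  mount -t lustre /dev/sda1 /mnt/mdt
  mount -t lustre /dev/sdb1 /mnt/ost

  # on each client, mounted read-only
  mount -t lustre server@tcp0:/rootfs /mnt/root -o ro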

One consistently recurring problem is

LustreError: 11169:0:(mdc_locks.c:420:mdc_enqueue()) ldlm_cli_enqueue: -2

on the client. If I'm reading it right, -2 is -ENOENT, so some lookup on 
the MDS is presumably failing, but I can't tell what triggers it.

Last night, in addition, clients seemed to be getting evicted regularly 
(and reconnecting) even though they were up, which may be how the random 
processes died. Currently we're running with only one client, which seems 
to be stable except for the error above repeating itself.
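
I can crank up the debug level and capture more if that would help; I'd 
be doing something like the following (the settings are just my guess 
from what I've seen referenced, so corrections welcome):

  # enable fuller Lustre debugging on the client
  sysctl -w lnet.debug=-1

  # ... reproduce the mdc_enqueue error ...

  # dump the kernel debug buffer to a file
  lctl dk /tmp/lustre-debug.log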

I'll be happy to provide any additional info needed.

--Kai
