I have a test cluster system, using Lustre as its rootfs, that I've been using
for several months.  It's generally been pretty trouble-free, at least when I
don't do something dumb to it :-}

Yesterday I was running some test stuff on it, which had nothing in particular
to do with Lustre, when for no obvious reason everything on the client
wedged.  I rebooted the client, and it wouldn't come up.  I debugged further,
and discovered that it was no longer able to mount its root at boot time.

I've dug further into it, and it's not at all clear to me what's going on.
I'm not really able to see what debug info might be available to the cluster
client which is trying to use this thing as its rootfs, but when I try to
mount the fs from another random client, the mount just hangs.  I looked in
various logs on both the client and the various servers, and there was nothing
obvious pointing to error conditions.  The client would pop out messages
every 5 seconds of the form:

Jan 10 07:58:23 localhost LustreError: 3674:0:(client.c:950:ptlrpc_expire_one_request()) Skipped 1 previous similar message
Jan 10 07:58:23 localhost Lustre: 3674:0:(peer.c:238:lnet_debug_peer()) [EMAIL PROTECTED]               2    up     8     8     8     8     7 0

This suggests to me that it really is something on server 21 which is hung
up.  That server appears to be idling happily, and responds to other
requests.  That server has an OST and the MDT on it; when I tried to unmount
the OST, that also hung, and in its log I saw a bunch of messages like:

Jan 10 08:03:09 localhost LustreError: 8544:0:(ldlm_lib.c:560:target_handle_connect()) @@@ UUID 'scx1-OST0000_UUID' is not available  for connect (stopping)
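
For anyone combing through similar logs, here is a minimal sketch of a way to
tally Lustre messages by the function that emitted them, which can make a
repeating message stand out from the noise.  It assumes each message occupies
a single line in the log file and follows the syslog layout quoted above; the
function name tally_lustre_messages and the regular expression are made up
for this illustration and are not part of any Lustre tool.

```python
import re
from collections import Counter

# Assumed layout, inferred only from the messages quoted in this post:
#   "<syslog prefix> Lustre[Error]: <pid>:<n>:(<file>:<line>:<func>()) <text>"
LUSTRE_RE = re.compile(
    r"(?P<level>LustreError|Lustre):\s*"
    r"(?P<pid>\d+):\d+:"
    r"\((?P<file>[\w.]+):(?P<line>\d+):(?P<func>\w+)\(\)\)\s*"
    r"(?P<msg>.*)"
)


def tally_lustre_messages(lines):
    """Count Lustre log lines per (level, function) pair.

    Lines that do not match the assumed layout are ignored.
    """
    counts = Counter()
    for line in lines:
        match = LUSTRE_RE.search(line)
        if match:
            counts[(match.group("level"), match.group("func"))] += 1
    return counts
```

Passing an open log file (or any iterable of lines) to tally_lustre_messages
and printing counts.most_common() lists the noisiest call sites first.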


Usually when I've seen other Lustre issues kind of like this, they're
accompanied by lots of commentary in the logs about stuff that it's unhappy
about, but this time the appearance is of an otherwise contented machine on
which a piece of Lustre is just "stuck".

Anybody seen anything like this?  CFS folks, if I can reproduce this, anything
in particular you'd like me to look for?

TIA...
