I have a test cluster system, using Lustre as its rootfs, that I've been using for several months. It's generally been pretty trouble-free, at least when I don't do something dumb to it :-}
Yesterday I was running some test stuff on it, which had nothing in particular to do with Lustre, when for no obvious reason everything on the client wedged. I rebooted the client, and it wouldn't come back up: it could no longer mount its root at boot time. I've dug further into it, and it's not at all clear to me what's going on. I can't see what debug info might be available to the cluster client that's trying to use this thing as its rootfs, but when I try to mount the fs from another, unrelated client, the mount just hangs. I looked in various logs on both the client and the servers, and nothing obvious pointed to an error condition. On the client, a message pops out every 5 seconds of the form:

    Jan 10 07:58:23 localhost LustreError: 3674:0:(client.c:950:ptlrpc_expire_one_request()) Skipped 1 previous similar message
    Jan 10 07:58:23 localhost Lustre: 3674:0:(peer.c:238:lnet_debug_peer()) [EMAIL PROTECTED] 2 up 8 8 8 8 7 0

which suggests to me that it really is something on server 21 that's hung up. That server appears to be idling happily and responds to other requests. It hosts an OST and the MDT; when I tried to unmount the OST, that also hung, and in its log I saw a bunch of messages like:

    Jan 10 08:03:09 localhost LustreError: 8544:0:(ldlm_lib.c:560:target_handle_connect()) @@@ UUID 'scx1-OST0000_UUID' is not available for connect (stopping)

Usually when I've seen other Lustre issues kind of like this, they're accompanied by lots of commentary in the logs about stuff it's unhappy about, but this time the appearance is of an otherwise contented machine on which a piece of Lustre is just "stuck".

Has anybody seen anything like this? CFS folks, if I can reproduce this, is there anything in particular you'd like me to look for?

TIA...
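
P.S. In case it helps anyone point me at the right knobs: here's a rough sketch of the debug collection I figure I can do on the hung client the next time this happens. This assumes a 1.x-era setup with the /proc/sys/lnet interface and lctl available; the NID and file paths below are placeholders, not from my actual config.

    # crank the Lustre/LNET debug mask all the way up (assuming the
    # usual /proc/sys/lnet interface; older releases used
    # /proc/sys/portals/debug instead)
    echo -1 > /proc/sys/lnet/debug

    # retry the hanging mount in another shell, then dump the kernel
    # debug buffer to a file for inspection
    lctl dk /tmp/lustre-debug.log

    # ask LNET about the suspect server; the NID here is a
    # placeholder for server 21's actual address
    lctl ping 192.168.0.21@tcp

    # list Lustre device state as the client sees it
    cat /proc/fs/lustre/devices

    # dump stack traces of all tasks to the console/syslog, to see
    # where the mount thread is stuck (needs sysrq enabled:
    # echo 1 > /proc/sys/kernel/sysrq)
    echo t > /proc/sysrq-trigger

If there's a better or preferred set of things to turn on before reproducing this, suggestions welcome.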
