Greetings,

I have a new, still very small Lustre 1.8 cluster... currently only 3
clients and 1 OST. All servers are CentOS 5.3. The MDS and OST are
both Lustre 1.8.0. One of the clients is 1.8.0 running the Lustre
kernel from Sun; the other two are Lustre 1.8.0.1 running the stock
RHEL/CentOS kernel, 2.6.18-128.1.6.el5, which is newly supported by
the 1.8.0.1 precompiled patchless kernel module RPMs.

Under a fairly light load, we're starting to see a large number of
the following errors, along with extremely poor performance:

on the MDS:
kernel: Lustre: 16140:0:(ldlm_lib.c:815:target_handle_connect())
storage-MDT0000: refuse reconnection from
[email protected]@tcp to
0xffff8100bd4aa000; still busy with 2 active RPCs

LustreError: 138-a: storage-MDT0000: A client on nid x.x....@tcp was
evicted due to a lock blocking callback to x.x....@tcp timed out: rc
-107


on the OST:

kernel: Lustre: 25764:0:(ldlm_lib.c:815:target_handle_connect())
storage-OST0000: refuse reconnection from
[email protected]@tcp to
0xffff81002139e000; still busy with 10 active RPCs

kernel: Lustre: 25804:0:(ldlm_lib.c:815:target_handle_connect())
storage-OST0000: refuse reconnection from
[email protected]@tcp to
0xffff810105a72000; still busy with 12 active RPCs

LustreError: 138-a: storage-OST0000: A client on nid x.x....@tcp was
evicted due to a lock glimpse callback to x.x....@tcp timed out: rc
-107



Another common problem is clients not reconnecting during recovery.
Instead, recovery often seems to just sit and wait out the full
5-minute window, even with this tiny number of clients (4 total...
the MDS sometimes mounts the filesystem itself as a client).

Any ideas on what we can tune to get this back on track?
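For context, here is roughly what we've been poking at so far. This is
a sketch from memory based on the 1.8 manual; the exact /proc paths and
the timeout value of 300 are assumptions, not a known-good fix:

```shell
# Check the current global obd timeout (default is 100s, if I recall
# correctly; adaptive timeouts are also on by default in 1.8)
lctl get_param timeout

# As an experiment, raise it -- assumption: a larger timeout might
# avoid the lock-callback evictions if the network or disks are slow
lctl set_param timeout=300

# Watch recovery progress on the MDS (path from memory, may differ)
cat /proc/fs/lustre/mds/storage-MDT0000/recovery_status

# ...and on the OSS
cat /proc/fs/lustre/obdfilter/storage-OST0000/recovery_status
```

Raising the timeout hasn't obviously helped so far, so I'd welcome
pointers to the right knobs.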

Many thanks,
Jake
_______________________________________________
Lustre-discuss mailing list
[email protected]
http://lists.lustre.org/mailman/listinfo/lustre-discuss