On Nov 04, 2008 09:06 -0800, Kurt Dillen wrote:
> We have a serious problem with lustre. Since a few days we have
> lockups on the client side. Not all clients are having this
> problem.
>
> We are running this kernel 2.6.16-54-0.2.5_lustre.1.6.4.3smp.
>
> The statahead disable is done on the systems.
>
> Some more information about the environment:
>
> - Lustre clients are all vmware virtual systems
> - Lustre Farm are all vmware virtual systems
>
> the errors I see are the following:
>
> LustreError: 3420:0:(events.c:134:client_bulk_callback()) event type
> 0, status -5, desc ffff8100e5dca000
> LustreError: 3428:0:(client.c:975:ptlrpc_expire_one_request()) @@@
> timeout (sent at 1225816920, 100s ago) [EMAIL PROTECTED] x17940/t0
> o4->[EMAIL PROTECTED]@tcp:28 lens 384/352 ref 2 fl Rpc:/
> 0/0 rc 0/-22
> Lustre: lustre-OST0005-osc-ffff8100e8551800: Connection to service
> lustre-OST0005 via nid [EMAIL PROTECTED] was lost; in progress
> operations using this service will wait for recovery to complete.
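
For anyone reading the log above: the numeric values (status -5, rc 0/-22) look like negated Linux errno codes, which is the usual kernel convention, and the "sent at" value looks like a Unix timestamp. Both readings are assumptions about this log, not something confirmed in the thread. A minimal Python sketch for decoding them follows; the helper names describe_errno and sent_time are made up for illustration and are not part of any Lustre tool.

```python
import errno
import os
from datetime import datetime, timezone


def describe_errno(code):
    """Map a (possibly negated) errno value to its symbolic name and message.

    Assumption: the log prints kernel-style negative errno values.
    """
    n = abs(code)
    name = errno.errorcode.get(n, "UNKNOWN")
    return f"{name}: {os.strerror(n)}"


def sent_time(epoch_seconds):
    """Convert the 'sent at' value to a UTC datetime.

    Assumption: the value is seconds since the Unix epoch.
    """
    return datetime.fromtimestamp(epoch_seconds, tz=timezone.utc)


if __name__ == "__main__":
    # Values copied from the quoted log lines.
    print("status -5 ->", describe_errno(-5))
    print("rc -22    ->", describe_errno(-22))
    print("sent at   ->", sent_time(1225816920).isoformat())
```

Under those assumptions, -5 decodes to EIO (an I/O error on the bulk transfer) and -22 to EINVAL.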

These all look like network problems. Running production Lustre servers
inside VMware doesn't make much sense. We don't test clients inside
VMware, but I don't think that is nearly as bad as running the servers
in a virtual environment.

Cheers, Andreas
--
Andreas Dilger
Sr. Staff Engineer, Lustre Group
Sun Microsystems of Canada, Inc.

_______________________________________________
Lustre-discuss mailing list
[email protected]
http://lists.lustre.org/mailman/listinfo/lustre-discuss
