On Nov 04, 2008  09:06 -0800, Kurt Dillen wrote:
> We have a serious problem with Lustre.  For the past few days we have
> had lockups on the client side.  Not all clients are affected.
> 
> We are running this kernel  2.6.16-54-0.2.5_lustre.1.6.4.3smp.
> 
> Statahead has already been disabled on these systems.
> 
> Some more information about the environment:
> 
> - Lustre clients are all VMware virtual systems
> - Lustre Farm systems are all VMware virtual systems
> 
> The errors I see are the following:
> 
> LustreError: 3420:0:(events.c:134:client_bulk_callback()) event type
> 0, status -5, desc ffff8100e5dca000
> LustreError: 3428:0:(client.c:975:ptlrpc_expire_one_request()) @@@
> timeout (sent at 1225816920, 100s ago)  [EMAIL PROTECTED] x17940/t0
> o4->[EMAIL PROTECTED]@tcp:28 lens 384/352 ref 2 fl Rpc:/
> 0/0 rc 0/-22
> Lustre: lustre-OST0005-osc-ffff8100e8551800: Connection to service
> lustre-OST0005 via nid [EMAIL PROTECTED] was lost; in progress
> operations using this service will wait for recovery to complete.

These all look like network problems.  Running production Lustre servers
inside VMware doesn't make much sense.  We don't test clients inside
VMware either, but I don't think that is nearly as bad as running the
servers in a virtual environment.

Cheers, Andreas
--
Andreas Dilger
Sr. Staff Engineer, Lustre Group
Sun Microsystems of Canada, Inc.

_______________________________________________
Lustre-discuss mailing list
[email protected]
http://lists.lustre.org/mailman/listinfo/lustre-discuss
