On Tue, 2008-11-04 at 09:06 -0800, Kurt Dillen wrote:
>
> Some more information about the environment:
>
> - Lustre clients are all vmware virtual systems
> - Lustre Farm are all vmware virtual systems
Hrm. That is a bit of a red flag right there.

> the errors I see are the following:
>
> LustreError: 3420:0:(events.c:134:client_bulk_callback()) event type 0, status -5, desc ffff8100e5dca000
> LustreError: 3420:0:(events.c:134:client_bulk_callback()) event type 0, status -5, desc ffff8100e519e000
> LustreError: 3420:0:(events.c:134:client_bulk_callback()) event type 0, status -5, desc ffff8100e4e0a000
> LustreError: 3420:0:(events.c:134:client_bulk_callback()) event type 0, status -5, desc ffff8100e86b1bc0
> LustreError: 3420:0:(events.c:134:client_bulk_callback()) event type 0, status -5, desc ffff8100e79fe5c0
> LustreError: 3420:0:(events.c:134:client_bulk_callback()) event type 0, status -5, desc ffff8100e70a88c0
> LustreError: 3420:0:(events.c:134:client_bulk_callback()) event type 0, status -5, desc ffff8100e7081280
> LustreError: 3420:0:(events.c:134:client_bulk_callback()) event type 0, status -5, desc ffff8100e6d6d5c0
> LustreError: 3428:0:(client.c:975:ptlrpc_expire_one_request()) @@@ timeout (sent at 1225816920, 100s ago) [EMAIL PROTECTED] x17940/t0 o4->[EMAIL PROTECTED]@tcp:28 lens 384/352 ref 2 fl Rpc:/0/0 rc 0/-22
> Lustre: lustre-OST0005-osc-ffff8100e8551800: Connection to service lustre-OST0005 via nid [EMAIL PROTECTED] was lost; in progress operations using this service will wait for recovery to complete.
> Lustre: lustre-OST0005-osc-ffff8100e8551800: Connection restored to service lustre-OST0005 using nid [EMAIL PROTECTED]

These are just regular timeouts, with nothing in the messages themselves to explain them. A detailed analysis of all of your server logs (not something we can do here on lustre-discuss) might yield more, but I have suspicions about your VMware farm setup. Running many VMs that all compete for the same host resources makes the environment unpredictable.
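Before a full server-log analysis, a quick tally of the client-side errors can at least show which call sites are failing and when. The following is a minimal Python sketch, assuming the console messages look exactly like the ones quoted above, one message per line as syslog writes them. The regular expressions and the `summarize`/`format_report` helpers are hypothetical illustrations, not part of Lustre or its tools.

```python
import errno
import re
from collections import Counter
from datetime import datetime, timezone

# Assumed shape of a LustreError console line, e.g.
#   LustreError: 3420:0:(events.c:134:client_bulk_callback()) event type 0, status -5, ...
# i.e. "PID:0:(source file:line:function())" followed by free-form text.
ERR_RE = re.compile(r"LustreError: \d+:\d+:\((\S+?):(\d+):(\w+)\(\)\) (.*)")
STATUS_RE = re.compile(r"status (-?\d+)")
SENT_RE = re.compile(r"sent at (\d+), (\d+)s ago")


def summarize(lines):
    """Count LustreError lines per (function, status) and collect the
    UTC send times of any RPCs reported as timed out."""
    by_site = Counter()
    sent_times = []
    for line in lines:
        m = ERR_RE.search(line)
        if m is None:
            continue  # not a LustreError line (e.g. the "Connection lost" notices)
        _src, _lineno, func, msg = m.groups()
        s = STATUS_RE.search(msg)
        status = int(s.group(1)) if s else None
        by_site[(func, status)] += 1
        t = SENT_RE.search(msg)
        if t:
            # The "sent at" field is a Unix epoch timestamp in seconds.
            sent_times.append(datetime.fromtimestamp(int(t.group(1)), tz=timezone.utc))
    return by_site, sent_times


def format_report(by_site, sent_times):
    """Render the tally as text lines, most frequent call site first.
    Negative status values are mapped to errno names where possible."""
    out = []
    for (func, status), n in by_site.most_common():
        name = errno.errorcode.get(-status, "?") if status is not None else "-"
        out.append(f"{n:6d}  {func}  status={status} ({name})")
    for ts in sent_times:
        out.append(f"timed-out RPC sent at {ts.isoformat()}")
    return out
```

Usage would be along the lines of `print("\n".join(format_report(*summarize(open(path)))))`, where `path` is wherever your syslog puts kernel console messages. On the lines quoted above it reports the `client_bulk_callback()` failures as status -5, which is `-EIO`, and decodes the timed-out RPC's "sent at 1225816920" as 2008-11-04 16:42:00 UTC. That only tells you *when* and *where* things stalled, not *why*; correlating those times with load on the VMware host is the more interesting step.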
I'm not sure whether you are using host-only or bridged networking, but my (now quite historic) experience with running lots of VMware machines on a single piece of hardware is that host-only networking is less than robust, and that running many VMs on a single machine places heavy demands on memory. Additionally, if many of your OSTs share the same physical disk, they will be contending for it as well. Timeouts are not surprising.

I would encourage you to try 1.6.6 now that it is out, and to get some baseline performance metrics for all of this virtual hardware with our iokit.

b.
_______________________________________________ Lustre-discuss mailing list [email protected] http://lists.lustre.org/mailman/listinfo/lustre-discuss
