Hello all,

We have a serious problem with Lustre. For the past few days we have been seeing lockups on the client side. Not all clients are affected.
We are running kernel 2.6.16-54-0.2.5_lustre.1.6.4.3smp. Statahead has been disabled on the systems.

Some more information about the environment:
- Lustre clients are all VMware virtual systems
- Lustre farm systems are all VMware virtual systems

The errors I see are the following:

LustreError: 3420:0:(events.c:134:client_bulk_callback()) event type 0, status -5, desc ffff8100e5dca000
LustreError: 3420:0:(events.c:134:client_bulk_callback()) event type 0, status -5, desc ffff8100e519e000
LustreError: 3420:0:(events.c:134:client_bulk_callback()) event type 0, status -5, desc ffff8100e4e0a000
LustreError: 3420:0:(events.c:134:client_bulk_callback()) event type 0, status -5, desc ffff8100e86b1bc0
LustreError: 3420:0:(events.c:134:client_bulk_callback()) event type 0, status -5, desc ffff8100e79fe5c0
LustreError: 3420:0:(events.c:134:client_bulk_callback()) event type 0, status -5, desc ffff8100e70a88c0
LustreError: 3420:0:(events.c:134:client_bulk_callback()) event type 0, status -5, desc ffff8100e7081280
LustreError: 3420:0:(events.c:134:client_bulk_callback()) event type 0, status -5, desc ffff8100e6d6d5c0
LustreError: 3428:0:(client.c:975:ptlrpc_expire_one_request()) @@@ timeout (sent at 1225816920, 100s ago) [EMAIL PROTECTED] x17940/t0 o4->[EMAIL PROTECTED]@tcp:28 lens 384/352 ref 2 fl Rpc:/ 0/0 rc 0/-22
Lustre: lustre-OST0005-osc-ffff8100e8551800: Connection to service lustre-OST0005 via nid [EMAIL PROTECTED] was lost; in progress operations using this service will wait for recovery to complete.
Lustre: lustre-OST0005-osc-ffff8100e8551800: Connection restored to service lustre-OST0005 using nid [EMAIL PROTECTED]
LustreError: 3602:0:(client.c:975:ptlrpc_expire_one_request()) @@@ timeout (sent at 1225816924, 100s ago) [EMAIL PROTECTED] x19702/t0 o36->[EMAIL PROTECTED]@tcp:12 lens 1544/296 ref 1 fl Rpc:/0/0 rc 0/-22
LustreError: 3602:0:(client.c:975:ptlrpc_expire_one_request()) Skipped 2 previous similar messages
Lustre: lustre-MDT0000-mdc-ffff8100e8551800: Connection to service lustre-MDT0000 via nid [EMAIL PROTECTED] was lost; in progress operations using this service will wait for recovery to complete.
Lustre: lustre-MDT0000-mdc-ffff8100e8551800: Connection restored to service lustre-MDT0000 using nid [EMAIL PROTECTED]
LustreError: 3428:0:(client.c:975:ptlrpc_expire_one_request()) @@@ timeout (sent at 1225816953, 100s ago) [EMAIL PROTECTED] x20560/t0 o4->[EMAIL PROTECTED]@tcp:28 lens 384/352 ref 2 fl Rpc:/ 0/0 rc 0/-22
Lustre: lustre-OST0006-osc-ffff8100e8551800: Connection to service lustre-OST0006 via nid [EMAIL PROTECTED] was lost; in progress operations using this service will wait for recovery to complete.
Lustre: lustre-OST0006-osc-ffff8100e8551800: Connection restored to service lustre-OST0006 using nid [EMAIL PROTECTED]
LustreError: 3602:0:(client.c:975:ptlrpc_expire_one_request()) @@@ timeout (sent at 1225817024, 100s ago) [EMAIL PROTECTED] x19702/t0 o36->[EMAIL PROTECTED]@tcp:12 lens 1544/296 ref 1 fl Rpc:/2/0 rc -11/-22
Lustre: lustre-MDT0000-mdc-ffff8100e8551800: Connection to service lustre-MDT0000 via nid [EMAIL PROTECTED] was lost; in progress operations using this service will wait for recovery to complete.
Lustre: lustre-MDT0000-mdc-ffff8100e8551800: Connection restored to service lustre-MDT0000 using nid [EMAIL PROTECTED]
LustreError: 3428:0:(client.c:975:ptlrpc_expire_one_request()) @@@ timeout (sent at 1225817053, 100s ago) [EMAIL PROTECTED] x20724/t0 o4->[EMAIL PROTECTED]@tcp:28 lens 384/352 ref 2 fl Rpc:/ 2/0 rc -11/-22
Lustre: lustre-OST0006-osc-ffff8100e8551800: Connection to service lustre-OST0006 via nid [EMAIL PROTECTED] was lost; in progress operations using this service will wait for recovery to complete.
Lustre: lustre-OST0006-osc-ffff8100e8551800: Connection restored to service lustre-OST0006 using nid [EMAIL PROTECTED]
LustreError: 3602:0:(client.c:975:ptlrpc_expire_one_request()) @@@ timeout (sent at 1225817124, 100s ago) [EMAIL PROTECTED] x19702/t0 o36->[EMAIL PROTECTED]@tcp:12 lens 1544/296 ref 1 fl Rpc:/2/0 rc -11/-22
Lustre: lustre-MDT0000-mdc-ffff8100e8551800: Connection to service lustre-MDT0000 via nid [EMAIL PROTECTED] was lost; in progress operations using this service will wait for recovery to complete.
Lustre: lustre-MDT0000-mdc-ffff8100e8551800: Connection restored to service lustre-MDT0000 using nid [EMAIL PROTECTED]
LustreError: 3428:0:(client.c:975:ptlrpc_expire_one_request()) @@@ timeout (sent at 1225817153, 100s ago) [EMAIL PROTECTED] x20767/t0 o4->[EMAIL PROTECTED]@tcp:28 lens 384/352 ref 2 fl Rpc:/ 2/0 rc -11/-22
Lustre: lustre-OST0006-osc-ffff8100e8551800: Connection to service lustre-OST0006 via nid [EMAIL PROTECTED] was lost; in progress operations using this service will wait for recovery to complete.
Lustre: lustre-OST0006-osc-ffff8100e8551800: Connection restored to service lustre-OST0006 using nid [EMAIL PROTECTED]
LustreError: 3602:0:(client.c:975:ptlrpc_expire_one_request()) @@@ timeout (sent at 1225817224, 100s ago) [EMAIL PROTECTED] x19702/t0 o36->[EMAIL PROTECTED]@tcp:12 lens 1544/296 ref 1 fl Rpc:/2/0 rc -11/-22
Lustre: lustre-MDT0000-mdc-ffff8100e8551800: Connection to service lustre-MDT0000 via nid [EMAIL PROTECTED] was lost; in progress operations using this service will wait for recovery to complete.
Lustre: lustre-MDT0000-mdc-ffff8100e8551800: Connection restored to service lustre-MDT0000 using nid [EMAIL PROTECTED]
LustreError: 3602:0:(client.c:975:ptlrpc_expire_one_request()) @@@ timeout (sent at 1225817324, 100s ago) [EMAIL PROTECTED] x19702/t0 o36->[EMAIL PROTECTED]@tcp:12 lens 1544/296 ref 1 fl Rpc:/2/0 rc -11/-22
LustreError: 3602:0:(client.c:975:ptlrpc_expire_one_request()) Skipped 1 previous similar message
Lustre: lustre-MDT0000-mdc-ffff8100e8551800: Connection to service lustre-MDT0000 via nid [EMAIL PROTECTED] was lost; in progress operations using this service will wait for recovery to complete.
Lustre: Skipped 1 previous similar message
Lustre: lustre-MDT0000-mdc-ffff8100e8551800: Connection restored to service lustre-MDT0000 using nid [EMAIL PROTECTED]
Lustre: Skipped 1 previous similar message
LustreError: 3602:0:(client.c:975:ptlrpc_expire_one_request()) @@@ timeout (sent at 1225817424, 100s ago) [EMAIL PROTECTED] x19702/t0 o36->[EMAIL PROTECTED]@tcp:12 lens 1544/296 ref 1 fl Rpc:/2/0 rc -11/-22
LustreError: 3602:0:(client.c:975:ptlrpc_expire_one_request()) Skipped 1 previous similar message
Lustre: lustre-MDT0000-mdc-ffff8100e8551800: Connection to service lustre-MDT0000 via nid [EMAIL PROTECTED] was lost; in progress operations using this service will wait for recovery to complete.
Lustre: Skipped 1 previous similar message
Lustre: lustre-MDT0000-mdc-ffff8100e8551800: Connection restored to service lustre-MDT0000 using nid [EMAIL PROTECTED]
Lustre: Skipped 1 previous similar message
LustreError: 3428:0:(client.c:975:ptlrpc_expire_one_request()) @@@ timeout (sent at 1225817553, 100s ago) [EMAIL PROTECTED] x20952/t0 o4->[EMAIL PROTECTED]@tcp:28 lens 384/352 ref 2 fl Rpc:/ 2/0 rc -11/-22
LustreError: 3428:0:(client.c:975:ptlrpc_expire_one_request()) Skipped 2 previous similar messages
Lustre: lustre-OST0006-osc-ffff8100e8551800: Connection to service lustre-OST0006 via nid [EMAIL PROTECTED] was lost; in progress operations using this service will wait for recovery to complete.
Lustre: Skipped 2 previous similar messages
Lustre: lustre-OST0006-osc-ffff8100e8551800: Connection restored to service lustre-OST0006 using nid [EMAIL PROTECTED]
Lustre: Skipped 2 previous similar messages
LustreError: 3420:0:(events.c:134:client_bulk_callback()) event type 0, status -5, desc ffff8100efba6800
LustreError: 3602:0:(client.c:975:ptlrpc_expire_one_request()) @@@ timeout (sent at 1225817824, 99s ago) [EMAIL PROTECTED] x19702/t0 o36->[EMAIL PROTECTED]@tcp:12 lens 1544/296 ref 1 fl Rpc:/2/0 rc -11/-22
LustreError: 3602:0:(client.c:975:ptlrpc_expire_one_request()) Skipped 4 previous similar messages
Lustre: lustre-MDT0000-mdc-ffff8100e8551800: Connection to service lustre-MDT0000 via nid [EMAIL PROTECTED] was lost; in progress operations using this service will wait for recovery to complete.
Lustre: Skipped 4 previous similar messages
Lustre: lustre-MDT0000-mdc-ffff8100e8551800: Connection restored to service lustre-MDT0000 using nid [EMAIL PROTECTED]
Lustre: Skipped 4 previous similar messages

Could somebody help me out?

Thanks in advance.

Kurt

_______________________________________________
Lustre-discuss mailing list
[email protected]
http://lists.lustre.org/mailman/listinfo/lustre-discuss
