Hans, on the surface this sounds a lot like the following bug, which we have Sun looking into. If you have a good 1.6.3 reproducer, could you please attach it to the bug? We've been chasing something like this for a while and it has been tricky to reproduce. I'll certainly give your test case a spin and look into this.
https://bugzilla.lustre.org/show_bug.cgi?id=11332

Thanks,
Brian

> Hi,
>
> I have made some tests with Lustre 1.6.3 (Kernel
> 2.6.18-8.1.14.el5_lustre.1.6.3smp) and came across the
> following problem: an unzip of a large zip archive on a
> Lustre filesystem hangs (virtually forever) after about 30000 files
> have been extracted.
> strace shows that the chmod call on the client does not return.
> The problem is reproducible.
>
> The messages file on the client says (several times):
> Nov 14 16:54:19 linuxwcc07 kernel: LustreError:
> 11872:0:(client.c:969:ptlrpc_expire_one_request()) @@@ timeout (sent at
> 1195055558, 100s ago) [EMAIL PROTECTED] x491921/t0
> o36->[EMAIL PROTECTED]@tcp:12 lens 5864/296 ref 1 fl
> Rpc:/0/0 rc 0/-22
> Nov 14 16:54:19 linuxwcc07 kernel: LustreError:
> 11872:0:(client.c:969:ptlrpc_expire_one_request()) @@@ timeout (sent at
> 1195055558, 100s ago) [EMAIL PROTECTED] x491921/t0
> o36->[EMAIL PROTECTED]@tcp:12 lens 5864/296 ref 1 fl
> Rpc:/0/0 rc 0/-22
> Nov 14 16:54:19 linuxwcc07 kernel: Lustre:
> lustre-MDT0000-mdc-ffff81021adedc00: Connection to service
> lustre-MDT0000 via nid [EMAIL PROTECTED] was lost; in progress
> operations using this service will wait for recovery to complete.
> Nov 14 16:54:19 linuxwcc07 kernel: Lustre:
> lustre-MDT0000-mdc-ffff81021adedc00: Connection to service
> lustre-MDT0000 via nid [EMAIL PROTECTED] was lost; in progress
> operations using this service will wait for recovery to complete.
> Nov 14 16:54:19 linuxwcc07 kernel: Lustre:
> lustre-MDT0000-mdc-ffff81021adedc00: Connection restored to service
> lustre-MDT0000 using nid [EMAIL PROTECTED]
>
> The corresponding messages on the MDS:
> Nov 14 16:52:38 linuxwcc05 kernel: LustreError:
> 7483:0:(lib-move.c:95:lnet_try_match_md()) Matching packet from
> [EMAIL PROTECTED], match 491921 length 5864 too big: 7416 left,
> 5120 allowed
> Nov 14 16:52:38 linuxwcc05 kernel: LustreError:
> 7483:0:(lib-move.c:95:lnet_try_match_md()) Matching packet from
> [EMAIL PROTECTED], match 491921 length 5864 too big: 7416 left,
> 5120 allowed
> Nov 14 16:54:19 linuxwcc05 kernel: Lustre:
> 7606:0:(ldlm_lib.c:514:target_handle_reconnect()) lustre-MDT0000:
> ec82c01d-f203-81b7-ed36-e0f0cf3b3f32 reconnecting
> Nov 14 16:54:19 linuxwcc05 kernel: Lustre:
> 7606:0:(ldlm_lib.c:514:target_handle_reconnect()) lustre-MDT0000:
> ec82c01d-f203-81b7-ed36-e0f0cf3b3f32 reconnecting
>
> Is this a known issue?
>
> Regards,
> Hans Schnitzer
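
For anyone comparing their own logs against this report, here is a rough Python sketch that pulls the numbers out of the MDS "too big" message quoted above. The regular expression, the function name parse_too_big and the field names are my own illustrative assumptions, not anything defined by Lustre; the sample line is copied from the report with the redacted address left as-is. In these logs the match number (491921) and length (5864) line up with the client's timed-out request (x491921, lens 5864/296), which suggests, though does not prove, that the oversized request is the one that hangs.

```python
import re

# Hypothetical pattern for the MDS message quoted in this thread.
# The group names are illustrative, not Lustre terminology.
PATTERN = re.compile(
    r"match (?P<match>\d+) length (?P<length>\d+) too big: "
    r"(?P<left>\d+) left, (?P<allowed>\d+) allowed"
)


def parse_too_big(line):
    """Return the numeric fields of a 'too big' message, or None if absent."""
    m = PATTERN.search(line)
    if m is None:
        return None
    return {name: int(value) for name, value in m.groupdict().items()}


# Sample line taken from the report (joined onto one logical line).
sample = (
    "LustreError: 7483:0:(lib-move.c:95:lnet_try_match_md()) Matching packet "
    "from [EMAIL PROTECTED], match 491921 length 5864 too big: 7416 left, "
    "5120 allowed"
)

fields = parse_too_big(sample)
# How far the request length exceeds the allowed size, in the log's own units.
excess = fields["length"] - fields["allowed"]
print(fields, excess)
```

Syslog may wrap these messages across lines as in the quoted text, so the wrapped pieces would need to be joined before matching.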
_______________________________________________ Lustre-discuss mailing list [email protected] https://mail.clusterfs.com/mailman/listinfo/lustre-discuss
