Hans, on the surface this sounds a lot like the following bug, which we have
Sun looking into too.  If you have a good 1.6.3 reproducer, could you please
attach it to the bug?  We've been chasing something like this for a while and
it has been tricky to reproduce.  I'll certainly give your test case a spin
and look into this.

https://bugzilla.lustre.org/show_bug.cgi?id=11332

Thanks,
Brian

> Hi,
>
> I have made some tests with Lustre 1.6.3 (Kernel
> 2.6.18-8.1.14.el5_lustre.1.6.3smp) and came across the
> following problem: an unzip of a large zip archive on a
> Lustre filesystem hangs (virtually forever) after about 30000 files
> have been extracted.
> strace shows that the chmod call on the client does not return.
> The problem is reproducible.
>
> The messages file on the client says (several times):
> Nov 14 16:54:19 linuxwcc07 kernel: LustreError:
> 11872:0:(client.c:969:ptlrpc_expire_one_request()) @@@ timeout (sent at
> 1195055558, 100s ago)  [EMAIL PROTECTED] x491921/t0
> o36->[EMAIL PROTECTED]@tcp:12 lens 5864/296 ref 1 fl
> Rpc:/0/0 rc 0/-22
> Nov 14 16:54:19 linuxwcc07 kernel: LustreError:
> 11872:0:(client.c:969:ptlrpc_expire_one_request()) @@@ timeout (sent at
> 1195055558, 100s ago)  [EMAIL PROTECTED] x491921/t0
> o36->[EMAIL PROTECTED]@tcp:12 lens 5864/296 ref 1 fl
> Rpc:/0/0 rc 0/-22
> Nov 14 16:54:19 linuxwcc07 kernel: Lustre:
> lustre-MDT0000-mdc-ffff81021adedc00: Connection to service
> lustre-MDT0000 via nid [EMAIL PROTECTED] was lost; in progress
> operations using this service will wait for recovery to complete.
> Nov 14 16:54:19 linuxwcc07 kernel: Lustre:
> lustre-MDT0000-mdc-ffff81021adedc00: Connection to service
> lustre-MDT0000 via nid [EMAIL PROTECTED] was lost; in progress
> operations using this service will wait for recovery to complete.
> Nov 14 16:54:19 linuxwcc07 kernel: Lustre:
> lustre-MDT0000-mdc-ffff81021adedc00: Connection restored to service
> lustre-MDT0000 using nid [EMAIL PROTECTED]
>
> The corresponding messages on the MDS:
> Nov 14 16:52:38 linuxwcc05 kernel: LustreError:
> 7483:0:(lib-move.c:95:lnet_try_match_md()) Matching packet from
> [EMAIL PROTECTED], match 491921 length 5864 too big: 7416 left,
> 5120 allowed
> Nov 14 16:52:38 linuxwcc05 kernel: LustreError:
> 7483:0:(lib-move.c:95:lnet_try_match_md()) Matching packet from
> [EMAIL PROTECTED], match 491921 length 5864 too big: 7416 left,
> 5120 allowed
> Nov 14 16:54:19 linuxwcc05 kernel: Lustre:
> 7606:0:(ldlm_lib.c:514:target_handle_reconnect()) lustre-MDT0000:
> ec82c01d-f203-81b7-ed36-e0f0cf3b3f32 reconnecting
> Nov 14 16:54:19 linuxwcc05 kernel: Lustre:
> 7606:0:(ldlm_lib.c:514:target_handle_reconnect()) lustre-MDT0000:
> ec82c01d-f203-81b7-ed36-e0f0cf3b3f32 reconnecting
>
> Is this a known issue?
>
> Regards,
> Hans Schnitzer
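
A note on the quoted logs: the client timeout refers to request x491921 with
"lens 5864/296", and the MDS rejects a packet with "match 491921 length 5864"
because only 5120 bytes are allowed. The same two numbers appear on both
sides, which suggests that the timed-out request is the one the MDS dropped
as oversized. The following is only a rough sketch for pulling such pairs
out of larger logs. The regular expressions are modeled solely on the two
message formats quoted above, and the function names are hypothetical
helpers, not part of Lustre.

```python
import re

# Patterns modeled only on the two message formats quoted above; other
# Lustre versions may word these messages differently.
CLIENT_TIMEOUT = re.compile(
    r"ptlrpc_expire_one_request\(\)\) .*? x(?P<xid>\d+)/t\d+ "
    r".*? lens (?P<reqlen>\d+)/(?P<replen>\d+)"
)
SERVER_TOO_BIG = re.compile(
    r"lnet_try_match_md\(\)\) .*? match (?P<match>\d+) "
    r"length (?P<length>\d+) too big: (?P<left>\d+) left, "
    r"(?P<allowed>\d+) allowed"
)


def _flatten(log: str) -> str:
    """Collapse line wraps and repeated spaces so wrapped messages still match."""
    return " ".join(log.split())


def timed_out_requests(client_log: str) -> dict:
    """Map request number (the x... field) to the first 'lens' value."""
    return {int(m["xid"]): int(m["reqlen"])
            for m in CLIENT_TIMEOUT.finditer(_flatten(client_log))}


def oversized_packets(server_log: str) -> dict:
    """Map match number to (packet length, allowed length)."""
    return {int(m["match"]): (int(m["length"]), int(m["allowed"]))
            for m in SERVER_TOO_BIG.finditer(_flatten(server_log))}


def correlate(client_log: str, server_log: str) -> list:
    """List (number, length, allowed) for requests seen in both logs."""
    timeouts = timed_out_requests(client_log)
    rejected = oversized_packets(server_log)
    return sorted((xid, *rejected[xid]) for xid in timeouts if xid in rejected)


# Excerpts copied from the messages quoted above (line wraps preserved).
CLIENT_LOG = """Nov 14 16:54:19 linuxwcc07 kernel: LustreError:
11872:0:(client.c:969:ptlrpc_expire_one_request()) @@@ timeout (sent at
1195055558, 100s ago)  [EMAIL PROTECTED] x491921/t0
o36->[EMAIL PROTECTED]@tcp:12 lens 5864/296 ref 1 fl
Rpc:/0/0 rc 0/-22"""

SERVER_LOG = """Nov 14 16:52:38 linuxwcc05 kernel: LustreError:
7483:0:(lib-move.c:95:lnet_try_match_md()) Matching packet from
[EMAIL PROTECTED], match 491921 length 5864 too big: 7416 left,
5120 allowed"""

if __name__ == "__main__":
    for xid, length, allowed in correlate(CLIENT_LOG, SERVER_LOG):
        print(f"request {xid}: {length} bytes sent, {allowed} allowed")
```

Run against the full client and MDS messages files, this would list every
timed-out request that also shows up as an oversized packet on the server,
which may be useful detail to attach to the bug.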


_______________________________________________
Lustre-discuss mailing list
[email protected]
https://mail.clusterfs.com/mailman/listinfo/lustre-discuss
