Hi,

I ran some tests with Lustre 1.6.3 (kernel 2.6.18-8.1.14.el5_lustre.1.6.3smp) and came across the
following problem: unzipping a large zip archive on a
Lustre filesystem hangs (virtually forever) after about 30000 files
have been extracted.
strace shows that the chmod call on the client does not return.
The problem is reproducible.

The messages file on the client says (several times):
Nov 14 16:54:19 linuxwcc07 kernel: LustreError: 11872:0:(client.c:969:ptlrpc_expire_one_request()) @@@ timeout (sent at 1195055558, 100s ago) [EMAIL PROTECTED] x491921/t0 o36->[EMAIL PROTECTED]@tcp:12 lens 5864/296 ref 1 fl Rpc:/0/0 rc 0/-22
Nov 14 16:54:19 linuxwcc07 kernel: LustreError: 11872:0:(client.c:969:ptlrpc_expire_one_request()) @@@ timeout (sent at 1195055558, 100s ago) [EMAIL PROTECTED] x491921/t0 o36->[EMAIL PROTECTED]@tcp:12 lens 5864/296 ref 1 fl Rpc:/0/0 rc 0/-22
Nov 14 16:54:19 linuxwcc07 kernel: Lustre: lustre-MDT0000-mdc-ffff81021adedc00: Connection to service lustre-MDT0000 via nid [EMAIL PROTECTED] was lost; in progress operations using this service will wait for recovery to complete.
Nov 14 16:54:19 linuxwcc07 kernel: Lustre: lustre-MDT0000-mdc-ffff81021adedc00: Connection to service lustre-MDT0000 via nid [EMAIL PROTECTED] was lost; in progress operations using this service will wait for recovery to complete.
Nov 14 16:54:19 linuxwcc07 kernel: Lustre: lustre-MDT0000-mdc-ffff81021adedc00: Connection restored to service lustre-MDT0000 using nid [EMAIL PROTECTED]

The corresponding messages on the MDS:
Nov 14 16:52:38 linuxwcc05 kernel: LustreError: 7483:0:(lib-move.c:95:lnet_try_match_md()) Matching packet from [EMAIL PROTECTED], match 491921 length 5864 too big: 7416 left, 5120 allowed
Nov 14 16:52:38 linuxwcc05 kernel: LustreError: 7483:0:(lib-move.c:95:lnet_try_match_md()) Matching packet from [EMAIL PROTECTED], match 491921 length 5864 too big: 7416 left, 5120 allowed
Nov 14 16:54:19 linuxwcc05 kernel: Lustre: 7606:0:(ldlm_lib.c:514:target_handle_reconnect()) lustre-MDT0000: ec82c01d-f203-81b7-ed36-e0f0cf3b3f32 reconnecting
Nov 14 16:54:19 linuxwcc05 kernel: Lustre: 7606:0:(ldlm_lib.c:514:target_handle_reconnect()) lustre-MDT0000: ec82c01d-f203-81b7-ed36-e0f0cf3b3f32 reconnecting
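
The client and MDS messages appear to refer to the same request: the client's x491921 matches the MDS's "match 491921", and both report a length of 5864. A minimal Python sketch for pulling those fields out of the two log lines and comparing them follows. The function names are hypothetical, and reading "lens 5864/296" as request/reply buffer lengths is an assumption, not something the logs state.

```python
import re

# Log fragments copied verbatim from the messages above.
CLIENT_LINE = (
    "LustreError: 11872:0:(client.c:969:ptlrpc_expire_one_request()) "
    "@@@ timeout (sent at 1195055558, 100s ago) [EMAIL PROTECTED] x491921/t0 "
    "o36->[EMAIL PROTECTED]@tcp:12 lens 5864/296 ref 1 fl Rpc:/0/0 rc 0/-22"
)
MDS_LINE = (
    "LustreError: 7483:0:(lib-move.c:95:lnet_try_match_md()) "
    "Matching packet from [EMAIL PROTECTED], match 491921 length 5864 "
    "too big: 7416 left, 5120 allowed"
)


def parse_client_timeout(line):
    """Extract the xid and the two 'lens' values from a client timeout line."""
    xid = re.search(r"\bx(\d+)/t\d+", line)
    lens = re.search(r"\blens (\d+)/(\d+)", line)
    if not (xid and lens):
        return None
    return {
        "xid": int(xid.group(1)),
        # Assumption: the first value is the request length, the second the reply length.
        "req_len": int(lens.group(1)),
        "reply_len": int(lens.group(2)),
    }


def parse_mds_too_big(line):
    """Extract match bits and sizes from an lnet_try_match_md 'too big' line."""
    m = re.search(
        r"match (\d+) length (\d+) too big: (\d+) left, (\d+) allowed", line
    )
    if not m:
        return None
    match, length, left, allowed = (int(g) for g in m.groups())
    return {"match": match, "length": length, "left": left, "allowed": allowed}


def same_request(client, mds):
    """True if both lines carry the same request id and the same length."""
    return client["xid"] == mds["match"] and client["req_len"] == mds["length"]


if __name__ == "__main__":
    client = parse_client_timeout(CLIENT_LINE)
    mds = parse_mds_too_big(MDS_LINE)
    print("same request:", same_request(client, mds))
    print("bytes over the allowed size:", mds["length"] - mds["allowed"])
```

For the lines above, the packet is 5864 - 5120 = 744 bytes larger than what the MDS reports as allowed.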

Is this a known issue?

Regards,
Hans Schnitzer

--
Hans-Juergen Schnitzer
RWTH Aachen University, Center for Computing and Communication
Rechen- und Kommunikationszentrum
Seffenter Weg 23, 52074 Aachen (Germany)
Tel.: + 49(0)241/80-28719 - Fax: + 49(0)241/80-628719
[EMAIL PROTECTED]
http://www.rz.rwth-aachen.de

_______________________________________________
Lustre-discuss mailing list
[email protected]
https://mail.clusterfs.com/mailman/listinfo/lustre-discuss