Michael D. Seymour wrote:
> Hi all,
>
> I hope you could help us with some connection problems we are having with our
> lustre file system. The filesystem roc consists of 6 OSSs with one OST per
> OSS. Each OSS uses the 1.6.7 RHEL 5 kernel on Centos 5.1 (one unit uses
> Centos 5.3). The MDS uses CentOS 5.1 and Lustre 1.6.7. 203 RHEL-based clients
> mount the filesystem and all use Lustre 1.6.7. All are connected via a Gb
> ethernet switch stack.
>
> One client running CentOS 5.2 re-exports the Lustre filesystem via NFS on a
> different network.
>

Also got this earlier today before more verbose debug logging was enabled:

On client trinity:

May 29 10:35:47 trinity kernel: LustreError: 5111:0:(lib-move.c:110:lnet_try_match_md()) Matching packet from 12345-10.5.203....@tcp, match 20177453 length 728 too big: 704 left, 704 allowed
May 29 10:40:47 trinity kernel: LustreError: 11-0: an error occurred while communicating with 10.5.203....@tcp. The mds_close operation failed with -116
May 29 10:40:47 trinity kernel: LustreError: 26783:0:(file.c:113:ll_close_inode_openhandle()) inode 37609433 mdc close failed: rc = -116
May 29 10:40:47 trinity kernel: LustreError: 26783:0:(file.c:113:ll_close_inode_openhandle()) Skipped 1 previous similar message

On MDS rocpile:

May 29 10:35:47 rocpile kernel: LustreError: 10227:0:(mds_open.c:1561:mds_close()) @@@ no handle for file close ino 37609433: cookie 0xa00c7cf9e763396b r...@ffff8101274e3400 x20177453/t0 o35->84adb9a1-8959-fcf5-cc72-81c6a1e17...@net_0x200000a05cc02_uuid:0/0 lens 296/728 e 0 to 0 dl 1243608047 ref 1 fl Interpret:/0/0 rc 0/0
May 29 10:35:47 rocpile kernel: LustreError: 10227:0:(ldlm_lib.c:1619:target_send_reply_msg()) @@@ processing error (-116) r...@ffff8101274e3400 x20177453/t0 o35->84adb9a1-8959-fcf5-cc72-81c6a1e17...@net_0x200000a05cc02_uuid:0/0 lens 296/728 e 0 to 0 dl 1243608047 ref 1 fl Interpret:/0/0 rc -116/0
May 29 10:35:47 rocpile kernel: LustreError: 10227:0:(ldlm_lib.c:1619:target_send_reply_msg()) Skipped 1 previous similar message
May 29 10:40:47 rocpile kernel: LustreError: 3611:0:(mds_open.c:1561:mds_close()) @@@ no handle for file close ino 37609433: cookie 0xa00c7cf9e763396b r...@ffff81011f0cda00 x20177453/t0 o35->84adb9a1-8959-fcf5-cc72-81c6a1e17...@net_0x200000a05cc02_uuid:0/0 lens 296/728 e 0 to 0 dl 1243608347 ref 1 fl Interpret:/2/0 rc 0/0
May 29 10:40:47 rocpile kernel: LustreError: 3611:0:(ldlm_lib.c:1619:target_send_reply_msg()) @@@ processing error (-116) r...@ffff81011f0cda00 x20177453/t0 o35->84adb9a1-8959-fcf5-cc72-81c6a1e17...@net_0x200000a05cc02_uuid:0/0 lens 296/728 e 0 to 0 dl 1243608347 ref 1 fl Interpret:/2/0 rc -116/0

I've already extended /proc/sys/lustre/timeout to 300s.

Thanks again,
Mike

--
Michael D. Seymour                     Phone: 416-978-8497
Scientific Computing Support           Fax: 416-978-3921
Canadian Institute for Theoretical Astrophysics, University of Toronto
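
For anyone triaging similar logs, here is a minimal Python sketch that pulls the negative return codes out of LustreError lines like the ones above and names them. The function name `extract_errors`, the regular expressions, and the one-entry errno table are illustrative assumptions, not part of any Lustre tooling; the table assumes Linux errno numbering, where 116 is ESTALE (stale file handle). It only labels the codes and does not diagnose the cause.

```python
import re

# Assumption: Linux errno numbering, where 116 is ESTALE (stale file handle).
LINUX_ERRNO_NAMES = {116: "ESTALE"}

# Matches the three spellings seen in the logs above:
# "rc = -116", "failed with -116", and "processing error (-116)".
RC_PATTERN = re.compile(r"(?:rc = |failed with |processing error \()-(\d+)")

# Matches the "(file.c:113:function_name())" source location, when present.
LOCATION_PATTERN = re.compile(r"\(([\w.-]+):(\d+):(\w+)\(\)\)")


def extract_errors(lines):
    """Return (function, code, errno_name) for each LustreError line with a negative code."""
    results = []
    for line in lines:
        if "LustreError" not in line:
            continue
        rc = RC_PATTERN.search(line)
        if rc is None:
            continue  # e.g. the lnet "too big" message carries no return code
        code = int(rc.group(1))
        location = LOCATION_PATTERN.search(line)
        function = location.group(3) if location else None
        results.append((function, code, LINUX_ERRNO_NAMES.get(code, "unknown")))
    return results


# Three client-side lines copied from the excerpt above (elisions left as posted).
SAMPLE = [
    "May 29 10:35:47 trinity kernel: LustreError: 5111:0:(lib-move.c:110:lnet_try_match_md()) "
    "Matching packet from 12345-10.5.203....@tcp, match 20177453 length 728 too big: "
    "704 left, 704 allowed",
    "May 29 10:40:47 trinity kernel: LustreError: 11-0: an error occurred while communicating "
    "with 10.5.203....@tcp. The mds_close operation failed with -116",
    "May 29 10:40:47 trinity kernel: LustreError: 26783:0:(file.c:113:ll_close_inode_openhandle()) "
    "inode 37609433 mdc close failed: rc = -116",
]

results = extract_errors(SAMPLE)
for function, code, name in results:
    print(f"{function or '(no location)'}: -{code} ({name})")
```

The first sample line is skipped because it reports a size mismatch rather than a return code; the other two both resolve to ESTALE under the Linux numbering assumed here.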
