I'm attempting my first-ever Lustre install on a small test cluster. I have one MDS and five OSSes, all with identical hardware. They are all on the same network segment, and each has a single Ethernet interface. I'm running SLES9 SP3 with the Lustre RPMs for the kernel, modules, etc.

I've configured the systems and mounted everything, and everything seems fine.

As a first test, I've tried to mount the filesystem on the MDS (and on more than one OSS) as a client. The filesystem seems to mount fine, but once it is mounted, whichever system has it mounted will hang for long periods of time (often permanently). I can still log into the system from another shell, and things there will act OK. Normally the hang seems to be triggered by doing anything that touches the client-mounted filesystem, but not always.

I had an installation of 1.6 beta7 working that didn't seem to have the problem, but both 1.6.0 and 1.6.0.1 have shown it.

I currently have the filesystem mounted using the MDS as a client. It has created a nearly 1MB lustre-log in /tmp (available upon request), and I've included a snippet from /var/log/messages below.

Any help would be appreciated!

May 9 15:13:46 Lustre-01-01 kernel: Lustre: 7256:0:(ldlm_lib.c:497:target_handle_reconnect()) lustre1-MDT0000: d09242ed-b4f7-806f-bc12-912f7cfac1a9 reconnecting
May 9 15:13:46 Lustre-01-01 kernel: Lustre: 7256:0:(ldlm_lib.c:497:target_handle_reconnect()) Skipped 1 previous similar message
May 9 15:13:46 Lustre-01-01 kernel: Lustre: 7256:0:(ldlm_lib.c:709:target_handle_connect()) lustre1-MDT0000: refuse reconnection from [EMAIL PROTECTED]@lo to 0x000001011f228000/2
May 9 15:13:46 Lustre-01-01 kernel: Lustre: 7256:0:(ldlm_lib.c:709:target_handle_connect()) Skipped 1 previous similar message
May 9 15:13:46 Lustre-01-01 kernel: LustreError: 7256:0:(ldlm_lib.c:1363:target_send_reply_msg()) @@@ processing error (-16) [EMAIL PROTECTED] x670/t0 o38->[EMAIL PROTECTED]@tcp:-1 lens 304/200 ref 0 fl Interpret:/0/0 rc -16/0
May 9 15:13:46 Lustre-01-01 kernel: LustreError: 7256:0:(ldlm_lib.c:1363:target_send_reply_msg()) Skipped 1 previous similar message
May 9 15:14:54 Lustre-01-01 automount[6101]: attempting to mount entry /.autofs/var.mail
May 9 15:15:01 Lustre-01-01 kernel: Lustre: 7262:0:(ldlm_lib.c:497:target_handle_reconnect()) lustre1-MDT0000: d09242ed-b4f7-806f-bc12-912f7cfac1a9 reconnecting
May 9 15:15:01 Lustre-01-01 kernel: Lustre: 7262:0:(ldlm_lib.c:497:target_handle_reconnect()) Skipped 2 previous similar messages
May 9 15:15:01 Lustre-01-01 kernel: Lustre: 7262:0:(ldlm_lib.c:709:target_handle_connect()) lustre1-MDT0000: refuse reconnection from [EMAIL PROTECTED]@lo to 0x000001011f228000/2
May 9 15:15:01 Lustre-01-01 kernel: Lustre: 7262:0:(ldlm_lib.c:709:target_handle_connect()) Skipped 2 previous similar messages
May 9 15:15:01 Lustre-01-01 kernel: LustreError: 7262:0:(ldlm_lib.c:1363:target_send_reply_msg()) @@@ processing error (-16) [EMAIL PROTECTED] x712/t0 o38->[EMAIL PROTECTED]@tcp:-1 lens 304/200 ref 0 fl Interpret:/0/0 rc -16/0
May 9 15:15:01 Lustre-01-01 kernel: LustreError: 7262:0:(ldlm_lib.c:1363:target_send_reply_msg()) Skipped 2 previous similar messages
May 9 15:17:31 Lustre-01-01 kernel: Lustre: 7242:0:(ldlm_lib.c:497:target_handle_reconnect()) lustre1-MDT0000: d09242ed-b4f7-806f-bc12-912f7cfac1a9 reconnecting
May 9 15:17:31 Lustre-01-01 kernel: Lustre: 7242:0:(ldlm_lib.c:497:target_handle_reconnect()) Skipped 5 previous similar messages
May 9 15:17:31 Lustre-01-01 kernel: Lustre: 7242:0:(ldlm_lib.c:709:target_handle_connect()) lustre1-MDT0000: refuse reconnection from [EMAIL PROTECTED]@lo to 0x000001011f228000/2
May 9 15:17:31 Lustre-01-01 kernel: Lustre: 7242:0:(ldlm_lib.c:709:target_handle_connect()) Skipped 5 previous similar messages
May 9 15:17:31 Lustre-01-01 kernel: LustreError: 7242:0:(ldlm_lib.c:1363:target_send_reply_msg()) @@@ processing error (-16) [EMAIL PROTECTED] x796/t0 o38->[EMAIL PROTECTED]@tcp:-1 lens 304/200 ref 0 fl Interpret:/0/0 rc -16/0
May 9 15:17:31 Lustre-01-01 kernel: LustreError: 7242:0:(ldlm_lib.c:1363:target_send_reply_msg()) Skipped 5 previous similar messages
May 9 15:20:17 Lustre-01-01 automount[7452]: expired /.autofs/var.mail
May 9 15:20:26 Lustre-01-01 kernel: LustreError: 7143:0:(client.c:574:ptlrpc_check_status()) @@@ type == PTL_RPC_MSG_ERR, err == -19 [EMAIL PROTECTED] x891/t0 o8->[EMAIL PROTECTED]@tcp:6 lens 240/272 ref 1 fl Rpc:R/0/0 rc 0/-19
May 9 15:20:26 Lustre-01-01 kernel: LustreError: 7143:0:(client.c:574:ptlrpc_check_status()) Skipped 72 previous similar messages
May 9 15:22:05 Lustre-01-01 kernel: Lustre: 7233:0:(ldlm_lib.c:497:target_handle_reconnect()) lustre1-MDT0000: d09242ed-b4f7-806f-bc12-912f7cfac1a9 reconnecting
May 9 15:22:05 Lustre-01-01 kernel: Lustre: 7233:0:(ldlm_lib.c:497:target_handle_reconnect()) Skipped 10 previous similar messages
May 9 15:22:05 Lustre-01-01 kernel: Lustre: 7233:0:(ldlm_lib.c:709:target_handle_connect()) lustre1-MDT0000: refuse reconnection from [EMAIL PROTECTED]@lo to 0x000001011f228000/2
May 9 15:22:05 Lustre-01-01 kernel: Lustre: 7233:0:(ldlm_lib.c:709:target_handle_connect()) Skipped 10 previous similar messages
May 9 15:22:05 Lustre-01-01 kernel: LustreError: 7233:0:(ldlm_lib.c:1363:target_send_reply_msg()) @@@ processing error (-16) [EMAIL PROTECTED] x950/t0 o38->[EMAIL PROTECTED]@tcp:-1 lens 304/200 ref 0 fl Interpret:/0/0 rc -16/0
May 9 15:22:05 Lustre-01-01 kernel: LustreError: 7233:0:(ldlm_lib.c:1363:target_send_reply_msg()) Skipped 10 previous similar messages
May 9 15:25:11 Lustre-01-01 kernel: LustreError: 7406:0:(mdc_locks.c:414:mdc_enqueue()) ldlm_cli_enqueue: -4
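
In case it helps anyone reading the excerpt: as far as I can tell, the negative numbers in these messages (-16, -19, -4) are negated Linux errno values, though that is my assumption and not something I have confirmed in the Lustre source. Below is a minimal Python sketch that pulls those codes out of lines like the ones above and prints the symbolic names. The regular expression and the decode() helper are my own, written only for the three message shapes in this snippet.

```python
import errno
import os
import re

# Fragments copied from the log excerpt above.
sample = [
    "target_send_reply_msg()) @@@ processing error (-16)",
    "ptlrpc_check_status()) @@@ type == PTL_RPC_MSG_ERR, err == -19",
    "mdc_enqueue()) ldlm_cli_enqueue: -4",
]

# Matches only the three message shapes seen in this excerpt (an assumption,
# not a general Lustre log parser).
PATTERN = re.compile(r"(?:processing error \(|err == |ldlm_cli_enqueue: )-(\d+)")

def decode(line):
    """Return (code, symbolic name, description) or None if no code is found."""
    m = PATTERN.search(line)
    if m is None:
        return None
    code = int(m.group(1))
    # Assumes the value is a negated errno from the local platform's table.
    return code, errno.errorcode.get(code, "UNKNOWN"), os.strerror(code)

for line in sample:
    print(decode(line))
```

On a Linux host this should report EBUSY for the refused reconnections, ENODEV for the ptlrpc_check_status() errors, and EINTR for the final mdc_enqueue() line.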


--
Roger L. Smith
Senior Systems Administrator
Mississippi State University
High Performance Computing Collaboratory

_______________________________________________
Lustre-discuss mailing list
[email protected]
https://mail.clusterfs.com/mailman/listinfo/lustre-discuss