I'm attempting my first-ever Lustre install on a small test cluster. I
have one MDS and five OSSes, all with identical hardware. They are all on
the same network segment, and each has a single Ethernet interface. I'm
running SLES9 SP3 with the Lustre RPMs for the kernel, modules, etc.
I've configured the systems and mounted everything, and everything seems
fine.
As a first test, I've tried to mount the filesystem on the MDS (and on
more than one OSS) as a client. The filesystem seems to mount fine, but
once it is mounted, whichever system has it mounted will hang for long
periods of time (often permanently). I can still log into the system
from another shell, though, and things behave normally there. The hang
is usually triggered by doing anything related to the client-mounted
filesystem, but not always.
I had an installation of 1.6 beta7 working that didn't seem to have the
problem, but both 1.6.0 and 1.6.0.1 show it.
I currently have the filesystem mounted with the MDS acting as a client.
It has created a nearly 1 MB lustre-log in /tmp (available upon request),
and I've included a snippet from /var/log/messages below.
Any help would be appreciated!
May 9 15:13:46 Lustre-01-01 kernel: Lustre:
7256:0:(ldlm_lib.c:497:target_handle_reconnect()) lustre1-MDT0000:
d09242ed-b4f7-806f-bc12-912f7cfac1a9 reconnecting
May 9 15:13:46 Lustre-01-01 kernel: Lustre:
7256:0:(ldlm_lib.c:497:target_handle_reconnect()) Skipped 1 previous
similar message
May 9 15:13:46 Lustre-01-01 kernel: Lustre:
7256:0:(ldlm_lib.c:709:target_handle_connect()) lustre1-MDT0000: refuse
reconnection from [EMAIL PROTECTED]@lo to
0x000001011f228000/2
May 9 15:13:46 Lustre-01-01 kernel: Lustre:
7256:0:(ldlm_lib.c:709:target_handle_connect()) Skipped 1 previous
similar message
May 9 15:13:46 Lustre-01-01 kernel: LustreError:
7256:0:(ldlm_lib.c:1363:target_send_reply_msg()) @@@ processing error
(-16) [EMAIL PROTECTED] x670/t0
o38->[EMAIL PROTECTED]@tcp:-1 lens
304/200 ref 0 fl Interpret:/0/0 rc -16/0
May 9 15:13:46 Lustre-01-01 kernel: LustreError:
7256:0:(ldlm_lib.c:1363:target_send_reply_msg()) Skipped 1 previous
similar message
May 9 15:14:54 Lustre-01-01 automount[6101]: attempting to mount entry
/.autofs/var.mail
May 9 15:15:01 Lustre-01-01 kernel: Lustre:
7262:0:(ldlm_lib.c:497:target_handle_reconnect()) lustre1-MDT0000:
d09242ed-b4f7-806f-bc12-912f7cfac1a9 reconnecting
May 9 15:15:01 Lustre-01-01 kernel: Lustre:
7262:0:(ldlm_lib.c:497:target_handle_reconnect()) Skipped 2 previous
similar messages
May 9 15:15:01 Lustre-01-01 kernel: Lustre:
7262:0:(ldlm_lib.c:709:target_handle_connect()) lustre1-MDT0000: refuse
reconnection from [EMAIL PROTECTED]@lo to
0x000001011f228000/2
May 9 15:15:01 Lustre-01-01 kernel: Lustre:
7262:0:(ldlm_lib.c:709:target_handle_connect()) Skipped 2 previous
similar messages
May 9 15:15:01 Lustre-01-01 kernel: LustreError:
7262:0:(ldlm_lib.c:1363:target_send_reply_msg()) @@@ processing error
(-16) [EMAIL PROTECTED] x712/t0
o38->[EMAIL PROTECTED]@tcp:-1 lens
304/200 ref 0 fl Interpret:/0/0 rc -16/0
May 9 15:15:01 Lustre-01-01 kernel: LustreError:
7262:0:(ldlm_lib.c:1363:target_send_reply_msg()) Skipped 2 previous
similar messages
May 9 15:17:31 Lustre-01-01 kernel: Lustre:
7242:0:(ldlm_lib.c:497:target_handle_reconnect()) lustre1-MDT0000:
d09242ed-b4f7-806f-bc12-912f7cfac1a9 reconnecting
May 9 15:17:31 Lustre-01-01 kernel: Lustre:
7242:0:(ldlm_lib.c:497:target_handle_reconnect()) Skipped 5 previous
similar messages
May 9 15:17:31 Lustre-01-01 kernel: Lustre:
7242:0:(ldlm_lib.c:709:target_handle_connect()) lustre1-MDT0000: refuse
reconnection from [EMAIL PROTECTED]@lo to
0x000001011f228000/2
May 9 15:17:31 Lustre-01-01 kernel: Lustre:
7242:0:(ldlm_lib.c:709:target_handle_connect()) Skipped 5 previous
similar messages
May 9 15:17:31 Lustre-01-01 kernel: LustreError:
7242:0:(ldlm_lib.c:1363:target_send_reply_msg()) @@@ processing error
(-16) [EMAIL PROTECTED] x796/t0
o38->[EMAIL PROTECTED]@tcp:-1 lens
304/200 ref 0 fl Interpret:/0/0 rc -16/0
May 9 15:17:31 Lustre-01-01 kernel: LustreError:
7242:0:(ldlm_lib.c:1363:target_send_reply_msg()) Skipped 5 previous
similar messages
May 9 15:20:17 Lustre-01-01 automount[7452]: expired /.autofs/var.mail
May 9 15:20:26 Lustre-01-01 kernel: LustreError:
7143:0:(client.c:574:ptlrpc_check_status()) @@@ type == PTL_RPC_MSG_ERR,
err == -19 [EMAIL PROTECTED] x891/t0
o8->[EMAIL PROTECTED]@tcp:6 lens 240/272 ref 1 fl
Rpc:R/0/0 rc 0/-19
May 9 15:20:26 Lustre-01-01 kernel: LustreError:
7143:0:(client.c:574:ptlrpc_check_status()) Skipped 72 previous similar
messages
May 9 15:22:05 Lustre-01-01 kernel: Lustre:
7233:0:(ldlm_lib.c:497:target_handle_reconnect()) lustre1-MDT0000:
d09242ed-b4f7-806f-bc12-912f7cfac1a9 reconnecting
May 9 15:22:05 Lustre-01-01 kernel: Lustre:
7233:0:(ldlm_lib.c:497:target_handle_reconnect()) Skipped 10 previous
similar messages
May 9 15:22:05 Lustre-01-01 kernel: Lustre:
7233:0:(ldlm_lib.c:709:target_handle_connect()) lustre1-MDT0000: refuse
reconnection from [EMAIL PROTECTED]@lo to
0x000001011f228000/2
May 9 15:22:05 Lustre-01-01 kernel: Lustre:
7233:0:(ldlm_lib.c:709:target_handle_connect()) Skipped 10 previous
similar messages
May 9 15:22:05 Lustre-01-01 kernel: LustreError:
7233:0:(ldlm_lib.c:1363:target_send_reply_msg()) @@@ processing error
(-16) [EMAIL PROTECTED] x950/t0
o38->[EMAIL PROTECTED]@tcp:-1 lens
304/200 ref 0 fl Interpret:/0/0 rc -16/0
May 9 15:22:05 Lustre-01-01 kernel: LustreError:
7233:0:(ldlm_lib.c:1363:target_send_reply_msg()) Skipped 10 previous
similar messages
May 9 15:25:11 Lustre-01-01 kernel: LustreError:
7406:0:(mdc_locks.c:414:mdc_enqueue()) ldlm_cli_enqueue: -4
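For anyone reading the snippet: the negative numbers in it (-16, -19, -4)
appear to be negated Linux errno values, which is the usual kernel
convention. A minimal Python sketch for decoding them, assuming that
convention applies here (the decode helper is hypothetical and not part
of Lustre):

```python
import errno
import os


def decode(rc):
    """Translate a kernel-style return code into (symbolic name, description).

    Assumption: rc is a negated errno value, e.g. -16 for EBUSY.
    """
    n = abs(rc)
    name = errno.errorcode.get(n, "UNKNOWN")
    return name, os.strerror(n)


if __name__ == "__main__":
    # Return codes copied from the log excerpt above.
    for rc in (-16, -19, -4):
        name, text = decode(rc)
        print(f"{rc:>4}  {name:<8} {text}")
```

Run on a Linux host, this prints the symbolic name and the system's own
description for each code, which makes the refused reconnect (-16) and
the later client-side errors (-19, -4) easier to tell apart.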
--
Roger L. Smith
Senior Systems Administrator
Mississippi State University
High Performance Computing Collaboratory