Nathan,

Thanks for the help. That solved one problem, but after booting all of the servers (no clients at all), I'm getting this in the syslog on the MDS:



May 11 16:57:11 Lustre-01-01 kernel: LustreError: 6359:0:(client.c:574:ptlrpc_check_status()) @@@ type == PTL_RPC_MSG_ERR, err == -19 [EMAIL PROTECTED] x1450/t0 o8->[EMAIL PROTECTED]@tcp:6 lens 240/272 ref 1 fl Rpc:R/0/0 rc 0/-19
May 11 16:57:11 Lustre-01-01 kernel: LustreError: 6359:0:(client.c:574:ptlrpc_check_status()) Skipped 24 previous similar messages


If I then attempt to mount the filesystem on the MDS, it mounts, and I can see the contents. I then tried removing a file that I had created earlier with the "touch" command, and it was removed without trouble. However, if I then try to touch a new file, the command hangs, and after a minute or so I get the following set of errors (I'll be happy to provide the lustre-log if that's helpful):

May 11 17:01:38 Lustre-01-01 kernel: Lustre: Client lustre1-client has started
May 11 17:03:32 Lustre-01-01 kernel: Lustre: 0:0:(watchdog.c:130:lcw_cb()) Watchdog triggered for pid 6617: it was inactive for 100s
May 11 17:03:32 Lustre-01-01 kernel: Lustre: 0:0:(linux-debug.c:166:libcfs_debug_dumpstack()) showing stack for process 6617
May 11 17:03:32 Lustre-01-01 kernel: ll_mdt_10 S 00000102179ac4a8 0 6617      1          6618  6616 (L-TLB)
May 11 17:03:32 Lustre-01-01 kernel: 000001021770b018 0000000000000046 0000003000000030 000001021770b0b0
May 11 17:03:32 Lustre-01-01 kernel: 000001021770af98 0000010215f28380 000001020000007b 0000000000000000
May 11 17:03:32 Lustre-01-01 kernel: 0000000000000000 000001000c001160
May 11 17:03:32 Lustre-01-01 kernel: Call Trace:<ffffffff80147456>{schedule_timeout+246} <ffffffff801468b0>{process_timeout+0}
May 11 17:03:32 Lustre-01-01 kernel: <ffffffffa03a776e>{:ptlrpc:ptlrpc_set_wait+974} <ffffffffa06040d4>{:osc:osc_statfs_async+372}
May 11 17:03:32 Lustre-01-01 kernel: <ffffffff80136120>{default_wake_function+0} <ffffffffa04734fa>{:lov:lov_statfs_async+1098}
May 11 17:03:32 Lustre-01-01 kernel: <ffffffffa03a67e0>{:ptlrpc:ptlrpc_expired_set+0} <ffffffffa03a1680>{:ptlrpc:ptlrpc_interrupted_set+0}
May 11 17:03:32 Lustre-01-01 kernel: <ffffffffa03a67e0>{:ptlrpc:ptlrpc_expired_set+0} <ffffffffa03a1680>{:ptlrpc:ptlrpc_interrupted_set+0}
May 11 17:03:32 Lustre-01-01 kernel: <ffffffffa047af8e>{:lov:lov_create+7006} <ffffffff80190bdf>{__getblk+31}
May 11 17:03:32 Lustre-01-01 kernel: <ffffffffa0525e1f>{:ldiskfs:ldiskfs_get_inode_loc+351}
May 11 17:03:32 Lustre-01-01 kernel: <ffffffffa0533486>{:ldiskfs:ldiskfs_xattr_ibody_get+454}
May 11 17:03:32 Lustre-01-01 kernel: <ffffffffa0533e18>{:ldiskfs:ldiskfs_xattr_get+120}
May 11 17:03:32 Lustre-01-01 kernel: <ffffffffa05aa4da>{:mds:mds_get_md+106} <ffffffffa05ccd7e>{:mds:mds_create_objects+7214}
May 11 17:03:32 Lustre-01-01 kernel: <ffffffffa0533e18>{:ldiskfs:ldiskfs_xattr_get+120}
May 11 17:03:32 Lustre-01-01 kernel: <ffffffffa058cb6d>{:fsfilt_ldiskfs:fsfilt_ldiskfs_get_md+269}
May 11 17:03:32 Lustre-01-01 kernel: <ffffffffa05d0b9d>{:mds:mds_finish_open+701} <ffffffffa05d3617>{:mds:mds_open+8359}
May 11 17:03:32 Lustre-01-01 kernel: <ffffffff8014ca1b>{groups_alloc+59} <ffffffffa029d653>{:lvfs:entry_set_group_info+211}
May 11 17:03:32 Lustre-01-01 kernel: <ffffffffa029cb71>{:lvfs:alloc_entry+241} <ffffffffa028570d>{:libcfs:libcfs_debug_vmsg2+1677}
May 11 17:03:32 Lustre-01-01 kernel: <ffffffff801a9361>{dput+33} <ffffffffa03b1e60>{:ptlrpc:lustre_swab_mds_rec_create+0}
May 11 17:03:32 Lustre-01-01 kernel: <ffffffffa05b35fc>{:mds:mds_reint_rec+460} <ffffffffa05d4fe4>{:mds:mds_open_unpack+820}
May 11 17:03:32 Lustre-01-01 kernel: <ffffffffa05d42d4>{:mds:mds_update_unpack+484} <ffffffffa05aa391>{:mds:mds_reint+817}
May 11 17:03:32 Lustre-01-01 kernel: <ffffffffa037d887>{:ptlrpc:_ldlm_lock_debug+1319} <ffffffffa05a86a1>{:mds:fixup_handle_for_resent_req+81}
May 11 17:03:32 Lustre-01-01 kernel: <ffffffffa05ae771>{:mds:mds_intent_policy+1089} <ffffffff801168e5>{do_gettimeofday+101}
May 11 17:03:32 Lustre-01-01 kernel: <ffffffffa0382e63>{:ptlrpc:ldlm_lock_enqueue+243} <ffffffffa039dc70>{:ptlrpc:ldlm_server_completion_ast+0}
May 11 17:03:32 Lustre-01-01 kernel: <ffffffffa039c972>{:ptlrpc:ldlm_handle_enqueue+2722}
May 11 17:03:32 Lustre-01-01 kernel: <ffffffffa039e100>{:ptlrpc:ldlm_server_blocking_ast+0}
May 11 17:03:32 Lustre-01-01 kernel: <ffffffffa039dc70>{:ptlrpc:ldlm_server_completion_ast+0}
May 11 17:03:32 Lustre-01-01 kernel: <ffffffffa05b291a>{:mds:mds_handle+14938} <ffffffffa0306e4f>{:obdclass:class_handle2object+207}
May 11 17:03:32 Lustre-01-01 kernel: <ffffffffa03aec30>{:ptlrpc:lustre_swab_ptlrpc_body+0}
May 11 17:03:32 Lustre-01-01 kernel: <ffffffffa03b3300>{:ptlrpc:lustre_swab_buf+208} <ffffffffa028247d>{:libcfs:libcfs_nid2str+189}
May 11 17:03:32 Lustre-01-01 kernel: <ffffffffa03b9e27>{:ptlrpc:ptlrpc_server_handle_request+2951}
May 11 17:03:32 Lustre-01-01 kernel: <ffffffffa03bc4a8>{:ptlrpc:ptlrpc_main+2232} <ffffffff80136120>{default_wake_function+0}
May 11 17:03:32 Lustre-01-01 kernel: <ffffffffa03b7590>{:ptlrpc:ptlrpc_retry_rqbds+0} <ffffffffa03b7590>{:ptlrpc:ptlrpc_retry_rqbds+0}
May 11 17:03:32 Lustre-01-01 kernel: <ffffffff8011126f>{child_rip+8} <ffffffffa03bbbf0>{:ptlrpc:ptlrpc_main+0}
May 11 17:03:32 Lustre-01-01 kernel:        <ffffffff80111267>{child_rip+0}
May 11 17:03:32 Lustre-01-01 kernel: LustreError: dumping log to /tmp/lustre-log.1178921012.6617
May 11 17:03:32 Lustre-01-01 kernel: LustreError: 7460:0:(client.c:950:ptlrpc_expire_one_request()) @@@ timeout (sent at 1178920912, 100s ago) [EMAIL PROTECTED] 400 x1548/t0 o101->[EMAIL PROTECTED]@tcp:12 lens 512/864 ref 1 fl Rpc:P/0/0 rc 0/-22
May 11 17:03:32 Lustre-01-01 kernel: LustreError: 7460:0:(client.c:950:ptlrpc_expire_one_request()) Skipped 17 previous similar messages
May 11 17:03:32 Lustre-01-01 kernel: Lustre: lustre1-MDT0000-mdc-000001000c224400: Connection to service lustre1-MDT0000 via nid [EMAIL PROTECTED] was lost; in progress operations using this service will wait for recovery to complete.
May 11 17:03:32 Lustre-01-01 kernel: Lustre: 6621:0:(ldlm_lib.c:497:target_handle_reconnect()) lustre1-MDT0000: a5a32bfb-05ba-d0cc-f4dd-6afbc5d45224 reconnecting
May 11 17:03:32 Lustre-01-01 kernel: Lustre: 6621:0:(ldlm_lib.c:709:target_handle_connect()) lustre1-MDT0000: refuse reconnection from a5a32bfb-05ba-d0cc-f4dd-6[EMAIL PROTECTED]@lo to 0x000001011df5b000/2
May 11 17:03:32 Lustre-01-01 kernel: LustreError: 6621:0:(ldlm_lib.c:1363:target_send_reply_msg()) @@@ processing error (-16) [EMAIL PROTECTED] x1607/t0 o38->[EMAIL PROTECTED]@tcp:-1 lens 304/200 ref 0 fl Interpret:/0/0 rc -16/0
May 11 17:03:57 Lustre-01-01 kernel: Lustre: 6622:0:(ldlm_lib.c:497:target_handle_reconnect()) lustre1-MDT0000: a5a32bfb-05ba-d0cc-f4dd-6afbc5d45224 reconnecting
May 11 17:03:57 Lustre-01-01 kernel: Lustre: 6622:0:(ldlm_lib.c:709:target_handle_connect()) lustre1-MDT0000: refuse reconnection from [EMAIL PROTECTED]@lo to 0x000001011df5b000/2
May 11 17:03:57 Lustre-01-01 kernel: LustreError: 6622:0:(ldlm_lib.c:1363:target_send_reply_msg()) @@@ processing error (-16) [EMAIL PROTECTED] x1615/t0 o38->[EMAIL PROTECTED]@tcp:-1 lens 304/200 ref 0 fl Interpret:/0/0 rc -16/0
May 11 17:04:22 Lustre-01-01 kernel: Lustre: 6623:0:(ldlm_lib.c:497:target_handle_reconnect()) lustre1-MDT0000: a5a32bfb-05ba-d0cc-f4dd-6afbc5d45224 reconnecting
May 11 17:04:22 Lustre-01-01 kernel: Lustre: 6623:0:(ldlm_lib.c:709:target_handle_connect()) lustre1-MDT0000: refuse reconnection from [EMAIL PROTECTED]@lo to 0x000001011df5b000/2
May 11 17:04:22 Lustre-01-01 kernel: LustreError: 6623:0:(ldlm_lib.c:1363:target_send_reply_msg()) @@@ processing error (-16) [EMAIL PROTECTED] x1629/t0 o38->[EMAIL PROTECTED]@tcp:-1 lens 304/200 ref 0 fl Interpret:/0/0 rc -16/0
May 11 17:04:47 Lustre-01-01 kernel: Lustre: 6624:0:(ldlm_lib.c:497:target_handle_reconnect()) lustre1-MDT0000: a5a32bfb-05ba-d0cc-f4dd-6afbc5d45224 reconnecting
May 11 17:04:47 Lustre-01-01 kernel: Lustre: 6624:0:(ldlm_lib.c:709:target_handle_connect()) lustre1-MDT0000: refuse reconnection from [EMAIL PROTECTED]@lo to 0x000001011df5b000/2
May 11 17:04:47 Lustre-01-01 kernel: LustreError: 6624:0:(ldlm_lib.c:1363:target_send_reply_msg()) @@@ processing error (-16) [EMAIL PROTECTED] x1643/t0 o38->[EMAIL PROTECTED]@tcp:-1 lens 304/200 ref 0 fl Interpret:/0/0 rc -16/0
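
A note for anyone reading these messages: the negative numbers in them (-19, -16, -22) appear to be negated Linux errno values, which is the usual kernel convention. Assuming that holds here, a minimal Python sketch can translate them into symbolic names. The helper name describe_rc is made up for illustration and is not part of Lustre.

```python
import errno
import os


def describe_rc(rc):
    """Return (symbolic name, message) for a negated errno such as -16.

    Assumption: the rc/err values in the log are negated Linux errno numbers.
    """
    code = abs(rc)
    name = errno.errorcode.get(code, "UNKNOWN")
    return name, os.strerror(code)


# The three codes that appear in the log excerpt above.
for rc in (-19, -16, -22):
    name, text = describe_rc(rc)
    print(f"{rc}: {name} ({text})")
```

On Linux, -16 maps to EBUSY, which fits the "refuse reconnection" messages, and -19 maps to ENODEV.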


Nathaniel Rutman wrote:
May 9 15:13:46 Lustre-01-01 kernel: Lustre: 7256:0:(ldlm_lib.c:709:target_handle_connect()) lustre1-MDT0000: refuse reconnection from [EMAIL PROTECTED]@lo to 0x000001011f228000/2

The MDT for some reason thinks the export is in use. I have no idea why, but try this.

Stop all your clients.
On the MDT:
# cat /proc/fs/lustre/devices
# ls /proc/fs/lustre/mds/lustre1-MDT0000/exports/
just to prove there are no clients.
Now try mounting a client on the MDT.
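
The "no clients" check above could also be scripted. Here is a minimal Python sketch that lists whatever is under the exports directory named in the ls command. It assumes the directory holds one entry per connected export, which is inferred from the step above and not verified against a specific Lustre release. The function name list_exports is hypothetical.

```python
import os

# Path copied from the `ls` command above; adjust the target name for your filesystem.
EXPORTS_DIR = "/proc/fs/lustre/mds/lustre1-MDT0000/exports/"


def list_exports(path=EXPORTS_DIR):
    """Return the sorted entry names under `path`, or [] if the directory is missing."""
    try:
        return sorted(os.listdir(path))
    except FileNotFoundError:
        return []


if __name__ == "__main__":
    entries = list_exports()
    if entries:
        print("exports still present:")
        for name in entries:
            print("  " + name)
    else:
        print("no export entries found (or the MDT is not mounted on this node)")
```

An empty result does not distinguish "no clients" from "MDT not mounted here", so it is worth running cat /proc/fs/lustre/devices first, as suggested.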


Roger L. Smith wrote:

I'm attempting my first-ever Lustre install on a small test cluster. I have one MDS and 5 OSSs, all with identical hardware. They are all on the same network segment, and each has a single Ethernet interface. I'm running SLES9 SP3 with the Lustre RPMs for the kernel, modules, etc.

I've configured the systems and mounted everything, and everything seems fine.

As a first test, I've tried to mount the filesystem on the MDS (and on more than one OSS) as a client. The filesystem seems to mount fine, but once it is mounted, whichever system has it mounted will hang for long periods of time (often permanently). I can still log into the system from another shell, and things behave normally there. The hang usually seems to be caused by doing anything related to the client-mounted filesystem, but not always.

I had an installation of 1.6 beta7 working that didn't seem to have the problem, but 1.6.0 and 1.6.0.1 both have done it.

I currently have the filesystem mounted using the MDS as a client. It has created a nearly 1MB lustre-log in /tmp (available upon request), and I've included a snippet from /var/log/messages below.

Any help would be appreciated!

May 9 15:13:46 Lustre-01-01 kernel: Lustre: 7256:0:(ldlm_lib.c:497:target_handle_reconnect()) lustre1-MDT0000: d09242ed-b4f7-806f-bc12-912f7cfac1a9 reconnecting
May 9 15:13:46 Lustre-01-01 kernel: Lustre: 7256:0:(ldlm_lib.c:497:target_handle_reconnect()) Skipped 1 previous similar message
May 9 15:13:46 Lustre-01-01 kernel: Lustre: 7256:0:(ldlm_lib.c:709:target_handle_connect()) lustre1-MDT0000: refuse reconnection from [EMAIL PROTECTED]@lo to 0x000001011f228000/2
May 9 15:13:46 Lustre-01-01 kernel: Lustre: 7256:0:(ldlm_lib.c:709:target_handle_connect()) Skipped 1 previous similar message
May 9 15:13:46 Lustre-01-01 kernel: LustreError: 7256:0:(ldlm_lib.c:1363:target_send_reply_msg()) @@@ processing error (-16) [EMAIL PROTECTED] x670/t0 o38->[EMAIL PROTECTED]@tcp:-1 lens 304/200 ref 0 fl Interpret:/0/0 rc -16/0
May 9 15:13:46 Lustre-01-01 kernel: LustreError: 7256:0:(ldlm_lib.c:1363:target_send_reply_msg()) Skipped 1 previous similar message
May 9 15:14:54 Lustre-01-01 automount[6101]: attempting to mount entry /.autofs/var.mail
May 9 15:15:01 Lustre-01-01 kernel: Lustre: 7262:0:(ldlm_lib.c:497:target_handle_reconnect()) lustre1-MDT0000: d09242ed-b4f7-806f-bc12-912f7cfac1a9 reconnecting
May 9 15:15:01 Lustre-01-01 kernel: Lustre: 7262:0:(ldlm_lib.c:497:target_handle_reconnect()) Skipped 2 previous similar messages
May 9 15:15:01 Lustre-01-01 kernel: Lustre: 7262:0:(ldlm_lib.c:709:target_handle_connect()) lustre1-MDT0000: refuse reconnection from [EMAIL PROTECTED]@lo to 0x000001011f228000/2
May 9 15:15:01 Lustre-01-01 kernel: Lustre: 7262:0:(ldlm_lib.c:709:target_handle_connect()) Skipped 2 previous similar messages
May 9 15:15:01 Lustre-01-01 kernel: LustreError: 7262:0:(ldlm_lib.c:1363:target_send_reply_msg()) @@@ processing error (-16) [EMAIL PROTECTED] x712/t0 o38->[EMAIL PROTECTED]@tcp:-1 lens 304/200 ref 0 fl Interpret:/0/0 rc -16/0
May 9 15:15:01 Lustre-01-01 kernel: LustreError: 7262:0:(ldlm_lib.c:1363:target_send_reply_msg()) Skipped 2 previous similar messages
May 9 15:17:31 Lustre-01-01 kernel: Lustre: 7242:0:(ldlm_lib.c:497:target_handle_reconnect()) lustre1-MDT0000: d09242ed-b4f7-806f-bc12-912f7cfac1a9 reconnecting
May 9 15:17:31 Lustre-01-01 kernel: Lustre: 7242:0:(ldlm_lib.c:497:target_handle_reconnect()) Skipped 5 previous similar messages
May 9 15:17:31 Lustre-01-01 kernel: Lustre: 7242:0:(ldlm_lib.c:709:target_handle_connect()) lustre1-MDT0000: refuse reconnection from [EMAIL PROTECTED]@lo to 0x000001011f228000/2
May 9 15:17:31 Lustre-01-01 kernel: Lustre: 7242:0:(ldlm_lib.c:709:target_handle_connect()) Skipped 5 previous similar messages
May 9 15:17:31 Lustre-01-01 kernel: LustreError: 7242:0:(ldlm_lib.c:1363:target_send_reply_msg()) @@@ processing error (-16) [EMAIL PROTECTED] x796/t0 o38->[EMAIL PROTECTED]@tcp:-1 lens 304/200 ref 0 fl Interpret:/0/0 rc -16/0
May 9 15:17:31 Lustre-01-01 kernel: LustreError: 7242:0:(ldlm_lib.c:1363:target_send_reply_msg()) Skipped 5 previous similar messages
May  9 15:20:17 Lustre-01-01 automount[7452]: expired /.autofs/var.mail
May 9 15:20:26 Lustre-01-01 kernel: LustreError: 7143:0:(client.c:574:ptlrpc_check_status()) @@@ type == PTL_RPC_MSG_ERR, err == -19 [EMAIL PROTECTED] x891/t0 o8->[EMAIL PROTECTED]@tcp:6 lens 240/272 ref 1 fl Rpc:R/0/0 rc 0/-19
May 9 15:20:26 Lustre-01-01 kernel: LustreError: 7143:0:(client.c:574:ptlrpc_check_status()) Skipped 72 previous similar messages
May 9 15:22:05 Lustre-01-01 kernel: Lustre: 7233:0:(ldlm_lib.c:497:target_handle_reconnect()) lustre1-MDT0000: d09242ed-b4f7-806f-bc12-912f7cfac1a9 reconnecting
May 9 15:22:05 Lustre-01-01 kernel: Lustre: 7233:0:(ldlm_lib.c:497:target_handle_reconnect()) Skipped 10 previous similar messages
May 9 15:22:05 Lustre-01-01 kernel: Lustre: 7233:0:(ldlm_lib.c:709:target_handle_connect()) lustre1-MDT0000: refuse reconnection from [EMAIL PROTECTED]@lo to 0x000001011f228000/2
May 9 15:22:05 Lustre-01-01 kernel: Lustre: 7233:0:(ldlm_lib.c:709:target_handle_connect()) Skipped 10 previous similar messages
May 9 15:22:05 Lustre-01-01 kernel: LustreError: 7233:0:(ldlm_lib.c:1363:target_send_reply_msg()) @@@ processing error (-16) [EMAIL PROTECTED] x950/t0 o38->[EMAIL PROTECTED]@tcp:-1 lens 304/200 ref 0 fl Interpret:/0/0 rc -16/0
May 9 15:22:05 Lustre-01-01 kernel: LustreError: 7233:0:(ldlm_lib.c:1363:target_send_reply_msg()) Skipped 10 previous similar messages
May 9 15:25:11 Lustre-01-01 kernel: LustreError: 7406:0:(mdc_locks.c:414:mdc_enqueue()) ldlm_cli_enqueue: -4
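
Syslog excerpts pasted into mail often lose their line breaks, which makes a sequence like this hard to follow. A minimal Python sketch for splitting such text back into one entry per line, assuming every entry starts with a "Mon DD HH:MM:SS " timestamp as in /var/log/messages here (the name split_entries is hypothetical):

```python
import re

# Matches the start of a classic syslog timestamp, e.g. "May  9 15:20:17 " or "May 11 17:03:32 ".
STAMP = re.compile(
    r"(?:Jan|Feb|Mar|Apr|May|Jun|Jul|Aug|Sep|Oct|Nov|Dec) {1,2}\d{1,2} \d{2}:\d{2}:\d{2} "
)


def split_entries(text):
    """Split run-together syslog text into a list of entries, one per timestamp."""
    starts = [m.start() for m in STAMP.finditer(text)]
    if not starts:
        stripped = text.strip()
        return [stripped] if stripped else []
    # Keep any text that precedes the first timestamp as its own piece.
    head = text[:starts[0]].strip()
    bounds = starts + [len(text)]
    entries = [text[a:b].strip() for a, b in zip(bounds, bounds[1:])]
    return ([head] if head else []) + entries


if __name__ == "__main__":
    sample = (
        "May 9 15:14:54 Lustre-01-01 automount[6101]: attempting to mount entry "
        "/.autofs/var.mail May  9 15:20:17 Lustre-01-01 automount[7452]: expired "
        "/.autofs/var.mail"
    )
    for entry in split_entries(sample):
        print(entry)
```

This only restores breaks between entries; it cannot repair a line that was wrapped in the middle of a word.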



_______________________________________________
Lustre-discuss mailing list
[email protected]
https://mail.clusterfs.com/mailman/listinfo/lustre-discuss


--
Roger L. Smith
Senior Systems Administrator
Mississippi State University
High Performance Computing Collaboratory
