The problem was solved in a very odd way. I found that when I
temporarily unmounted the OST and then remounted it, the client mount
started working again. Now every node can see the file system without
problems. lctl ping seems to work on every node (I didn't test every
combination, but the few tests I ran all succeeded).
Nathaniel Rutman wrote:
Use "lctl list_nids" and "lctl ping <remote_nid>" on the clients and
servers to help see where the problem is.
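A minimal sketch of how that check could be scripted, assuming Python 3.7+
and that lctl is on the PATH of the node where it runs. The function names
and the injectable "runner" argument are illustrative only, and treating a
non-zero exit status from "lctl ping" as "unreachable" is an assumption
rather than documented behaviour.

```python
import subprocess

def run_lctl(args):
    """Run `lctl <args>` and return (exit_code, stdout)."""
    try:
        proc = subprocess.run(["lctl", *args], capture_output=True, text=True)
    except FileNotFoundError:
        # lctl is not installed on this machine
        return 127, ""
    return proc.returncode, proc.stdout

def local_nids(runner=run_lctl):
    """Return the NIDs printed by `lctl list_nids`, one per output line."""
    code, out = runner(["list_nids"])
    if code != 0:
        return []
    return [line.strip() for line in out.splitlines() if line.strip()]

def ping_nids(remote_nids, runner=run_lctl):
    """Map each remote NID to True when `lctl ping <nid>` exits with 0."""
    return {nid: runner(["ping", nid])[0] == 0 for nid in remote_nids}
```

Running local_nids() on each node and then ping_nids([...]) with the NIDs
of the MGS and the OSS nodes (for example a hypothetical "10.1.1.1@tcp")
gives a quick table of which pairs of nodes cannot reach each other.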
Somsak Sriprayoonsakul wrote:
Dear List,
I'm trying to set up a Lustre 1.6b5 cluster in which every node
except the frontend serves an OST, the frontend serves the MGS+MDT, and
every node (including the frontend) mounts and uses Lustre. Somehow
there is a strange problem: some nodes can't mount Lustre while others can.
My configuration:
OS: Rocks 4.2.1 Cluster (CentOS 4.4) using the stock Lustre
2.6.9-42.EL_lustre.1.5.95smp kernel. The frontend has 2 IPs (real +
private) and every compute node uses a private IP.
Lustre: 1.6b5.
Here are the logs from the frontend (MGS+MDT) and the failed client node.
Failed client node:
Lustre: mount data:
Lustre: profile: lustre-client
Lustre: device: [EMAIL PROTECTED]:/lustre
Lustre: flags: 2
LustreError: 22040:0:(client.c:579:ptlrpc_check_status()) @@@ type ==
PTL_RPC_MSG_ERR, err == -107
LustreError: 22040:0:(client.c:579:ptlrpc_check_status()) Skipped 3
previous similar messages
LustreError: 22040:0:(mgc_request.c:964:mgc_process_log()) Can't get
cfg lock: -107
LustreError: 3099:0:(mgc_request.c:493:mgc_blocking_ast()) original
grant failed, won't requeue
LustreError: 22040:0:(mgc_request.c:1014:mgc_process_log())
[EMAIL PROTECTED]: the configuration 'lustre-client' could not be read
(-107) from the MGS.
LustreError: [EMAIL PROTECTED]: The configuration 'lustre-client' could
not be read from the MGS (-107). This may be the result of
communication errors between this node and the MGS, or the MGS may
not be running.
Lustre: 0 UP mgc [EMAIL PROTECTED]
f19e61f7-623f-55a2-6332-ea987600d10d 5
Lustre: 1 UP ost OSS OSS_uuid 3
Lustre: 2 UP obdfilter lustre-OST0001 lustre-OST0001_UUID 9
LustreError: 22040:0:(llite_lib.c:909:ll_fill_super()) Unable to
process log: -107
Lustre: client 0000010118688000 umount complete
LustreError: 22040:0:(obd_mount.c:1857:lustre_fill_super()) Unable to
mount (-107)
Frontend:
LustreError: 10490:0:(mgs_handler.c:468:mgs_handle()) lustre_mgs:
operation 101 on unconnected MGS
LustreError: 10490:0:(mgs_handler.c:468:mgs_handle()) Skipped 1
previous similar message
LustreError: 10490:0:(ldlm_lib.c:1317:target_send_reply_msg()) @@@
processing error (-107)
LustreError: 10490:0:(ldlm_lib.c:1317:target_send_reply_msg())
Skipped 3 previous similar messages
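Both logs above report the same code, -107. These codes are negated Linux
errno values, and on Linux errno 107 is ENOTCONN ("transport endpoint is
not connected"), which matches the "unconnected MGS" message on the
frontend. A small sketch for decoding such codes from a pasted log,
assuming it runs on Linux (errno numbering is platform specific) and using
a hypothetical helper name:

```python
import errno
import os
import re

# A negative integer that is not glued to a preceding word, digit or dot,
# so kernel versions such as "2.6.9-42.EL" and UUIDs are not matched.
NEG_CODE = re.compile(r"(?<![\w.])-(\d+)\b")

def decode_errors(log_text):
    """Return {code: (errno_name, description)} for each negative code found."""
    decoded = {}
    for match in NEG_CODE.finditer(log_text):
        number = int(match.group(1))
        name = errno.errorcode.get(number, "unknown")
        decoded[-number] = (name, os.strerror(number))
    return decoded
```

Calling decode_errors() on the client log text returns a single entry
keyed by -107 whose name is ENOTCONN on Linux.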
I believe I followed the guide at
https://mail.clusterfs.com/wikis/lustre/MountConf strictly. I suspect
the problem is caused by IP confusion on the frontend, but some compute
nodes do successfully mount the Lustre file system from the frontend.
Regards,
_______________________________________________
Lustre-discuss mailing list
https://mail.clusterfs.com/mailman/listinfo/lustre-discuss