The problem was solved in a very odd way. I found that when I temporarily unmounted an OST and then remounted it, the client mounts started working again. Now every node can see the file system without problems.

lctl ping seems to work fine on every node (I didn't test every combination, but the few tests I ran all succeeded).
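
For anyone who wants to repeat this kind of check across many nodes, here is a minimal sketch in Python. It only wraps the `lctl ping <nid>` command mentioned in this thread. The helper names and the NIDs in the usage note are hypothetical, and it assumes that `lctl ping` exits with a non-zero status when the peer cannot be reached.

```python
import subprocess


def ping_nid(nid, runner=subprocess.run):
    """Return True if `lctl ping <nid>` exits with status 0.

    Assumption: a non-zero exit status means the ping failed.
    `runner` is injectable so the logic can be tried without Lustre installed.
    """
    result = runner(["lctl", "ping", nid], capture_output=True, text=True)
    return result.returncode == 0


def check_nids(nids, runner=subprocess.run):
    """Ping every NID in `nids` and return a {nid: reachable} mapping."""
    return {nid: ping_nid(nid, runner) for nid in nids}
```

Calling `check_nids(["10.1.1.1@tcp", "10.1.1.2@tcp"])` on a client or server (the NIDs here are made up) would return a dictionary showing which peers answered. Running it from both the frontend and a failing client helps narrow down which direction is broken.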

Nathaniel Rutman wrote:
Use "lctl list_nids" and "lctl ping <remote_nid>" on the clients and servers to help see where the problem is.
Somsak Sriprayoonsakul wrote:
Dear List,

I'm trying to set up a Lustre 1.6b5 cluster in which every node except the frontend serves an OST, the frontend serves the MGS+MDT, and every node (including the frontend) mounts and uses Lustre. For some reason, some nodes can't mount Lustre while others can.

My configuration:

OS: Rocks 4.2.1 Cluster (CentOS 4.4) using the stock Lustre 2.6.9-42.EL_lustre.1.5.95smp kernel. The frontend has 2 IPs (real + private) and every compute node uses a private IP.
Lustre: 1.6b5.
Here are the logs from the frontend (MGS+MDT) and the failed client node:

Failed client node:

Lustre:   mount data:
Lustre: profile: lustre-client
Lustre: device:  [EMAIL PROTECTED]:/lustre
Lustre: flags:   2
LustreError: 22040:0:(client.c:579:ptlrpc_check_status()) @@@ type == PTL_RPC_MSG_ERR, err == -107
LustreError: 22040:0:(client.c:579:ptlrpc_check_status()) Skipped 3 previous similar messages
LustreError: 22040:0:(mgc_request.c:964:mgc_process_log()) Can't get cfg lock: -107
LustreError: 3099:0:(mgc_request.c:493:mgc_blocking_ast()) original grant failed, won't requeue
LustreError: 22040:0:(mgc_request.c:1014:mgc_process_log()) [EMAIL PROTECTED]: the configuration 'lustre-client' could not be read (-107) from the MGS.
LustreError: [EMAIL PROTECTED]: The configuration 'lustre-client' could not be read from the MGS (-107). This may be the result of communication errors between this node and the MGS, or the MGS may not be running.
Lustre:   0 UP mgc [EMAIL PROTECTED] f19e61f7-623f-55a2-6332-ea987600d10d 5
Lustre:   1 UP ost OSS OSS_uuid 3
Lustre:   2 UP obdfilter lustre-OST0001 lustre-OST0001_UUID 9
LustreError: 22040:0:(llite_lib.c:909:ll_fill_super()) Unable to process log: -107
Lustre: client 0000010118688000 umount complete
LustreError: 22040:0:(obd_mount.c:1857:lustre_fill_super()) Unable to mount (-107)


Frontend:
LustreError: 10490:0:(mgs_handler.c:468:mgs_handle()) lustre_mgs: operation 101 on unconnected MGS
LustreError: 10490:0:(mgs_handler.c:468:mgs_handle()) Skipped 1 previous similar message
LustreError: 10490:0:(ldlm_lib.c:1317:target_send_reply_msg()) @@@ processing error (-107)
LustreError: 10490:0:(ldlm_lib.c:1317:target_send_reply_msg()) Skipped 3 previous similar messages
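
The -107 that recurs in both logs is a negated Linux errno value. On Linux, errno 107 is ENOTCONN ("Transport endpoint is not connected"), which matches the "unconnected MGS" message on the frontend. The following is a minimal sketch, not part of any Lustre tooling, that pulls such codes out of log lines and names them. The function names and the regular expression are my own assumptions about the log format shown above.

```python
import errno
import os
import re

# Matches a negative integer such as "-107" that is not part of a word or a
# UUID (so "lustre-OST0001" and "f19e61f7-623f-..." are ignored).
_NEG_CODE = re.compile(r"(?<![\w-])-(\d+)\b")


def error_codes(line):
    """Return the errno values (as positive ints) found in one log line."""
    return [int(m) for m in _NEG_CODE.findall(line)]


def describe(code):
    """Return 'NAME: message' for an errno value on the current platform."""
    name = errno.errorcode.get(code, "unknown")
    return f"{name}: {os.strerror(code)}"
```

For instance, `error_codes("... Unable to mount (-107)")` yields `[107]`, and `describe(107)` then gives the symbolic name and message for that code on the machine where it runs.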

I believe I strictly followed the guide at https://mail.clusterfs.com/wikis/lustre/MountConf. I suspect the problem is caused by IP confusion on the frontend, but some compute nodes do successfully mount the Lustre file system from the frontend.

Regards,


