Kevin L. Buterbaugh wrote:
Nathan,

I made the changes you suggested to my config.sh and regenerated the config.xml. I did the test you suggested on the MDS and the client and here's what I get:

[EMAIL PROTECTED] lustre]# modprobe lnet
[EMAIL PROTECTED] lustre]# lctl network up
LNET configured
[EMAIL PROTECTED] lustre]# lctl list_nids
[EMAIL PROTECTED]
[EMAIL PROTECTED] lustre]# lctl ping [EMAIL PROTECTED]
[EMAIL PROTECTED]
[EMAIL PROTECTED]
[EMAIL PROTECTED] lustre]#

[EMAIL PROTECTED] ~]# modprobe lnet
[EMAIL PROTECTED] ~]# lctl network up
LNET configured
[EMAIL PROTECTED] ~]# lctl list_nids
[EMAIL PROTECTED]
[EMAIL PROTECTED] ~]# lctl ping [EMAIL PROTECTED]
[EMAIL PROTECTED]
[EMAIL PROTECTED]
[EMAIL PROTECTED] ~]#


That's not the kind of output I expect from a ping, so I don't even know if that means it worked or not...

Yes, that's good. It means that the nodes can talk to each other through LNET using those nids. You need to do that between the OST and MDT also, and between client and OST just for good measure. If they can all see each other, then we'll have to look at the OST syslog to see why it's refusing to talk to the MDT.

PS please keep the discussion on the list -- it might be useful for the next guy.



Kevin

Nathaniel Rutman wrote:
Kevin L. Buterbaugh wrote:
#
# Configure networking
#
lmc -m config.xml --add net --node lustrem --nid lustrem --nettype tcp
lmc -m config.xml --add net --node lustre1 --nid lustre1 --nettype tcp
lmc -m config.xml --add net --node lustre2 --nid lustre2 --nettype tcp

And from the MDS (lustrem):

3894:0:(client.c:940:ptlrpc_expire_one_request()) @@@ timeout (sent at 1170442057, 5s ago) [EMAIL PROTECTED] x1/t0 o8->[EMAIL PROTECTED]:6 lens 240/272 ref 1 fl Rpc:/0/0 rc 0/0
Feb 2 12:48:07 lustrem kernel: LustreError: 3894:0:(client.c:940:ptlrpc_expire_one_request()) @@@ timeout (sent at 1170442082, 5s ago) [EMAIL PROTECTED] x4/t0 o8->[EMAIL PROTECTED]:6 lens 240/272 ref 1 fl Rpc:/0/0 rc 0/0
Feb 2 12:48:07 lustrem kernel: LustreError:

These messages indicate failure to connect to the OSTs (op 8 = OST_CONNECT).
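If the syslog is long, it may help to pull out just the opcode field (the "o8->" part) from each timeout message to see which operations are failing. This is only a rough sketch: extract_opcodes is a made-up helper name, it assumes the opcode always appears as " o<number>->" as in the lines above, and it relies on grep -o being available.

```shell
# extract_opcodes (hypothetical helper): read log lines on stdin and print
# the number from every " o<N>->" field, one per line.
# Assumes the field looks like the "o8->" seen in the messages above.
extract_opcodes() {
    grep -o ' o[0-9][0-9]*->' | tr -dc '0-9\n'
}
```

Something like "grep ptlrpc_expire_one_request /var/log/messages | extract_opcodes | sort -n | uniq -c" would then count timeouts per opcode (the log path is an assumption; use wherever your syslog goes). A column of 8s means everything timing out is an OST_CONNECT.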

There's always a possibility that one or more of your nodes doesn't resolve the hostname properly. In general, I recommend using the actual IP address:
lmc -m config.xml --add net --node lustrem --nid [EMAIL PROTECTED] --nettype lnet
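One way to check name resolution on each node is to look up every hostname from the config and see what address comes back. The sketch below is an illustration, not a Lustre tool: check_host and is_loopback are made-up names, it assumes getent and awk are installed, and flagging loopback addresses is my own guess at one way a hostname can resolve "improperly" (for example, a node's own name listed on the 127.0.0.1 line of /etc/hosts).

```shell
# is_loopback ADDR (hypothetical helper): succeed if ADDR is a loopback
# address (127.x.x.x or ::1).
is_loopback() {
    case "$1" in
        127.*|::1) return 0 ;;
        *) return 1 ;;
    esac
}

# check_host NAME (hypothetical helper): report how NAME resolves on this node.
check_host() {
    # First address returned by the system resolver; empty if the lookup fails.
    addr=$(getent hosts "$1" 2>/dev/null | awk '{print $1; exit}')
    if [ -z "$addr" ]; then
        echo "$1: does not resolve"
    elif is_loopback "$addr"; then
        echo "$1: resolves to loopback ($addr)"
    else
        echo "$1: $addr"
    fi
}
```

Running "for h in lustrem lustre1 lustre2; do check_host "$h"; done" on every node should print the same non-loopback address for each name everywhere; any difference between nodes is worth chasing.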

Also, check that every node can ping every other.  On each node:
modprobe lnet
lctl network up
lctl list_nids
Then on each node:
lctl ping <nids from other nodes>
lctl network down (so you'll be able to remove the module)
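The ping step above can be wrapped in a small loop so that each node tests all of its peers in one go. This is a minimal sketch under a few assumptions: ping_nids and the LCTL variable are made-up names, the nids are passed in by hand, and it assumes lctl ping exits non-zero when the peer cannot be reached (worth confirming on your version).

```shell
# LCTL can be pointed at a stub for a dry run; it defaults to the real tool.
LCTL=${LCTL:-lctl}

# ping_nids NID... (hypothetical helper): ping each nid once, print one
# status line per nid, and return the number of nids that failed.
# Assumes "lctl ping" exits non-zero on failure.
ping_nids() {
    failed=0
    for nid in "$@"; do
        if $LCTL ping "$nid" >/dev/null 2>&1; then
            echo "ok   $nid"
        else
            echo "FAIL $nid"
            failed=$((failed + 1))
        fi
    done
    return "$failed"
}
```

After "modprobe lnet" and "lctl network up", run it with the nids that lctl list_nids printed on the other nodes, for example "ping_nids 192.168.0.2@tcp 192.168.0.3@tcp" (placeholder addresses). A non-zero exit status means at least one peer did not answer.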



_______________________________________________
Lustre-discuss mailing list
[email protected]
https://mail.clusterfs.com/mailman/listinfo/lustre-discuss
