Kevin L. Buterbaugh wrote:
#
# Configure networking
#
lmc -m config.xml --add net --node lustrem --nid lustrem --nettype tcp
lmc -m config.xml --add net --node lustre1 --nid lustre1 --nettype tcp
lmc -m config.xml --add net --node lustre2 --nid lustre2 --nettype tcp

And from the MDS (lustrem):

3894:0:(client.c:940:ptlrpc_expire_one_request()) @@@ timeout (sent at 1170442057, 5s ago) [EMAIL PROTECTED] x1/t0 o8->[EMAIL PROTECTED]:6 lens 240/272 ref 1 fl Rpc:/0/0 rc 0/0
Feb 2 12:48:07 lustrem kernel: LustreError: 3894:0:(client.c:940:ptlrpc_expire_one_request()) @@@ timeout (sent at 1170442082, 5s ago) [EMAIL PROTECTED] x4/t0 o8->[EMAIL PROTECTED]:6 lens 240/272 ref 1 fl Rpc:/0/0 rc 0/0
Feb 2 12:48:07 lustrem kernel: LustreError:

These timeouts mean the MDS is failing to connect to the OSTs: the "o8" field in each message is the request opcode, and op 8 = OST_CONNECT.
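
If the log is long, a small filter can confirm that every timed-out request carries the same opcode. This is only a sketch: the function name timeout_opcodes is made up for illustration, and it assumes one message per line with the "o<number>->" layout seen in the messages above.

```shell
# Print the opcode of each timed-out request read from stdin.
# Assumption: one log message per line, with the opcode written
# as " o<number>->" as in the messages quoted above.
timeout_opcodes() {
    grep 'ptlrpc_expire_one_request' |
        sed -n 's/.* o\([0-9][0-9]*\)->.*/\1/p'
}
```

For example, "dmesg | timeout_opcodes | sort | uniq -c" would give a count per opcode.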

One or more of your nodes may not be resolving the hostnames properly. In general, I recommend using the actual IP address in the NID:
lmc -m config.xml --add net --node lustrem --nid [EMAIL PROTECTED] --nettype lnet
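
To see whether name resolution is the culprit, something like the following could be run on each node and the output compared. It is a rough sketch, not part of Lustre: check_resolution is a hypothetical helper, and it assumes getent is available (as on most glibc-based Linux systems).

```shell
# Show the address each hostname resolves to on this node.
# Returns non-zero if any name fails to resolve.
# Assumption: getent is installed; the helper name is illustrative only.
check_resolution() {
    status=0
    for host in "$@"; do
        if addr=$(getent hosts "$host"); then
            # getent prints "<address> <names...>"; keep only the address
            echo "$host -> ${addr%% *}"
        else
            echo "$host does not resolve" >&2
            status=1
        fi
    done
    return $status
}
```

With the node names from your config that would be "check_resolution lustrem lustre1 lustre2". If the nodes disagree, or a node maps its own name to a loopback address, that would be worth fixing first.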

Also, check that every node can ping every other over LNET. First, on each node:
modprobe lnet
lctl network up
lctl list_nids
Then, on each node, ping the NIDs reported by the others:
lctl ping <nids from other nodes>
When you are done, bring the network down so you'll be able to remove the module:
lctl network down
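
With more than a few nodes, the ping step can be wrapped in a loop. The sketch below is an illustration only: ping_peers is a made-up name, it assumes lnet is already loaded and "lctl network up" has been run, and it assumes lctl ping exits non-zero when the peer cannot be reached.

```shell
# Ping every NID given as an argument and report which ones fail.
# Assumptions: "lctl network up" has already been run on this node,
# and "lctl ping" returns a non-zero status on failure.
ping_peers() {
    failed=0
    for nid in "$@"; do
        if lctl ping "$nid" >/dev/null 2>&1; then
            echo "ok: $nid"
        else
            echo "FAILED: $nid"
            failed=1
        fi
    done
    return $failed
}
```

Call it as "ping_peers <nids from other nodes>", using the NIDs that lctl list_nids printed on the other machines.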

