The node name given in the XML file is used only to identify which
services to set up when you run lconf. The --nid entry is what
determines the LNET network address that servers and clients will use to
communicate with each other.
As a convenience, you can use the hostname in the nid, and lconf will
resolve it to its IP address.
${LMC} -m $CONFIG --add net --node uml1 --nid [EMAIL PROTECTED] --nettype lnet
The nids for a node are determined by LNET from the networks option in
modprobe.conf, not lconf.
But of course, the nids in the XML must match these nids for any
communication to succeed.
The way to think about it is that lconf determines who to talk to
(the remote identities), while modprobe.conf determines the local identity.
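For example, pinning the local nid to a single interface is done with the
networks option. A minimal sketch (the interface and network names here
are assumptions for illustration, not taken from Jim's setup):

```
# /etc/modprobe.conf -- tell LNET which interface carries the tcp network.
# LNET derives the local nid from this line, independent of the XML config.
options lnet networks=tcp0(eth1)
```

After loading the modules, "lctl list_nids" shows the nid LNET actually
configured, and that is what the XML's --nid entries must match.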
BTW, the confusion brought on by trying to configure these aspects of
Lustre has led to a major overhaul of
the configuration system, now called MountConf and debuting in Lustre 1.6.0
https://mail.clusterfs.com/wikis/lustre/MountConf
Jim Albin wrote:
Hi, thanks for the response. Yes, I was able to get it working using the
ethernet interface I want it to use. Have I interpreted the problem
correctly: does it try to use any or all configured ethernet interfaces,
regardless of the node or IP address in the XML file? And if the
loopback does not work under that circumstance, are the single-system
test instructions simply outdated? I appreciate your help.
Jim Albin
On Tue, 2006-11-28 at 17:04 -0800, Nathaniel Rutman wrote:
Sorry, I was travelling today - I saw you got help on the list.
Yes, we need to update our docs, and we are working on that.
Jim Albin wrote:
Hi again,
I used the node name that maps to the interface for eth3 and it works
now. I'm not sure, but it appears the nid is getting mapped to eth3, so
using that interface for the single-system test seems to work.
thanks again.
Jim Albin
On Tue, 2006-11-28 at 09:16 -0700, Jim Albin wrote:
Good morning Nathaniel.
"lctl list_nids" shows this
# lctl list_nids
[EMAIL PROTECTED]
which is the IP address for eth3, and which is not mapped to either
localhost or the interface of the hostname (head4 = eth1 =
172.16.100.4). My conclusion is that the localhost single-system test
doesn't work as described; it maps the nid to interfaces regardless of
the node name in the XML file. I also found that if I do not add
"--node localhost" to the lconf --reformat line, it complains "No host
entry" and stops (instead of also trying localhost as described in the
installation manual).
The "New Schema" section describes the LNET concept and mentions that it
will attempt to use all available interfaces, but I don't see any further
advice on how to configure the single-system test for a specific
interface. I will try using IP addresses next.
Thanks for taking the time to respond.
Jim Albin
On Mon, 2006-11-27 at 15:12 -0800, Nathaniel Rutman wrote:
Use "lctl list_nids" to show the local nids after starting LNET ("lctl
network up"). Use one of those in the config.
You could "ping head4" to see what it resolves to.
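The resolution half of those two checks can be sketched in shell. Since
lctl and the real hostname are specific to the poster's machine, the
sketch below runs against a literal hosts table (hypothetical data
mirroring the thread, not Jim's actual files):

```shell
# resolve NAME: print the first address NAME maps to in hosts-file
# input on stdin (roughly what "ping head4" consults first).
resolve() {
  awk -v name="$1" '
    $0 !~ /^#/ { for (i = 2; i <= NF; i++) if ($i == name) { print $1; exit } }
  '
}

# Hypothetical hosts table matching the setup discussed in this thread.
hosts='127.0.0.1 localhost.localdomain localhost
172.16.100.4 head4'

printf '%s\n' "$hosts" | resolve head4   # prints 172.16.100.4
```

On the real system, "lctl list_nids" after "lctl network up" remains the
authoritative answer; the sketch only illustrates the resolution side.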
Jim Albin wrote:
Hi Nathaniel,
I followed the steps in the installation manual section 2.3.1
(LustreManual.html) distributed with it, and I tried cutting and pasting
the local.sh script from the wiki howto ("Using Supplied Configuration
Tools" section, https://mail.clusterfs.com/wikis/lustre/LustreHowto).
My /etc/hosts file shows this for localhost:
# grep localhost /etc/hosts
127.0.0.1 localhost.localdomain localhost
# hostname
head4
( so my hostname does NOT map to 127.0.0.1)
I'm wondering if it has something to do with the ethernet interfaces
(there are 4); my default route is set for one of them:
# netstat -rn
Kernel IP routing table
Destination     Gateway         Genmask         Flags  MSS  Window  irtt  Iface
192.174.32.0    0.0.0.0         255.255.255.0   U      0    0       0     eth3
172.18.0.0      0.0.0.0         255.255.0.0     U      0    0       0     eth2
172.16.0.0      0.0.0.0         255.255.0.0     U      0    0       0     eth1
169.254.0.0     0.0.0.0         255.255.0.0     U      0    0       0     eth3
0.0.0.0         192.174.32.26   0.0.0.0         UG     0    0       0     eth3
This line (from the syslog snippet below) shows the IP address of eth3:
(socklnd_cb.c:396:ksocknal_txlist_done()) Deleting packet type 1 len 240 [EMAIL PROTECTED]>[EMAIL PROTECTED]
Thanks for looking at it, I can resend the local.sh and/or local.xml if
that would help. I suspect this is trivial and I may be able to set up a
multiple node test but wanted to try and get this working first.
Jim Albin
On Mon, 2006-11-27 at 13:28 -0800, Nathaniel Rutman wrote:
Yikes, I hope not.
From the LustreHowTo https://mail.clusterfs.com/wikis/lustre/LustreHowto
"One common problem with some Linux setups is that the hostname is
mapped in /etc/hosts to 127.0.0.1, which causes the clients to be unable
to communicate to the servers."
Where are you looking?
Jim Albin wrote:
Then the documentation and Wiki are incorrect?
On Mon, 2006-11-27 at 12:37 -0800, Nathaniel Rutman wrote:
You can't use the loopback IP addr as a nid.
Use something that does not resolve to 127.0.0.1
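A tiny sanity check along those lines (a sketch; the helper name is made
up for illustration, not a Lustre tool):

```shell
# is_loopback ADDR: succeed if ADDR is in 127.0.0.0/8, which is unusable
# as a nid because remote peers can never reach a loopback address.
is_loopback() {
  case "$1" in
    127.*) return 0 ;;
    *)     return 1 ;;
  esac
}

is_loopback 127.0.0.1 && echo "pick another address"   # prints the warning
is_loopback 172.16.100.4 || echo "ok as a nid"         # prints ok as a nid
```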
Jim Albin wrote:
I'm trying to install and test Lustre 1.4.7 on a new system and am
failing to start the single-system test. I've installed the kernel and
the libcfs module. I copied the steps for creating a local.xml file
using the local.sh script on this page:
https://mail.clusterfs.com/wikis/lustre/LustreHowto
This seems to work correctly and generates the local.xml file.
When I try to start it with "lconf -v --node localhost --reformat
local.xml", I get this output on the screen, then it hangs until I ^C
out of it. The messages below are logged in the system log file.
--- <snip> bottom of output from lconf command ----
+ /usr/sbin/lctl
cfg_device MDC_head4_mds-test_MNT_localhost
setup mds-test_UUID localhost_UUID
quit
MTPT: MNT_localhost MNT_localhost_UUID /mnt/lustre mds-test_UUID lov-test_UUID
+ mkdir /mnt/lustre
+ mount -t lustre_lite -o osc=lov-test,mdc=MDC_head4_mds-test_MNT_localhost local /mnt/lustre
Traceback (most recent call last):
File "/usr/sbin/lconf", line 2852, in ?
main()
File "/usr/sbin/lconf", line 2845, in main
doHost(lustreDB, node_list)
File "/usr/sbin/lconf", line 2288, in doHost
for_each_profile(node_db, prof_list, doSetup)
File "/usr/sbin/lconf", line 2068, in for_each_profile
operation(services)
File "/usr/sbin/lconf", line 2088, in doSetup
n.prepare()
File "/usr/sbin/lconf", line 1899, in prepare
ret, val = run(cmd)
File "/usr/sbin/lconf", line 530, in run
return runcmd(cmd)
File "/usr/sbin/lconf", line 520, in runcmd
out = f.readlines()
KeyboardInterrupt
----- /var/log/messages contents ----
Nov 27 11:53:43 head4 kernel: kjournald starting. Commit interval 5 seconds
Nov 27 11:53:43 head4 kernel: LDISKFS FS on loop2, internal journal
Nov 27 11:53:43 head4 kernel: LDISKFS-fs: mounted filesystem with ordered data mode.
Nov 27 11:53:43 head4 kernel: Lustre: Changing connection for OSC_head4_ost1-test_mds-test to localhost_UUID/[EMAIL PROTECTED]
Nov 27 11:53:43 head4 kernel: LustreError: Refusing connection from 127.0.0.1 for [EMAIL PROTECTED]: No matching NI
Nov 27 11:53:43 head4 kernel: LustreError: 17622:0:(socklnd_cb.c:1472:ksocknal_recv_hello()) Error -104 reading HELLO from 127.0.0.1
Nov 27 11:53:43 head4 kernel: LustreError: Connection to [EMAIL PROTECTED] at host 127.0.0.1 on port 988 was reset: is it running a compatible version of Lustre and is [EMAIL PROTECTED] one of its NIDs?
Nov 27 11:53:43 head4 kernel: LustreError: 17622:0:(socklnd_cb.c:396:ksocknal_txlist_done()) Deleting packet type 1 len 240 [EMAIL PROTECTED]>[EMAIL PROTECTED]
Nov 27 11:53:43 head4 kernel: LustreError: 17622:0:(events.c:53:request_out_callback()) @@@ type 4, status -5 [EMAIL PROTECTED] x1/t0 o8->[EMAIL PROTECTED]:6 lens 240/272 ref 2 fl Rpc:/0/0 rc 0/0
Nov 27 11:53:43 head4 kernel: LustreError: 18648:0:(client.c:940:ptlrpc_expire_one_request()) @@@ timeout (sent at 1164653623, 0s ago) [EMAIL PROTECTED] x1/t0 o8->ost1-[EMAIL PROTECTED]:6 lens 240/272 ref 1 fl Rpc:/0/0 rc 0/0
And finally, the RPC timeout messages keep logging to the system log
every 30 seconds or so until I reboot. Can someone spot something I'm
missing, or has anyone worked through this problem? I wonder if the
installation manual and howto wiki are missing something, or assuming
something that is different in my setup. I ran "nmap localhost" and see
that port 988/tcp is open. Thanks in advance for any help.
------------------------------------------------------------------------
_______________________________________________
Lustre-discuss mailing list
[email protected]
https://mail.clusterfs.com/mailman/listinfo/lustre-discuss