You can't use the loopback IP address as a NID.
Use a node name that does not resolve to 127.0.0.1.
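
A quick way to check a candidate node name before putting it in the config is to see what it resolves to. The following is only a minimal sketch in standalone Python 3, not part of lconf or any Lustre tool; the helper names is_loopback and loopback_addresses are made up for illustration, and it only looks at IPv4 addresses.

```python
import ipaddress
import socket
import sys
from typing import List


def is_loopback(addr: str) -> bool:
    """Return True if addr (an IP literal) lies in a loopback range such as 127.0.0.0/8."""
    return ipaddress.ip_address(addr).is_loopback


def loopback_addresses(hostname: str) -> List[str]:
    """Return the loopback IPv4 addresses that hostname resolves to (empty list if none)."""
    # getaddrinfo raises socket.gaierror if the name cannot be resolved at all.
    infos = socket.getaddrinfo(hostname, None, family=socket.AF_INET)
    # Each entry is (family, type, proto, canonname, sockaddr); sockaddr[0] is the IP.
    addrs = sorted({info[4][0] for info in infos})
    return [a for a in addrs if is_loopback(a)]


if __name__ == "__main__":
    # Default to this machine's own hostname when no name is given.
    name = sys.argv[1] if len(sys.argv) > 1 else socket.gethostname()
    bad = loopback_addresses(name)
    if bad:
        print("%s resolves to loopback (%s): not usable as a NID" % (name, ", ".join(bad)))
    else:
        print("%s does not resolve to a loopback address" % name)
```

Run it with the name you intend to pass to --node. If the machine's own hostname is mapped to 127.0.0.1 in /etc/hosts, this reports it as loopback too, and that mapping would need to point at the real interface address instead.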

Jim Albin wrote:
I'm trying to install and test lustre 1.4.7 on a new system and am
failing to start the single system test. I've installed the kernel and
the libcfs module. I copied the steps for creating a local.xml file
using the local.sh script on this page;
https://mail.clusterfs.com/wikis/lustre/LustreHowto
This seems to work correctly and generates the local.xml file.

When I try to start it with "lconf -v --node localhost --reformat
local.xml", I get this output on the screen, and then it hangs until I
^C out of it. The messages below are logged in the system log file.
---  <snip> bottom of output from lconf command ----

+ /usr/sbin/lctl
  cfg_device MDC_head4_mds-test_MNT_localhost
  setup mds-test_UUID localhost_UUID
  quit
MTPT: MNT_localhost MNT_localhost_UUID /mnt/lustre mds-test_UUID lov-test_UUID
+ mkdir /mnt/lustre
+ mount -t lustre_lite -o osc=lov-test,mdc=MDC_head4_mds-test_MNT_localhost local /mnt/lustre
Traceback (most recent call last):
  File "/usr/sbin/lconf", line 2852, in ?
    main()
  File "/usr/sbin/lconf", line 2845, in main
    doHost(lustreDB, node_list)
  File "/usr/sbin/lconf", line 2288, in doHost
    for_each_profile(node_db, prof_list, doSetup)
  File "/usr/sbin/lconf", line 2068, in for_each_profile
    operation(services)
  File "/usr/sbin/lconf", line 2088, in doSetup
    n.prepare()
  File "/usr/sbin/lconf", line 1899, in prepare
    ret, val = run(cmd)
  File "/usr/sbin/lconf", line 530, in run
    return runcmd(cmd)
  File "/usr/sbin/lconf", line 520, in runcmd
    out = f.readlines()
KeyboardInterrupt

----- /var/log/messages contents ----

Nov 27 11:53:43 head4 kernel: kjournald starting.  Commit interval 5
seconds
Nov 27 11:53:43 head4 kernel: LDISKFS FS on loop2, internal journal
Nov 27 11:53:43 head4 kernel: LDISKFS-fs: mounted filesystem with
ordered data mode.
Nov 27 11:53:43 head4 kernel: Lustre: Changing connection for
OSC_head4_ost1-test_mds-test to localhost_UUID/[EMAIL PROTECTED]
Nov 27 11:53:43 head4 kernel: LustreError: Refusing connection from
127.0.0.1 for [EMAIL PROTECTED]:  No matching NI
Nov 27 11:53:43 head4 kernel: LustreError: 17622:0:
(socklnd_cb.c:1472:ksocknal_recv_hello()) Error -104 reading HELLO from
127.0.0.1
Nov 27 11:53:43 head4 kernel: LustreError: Connection to [EMAIL PROTECTED]
at host 127.0.0.1 on port 988 was reset: is it running a compatible
version of Lustre and is [EMAIL PROTECTED] one of its NIDs?
Nov 27 11:53:43 head4 kernel: LustreError: 17622:0:
(socklnd_cb.c:396:ksocknal_txlist_done()) Deleting packet type 1 len 240
[EMAIL PROTECTED]>[EMAIL PROTECTED]
Nov 27 11:53:43 head4 kernel: LustreError: 17622:0:
(events.c:53:request_out_callback()) @@@ type 4, status -5
[EMAIL PROTECTED] x1/t0 o8->[EMAIL PROTECTED]:6 lens
240/272 ref 2 fl Rpc:/0/0 rc 0/0
Nov 27 11:53:43 head4 kernel: LustreError: 18648:0:
(client.c:940:ptlrpc_expire_one_request()) @@@ timeout (sent at
1164653623, 0s ago) [EMAIL PROTECTED] x1/t0 o8->ost1-
[EMAIL PROTECTED]:6 lens 240/272 ref 1 fl Rpc:/0/0 rc 0/0

Finally, the RPC timeout messages continue to be logged to the system
log every 30 seconds or so until I reboot. Can anyone see something I'm
missing, or has anyone worked through this problem? I wonder if the
installation manual and howto wiki are missing something or assuming
something that is different in my setup. Running nmap localhost shows
that port 988/tcp is open. Thanks in advance for any help.

------------------------------------------------------------------------

_______________________________________________
Lustre-discuss mailing list
[email protected]
https://mail.clusterfs.com/mailman/listinfo/lustre-discuss
