This seems to be a common problem when working with multihomed systems. A short description of the situation:

- Lustre 1.4.7.3 (2.6.9-42.0.2.EL_lustre.1.4.7.3smp)
- 5 machines named helios160 through helios164, each with two NICs (networks 10.10.3. and 172.16.0.)
- 4 OSTs: helios160-163; 1 MDS: helios164; 5 clients: helios160-164
- options lnet networks="tcp0" in /etc/modprobe.conf

The OSTs start using lconf, but lnet "binds" to the wrong interface: it should bind to the 172.16.0. interface instead of the 10.10.3. interface. I have been playing around with the lnet options, but lnet does not bind to the other NIC. So the question is: how do I configure lnet to use the other interface?

    [EMAIL PROTECTED] ~]# cat /proc/sys/lnet/nis
    nid                refs  peer  max  tx   min
    [EMAIL PROTECTED]  2     0     0    0    0
    [EMAIL PROTECTED]  1     8     256  256  256

Trying to add the correct interface seems to have no effect on the NI list:

    [EMAIL PROTECTED] ~]# lctl
    lctl > network tcp
    lctl > interface_list
    10.10.3.160: (10.10.3.160/255.255.128.0) npeer 0 nroute 0
    lctl > add_interface 172.16.0.160
    lctl > interface_list
    10.10.3.160: (10.10.3.160/255.255.128.0) npeer 0 nroute 0
    helios160: (172.16.0.160/255.255.255.0) npeer 0 nroute 0
    lctl > quit
    [EMAIL PROTECTED] ~]# cat /proc/sys/lnet/nis
    nid                refs  peer  max  tx   min
    [EMAIL PROTECTED]  2     0     0    0    0
    [EMAIL PROTECTED]  1     8     256  256  256

Because of that, I get the following error when starting the MDS on 172.16.0.164:

    LustreError: Refusing connection from 172.16.0.164 for [EMAIL PROTECTED]: No matching NI

PS1: There are no errors a la "lnet: Unknown parameter".

PS2: /etc/hosts is identical on all systems involved:

    127.0.0.1     localhost.localdomain localhost
    172.16.0.160  helios160
    172.16.0.161  helios161
    172.16.0.162  helios162
    172.16.0.163  helios163
    172.16.0.164  helios164
    172.16.0.165  helios165
    172.16.0.166  helios166
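
One direction that may be worth trying, given only as an untested sketch: the lnet "networks" option accepts an interface name in parentheses after the network name, which restricts that network to the named NIC. The interface name eth1 below is an assumption (the post does not say which device carries 172.16.0.), so substitute whichever device holds the 172.16.0. address on each node:

```
# /etc/modprobe.conf
# Assumption: eth1 is the NIC on the 172.16.0. network (placeholder name).
options lnet networks="tcp0(eth1)"
```

The option is read when the lnet module loads, so the module would have to be reloaded before /proc/sys/lnet/nis can show a different NID.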
