-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA512
Isaac,
My mistake. I was thinking this issue was similar to an lctl issue that I have
been seeing for quite a while now, but as you say this is not the case since
the node only has a single NID. I just oversimplified the problem. The case
that I am referring to is what I think is a bug in the lctl ping code.
Here is an example of what I was referring to:
Node1:
ilc6 is a lustre server that has two separate ethernet devices, eth2 and eth3
# ilc6 /root > cat /etc/modprobe.conf
options lnet networks="tcp0(eth2,eth3)" \
routes="elan0 [EMAIL PROTECTED]"
# ilc6 /root > lctl list_nids
[EMAIL PROTECTED]
Node2:
adev4 is a lustre router that has two separate ethernet devices and an elan
device
# adev4 /root > cat /etc/modprobe.conf
options lnet networks="tcp0(eth0,eth1),elan0" \
forwarding="enabled"
# adev4 /root > lctl list_nids
[EMAIL PROTECTED]
[EMAIL PROTECTED]
Node3:
adev3 is a lustre client with only an elan device
# adev3 /root > lctl list_nids
[EMAIL PROTECTED]
Now the actual problem here is that
1) ilc6 can only successfully issue an lctl ping to the tcp nid even though it
   knows how to get to the elan0 network.
2) adev3 can only successfully issue an lctl ping to the elan nid even though
   it knows how to get to the tcp0 network.
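The expectation behind 1) and 2) can be sketched in a few lines of Python. This
is only a toy model of the reasoning, not LNet code: the helper names
split_nid and classify_target are hypothetical, and the NIDs in the example
are made-up stand-ins for the redacted ones in this message.

```python
def split_nid(nid):
    """Split a NID of the form 'address@network' into its two parts."""
    address, sep, network = nid.rpartition("@")
    if not sep or not address or not network:
        raise ValueError(f"not a valid NID: {nid!r}")
    return address, network


def classify_target(nid, local_networks, routed_networks):
    """Say how a node would be expected to reach the given NID.

    local_networks:  networks the node is directly attached to, e.g. {"tcp0"}
    routed_networks: remote networks the node has a route to, e.g. {"elan0"}
    """
    _, network = split_nid(nid)
    if network in local_networks:
        return "direct"
    if network in routed_networks:
        return "via router"
    return "unreachable"


# Hypothetical view from ilc6: attached to tcp0, with a route to elan0.
ilc6_local = {"tcp0"}
ilc6_routed = {"elan0"}

print(classify_target("192.0.2.10@tcp0", ilc6_local, ilc6_routed))
print(classify_target("4@elan0", ilc6_local, ilc6_routed))
```

In terms of this model, the pings that succeed in this message are the "direct"
cases, and the ones that fail with an Input/output error are the "via router"
cases, which is why the behavior looks like a bug rather than a missing route.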
FROM Node1:
# ilc6 /root > lctl ping [EMAIL PROTECTED]
[EMAIL PROTECTED]
[EMAIL PROTECTED]
[EMAIL PROTECTED]
# ilc6 /root > lctl ping [EMAIL PROTECTED]
[EMAIL PROTECTED]
[EMAIL PROTECTED]
ERROR:
# ilc6 /root > lctl ping [EMAIL PROTECTED]
failed to ping [EMAIL PROTECTED]: Input/output error
FROM Node3:
# adev3 /root > lctl ping [EMAIL PROTECTED]
[EMAIL PROTECTED]
[EMAIL PROTECTED]
[EMAIL PROTECTED]
# adev3 /root > lctl ping [EMAIL PROTECTED]
[EMAIL PROTECTED]
[EMAIL PROTECTED]
ERROR:
# adev3 /root > lctl ping [EMAIL PROTECTED]
failed to ping [EMAIL PROTECTED]: Input/output error
This is the error I was mistakenly trying to describe yesterday.
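For contrast, the single-NID case that Isaac explains in the quoted reply below
can be sketched the same way. Again this is a toy model under stated
assumptions, not the real implementation: the Node class, its handle_ping
method, and the addresses are all hypothetical. It only encodes the rule from
his reply, namely that a ping is accepted when the target NID is one of the
node's NIDs, and that the reply lists every NID of the node.

```python
class Node:
    """Toy model of a node answering an lctl-style ping (not real LNet code)."""

    def __init__(self, nids, bonded_ips=()):
        self.nids = list(nids)              # NIDs the node actually owns
        self.bonded_ips = set(bonded_ips)   # extra interface IPs under a NID

    def handle_ping(self, target_nid):
        """Return all NIDs if target_nid is ours, otherwise raise OSError."""
        if target_nid not in self.nids:
            # A bonded interface IP is reachable but is not a NID,
            # so a ping addressed to it is rejected.
            raise OSError(f"failed to ping {target_nid}: Input/output error")
        return list(self.nids)


# Hypothetical peer: two interfaces, but only one NID on tcp0.
peer = Node(nids=["192.0.2.7@tcp0"], bonded_ips={"198.51.100.7"})

print(peer.handle_ping("192.0.2.7@tcp0"))

try:
    peer.handle_ping("198.51.100.7@tcp0")
except OSError as err:
    print(err)
```

The second call fails even though the address itself answers an ordinary ICMP
ping, which matches what the quoted thread shows for the second interface.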
Isaac Huang wrote:
> On Wed, Jan 16, 2008 at 11:23:30AM -0800, Herb Wartens wrote:
> Andrew,
> I have not used lustre-1.6.4.X yet, but in previous versions (and most likely
> the version you are using) Lustre actually listens on all interfaces no
> matter what you specify in the modprobe.conf. You can verify this by looking
> at the netstat output for port 988 and look for what ports you are listening
> on. We here at LLNL regularly use multiple interfaces.
> I believe that the issue you are referring to is a bug in the lctl ping code
> where the ping only responds over the first network device specified for a
> particular lnd. As long as you have properly configured your host routes so
> that you can ping both interfaces from the other node you should be fine.
> IMHO this should just be fixed in lnet so you can do an lctl ping from any
> endpoint to any other endpoint.
>
>> I don't think it's a lctl ping bug.
>
> # ilc6 /root > cat /etc/modprobe.conf
> options lnet networks="tcp0(eth2,eth3)"
>
>> This config gives the node only one NID: [EMAIL PROTECTED] You can
>> verify it by 'lctl list_nids' on the node.
>
> # ilc6 /root > netstat -a -t -n | grep 988 | grep LIST
> tcp 0 0 0.0.0.0:988 0.0.0.0:* LISTEN
>
> # ilc6 /root > cat /etc/hosts | grep ilc7
> 172.16.101.7 ilc7-lnet0 ilc7-eth2
> 172.16.102.7 ilc7-lnet1 ilc7-eth3
>
> # ilc6 /root > lctl ping [EMAIL PROTECTED]
> [EMAIL PROTECTED]
> [EMAIL PROTECTED]
>
>> When you lctl ping a node at any one of its NIDs, the ping reply
>> contains a list of all NIDs of the node. As can be seen from the reply
>> above, [EMAIL PROTECTED] has two NIDs: [EMAIL PROTECTED] and
>> [EMAIL PROTECTED]
>
>> So when you tried 'lctl ping [EMAIL PROTECTED]', the ping request
>> could reach 172.16.102.7, but it was rejected since [EMAIL PROTECTED]
>> was not one of the node's NIDs.
>
>> The socklnd does interface bonding transparently from lnet's
>> perspective. It exchanges a list of IPs of all NICs under a lnet NID
>> with peers, and creates connections to all IPs of a peer and thus
>> aggregates bandwidth. Lnet has no knowledge of this - all it sees is
>> just one NID, i.e. [EMAIL PROTECTED]
>
>> Isaac
>
> # ilc6 /root > lctl ping [EMAIL PROTECTED]
> failed to ping [EMAIL PROTECTED]: Input/output error
>
> # ilc6 /root > ping -c 1 172.16.101.7
> PING 172.16.101.7 (172.16.101.7) 56(84) bytes of data.
> 64 bytes from 172.16.101.7: icmp_seq=1 ttl=64 time=0.143 ms
>
> --- 172.16.101.7 ping statistics ---
> 1 packets transmitted, 1 received, 0% packet loss, time 0ms
> rtt min/avg/max/mdev = 0.143/0.143/0.143/0.000 ms
> # ilc6 /root > ping -c 1 172.16.102.7
> PING 172.16.102.7 (172.16.102.7) 56(84) bytes of data.
> 64 bytes from 172.16.102.7: icmp_seq=1 ttl=64 time=0.094 ms
>
> --- 172.16.102.7 ping statistics ---
> 1 packets transmitted, 1 received, 0% packet loss, time 0ms
> rtt min/avg/max/mdev = 0.094/0.094/0.094/0.000 ms
>
>
> Lundgren, Andrew wrote:
>>>> So the only way to use two nics at once is to bond? I am more for
>>>> redundancy rather than increased throughput.
>>>>
>>>>> -----Original Message-----
>>>>> From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED]
>>>>> Sent: Wednesday, January 16, 2008 6:34 AM
>>>>> To: Lundgren, Andrew
>>>>> Cc: '[email protected]'
>>>>> Subject: Re: [Lustre-discuss] How do you make an MGS/OSS listen on 2 NICs?
>>>>>
>>>>> On Tue, Jan 15, 2008 at 10:28:33AM -0700, Lundgren, Andrew wrote:
>>>>>> I am running on CentOS 5 distribution without adding any updates from
>>>>>> CentOS. I am using the lustre 1.6.4.1 kernel and software.
>>>>>>
>>>>>> I have two NICs that run through different switches.
>>>>>>
>>>>>> I have the lustre options in my modprobe.conf to look like this:
>>>>>>
>>>>>> options lnet networks=tcp0(eth1,eth0)
>>>>>>
>>>>> This way of interface bonding is now a deprecated lnet
>>>>> feature. Please refer to:
>>>>> http://manual.lustre.org/manual/LustreManual16_HTML/DynamicHTML-13-1.html
>>>>>
>>>>> Isaac
>>>>>
>>>> _______________________________________________
>>>> Lustre-discuss mailing list
>>>> [email protected]
>>>> https://mail.clusterfs.com/mailman/listinfo/lustre-discuss
>>
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.7 (GNU/Linux)
Comment: Using GnuPG with Fedora - http://enigmail.mozdev.org
iD8DBQFHj8/SP/62XqEEbMYRCtrrAKC4q4EWSdmjKmLaR9itrEoa4gdd0gCgn32S
OI4G8yg8Czvy1lsLNYHqBcY=
=R7ZB
-----END PGP SIGNATURE-----