Shouldn't your ip2nets look like this:

o2ib0(ib0) 10.50.0.[7-10] ; o2ib1(ib1) 10.50.1.[7-10] ; o2ib0(ib0) 
10.50.0.* ; o2ib1(ib0) 10.50.1.*
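
As a rough illustration of the difference, here is a small shell sketch that shows which client rules a given address matches under each configuration. It is only an approximation: it treats the ip2nets "*" wildcard as a shell glob, does not handle ranges such as [7-10], and is not LNet's actual parser. The function name nets_for_ip, the "net:pattern" rule format and the client address 10.50.1.20 are made up for this example.

```shell
# nets_for_ip IP RULE...
# Each RULE is a hypothetical "net:pattern" pair, e.g. 'o2ib1:10.50.1.*'.
# Prints the nets whose pattern matches IP, in rule order.
nets_for_ip() {
    ip=$1
    shift
    matched=""
    for rule in "$@"; do
        net=${rule%%:*}       # text before the first ':'
        pattern=${rule#*:}    # text after the first ':'
        # Unquoted $pattern is deliberately treated as a glob here.
        case $ip in
            $pattern) matched="$matched $net" ;;
        esac
    done
    echo "${matched# }"
}

# Client patterns from the original configuration:
nets_for_ip 10.50.1.20 'o2ib0:10.50.*.*' 'o2ib1:10.50.*.*'   # prints "o2ib0 o2ib1"

# Client patterns from the configuration suggested above:
nets_for_ip 10.50.1.20 'o2ib0:10.50.0.*' 'o2ib1:10.50.1.*'   # prints "o2ib1"
```

With the 10.50.*.* patterns every client address matches the rules for both networks, whereas the narrower patterns leave each client matching only the network for its own subnet.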

-----Original Message-----
From: Diego Moreno [mailto:[email protected]] 
Sent: Tuesday, March 22, 2011 8:26 AM
To: Andreas Dilger
Cc: Mike Hanby; [email protected]
Subject: Re: [Lustre-discuss] Lustre over o2ib issue

Hi,

We are having this problem right now with our Lustre 2.0. We tried the 
proposed solutions, but none of them worked for us.

We have 2 QDR IB cards on 4 servers and we have to do "lctl ping" from 
each server to every client if we want clients to connect to servers. We 
don't have ib_mthca modules loaded because we don't have DDR cards and 
we configured ip2nets with no result.

Our ip2nets configuration ([7-10] interfaces are in servers, the others 
are in clients):
o2ib0(ib0) 10.50.0.[7-10] ; o2ib1(ib1) 10.50.1.[7-10] ; o2ib0(ib0) 
10.50.*.* ; o2ib1(ib0) 10.50.*.*

So the only way of having clients connected to servers is doing 
something like this on every server:

for i in $CLIENT_IB_LIST ; do
    lctl ping $i@o2ib0
    lctl ping $i@o2ib1
done

Before "lctl ping" we get messages like this one:

Lustre: 50389:0:(lib-move.c:1028:lnet_post_send_locked()) Dropping 
message for 12345-10.50.1.7@o2ib1: peer not alive

After "lctl ping" everything works right.
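
To see which clients are still unreachable from a server, the loop can be turned into a small helper that prints only the NIDs whose ping fails. This is a minimal sketch: the function name ping_failures is made up for this example, and it assumes that "lctl ping" exits with a non-zero status when the peer does not answer.

```shell
# ping_failures NET IP...
# Pings each IP on the given LNet network and prints the NIDs that fail.
# Assumption: "lctl ping" returns a non-zero exit status on failure.
ping_failures() {
    net=$1
    shift
    for ip in "$@"; do
        lctl ping "${ip}@${net}" >/dev/null 2>&1 || echo "${ip}@${net}"
    done
}

# Example use on a server, with the same client list as above:
#   ping_failures o2ib0 $CLIENT_IB_LIST
#   ping_failures o2ib1 $CLIENT_IB_LIST
```

An empty output means every listed client answered on that network.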

Maybe I'm missing something or this is a known bug in lustre 2.0...


On 16/03/2011 22:13, Andreas Dilger wrote:
> On 2011-03-16, at 3:04 PM, Mike Hanby wrote:
>> Thanks, I forgot to include the card info:
>>
>> The servers each have a single IB card: dual port MT26528 QDR
>> o2ib0(ib0) on each server is attached to the QLogic switch (with three 
>> attached M3601Q switches 48 attached blades)
>> o2ib1(ib1) on each server is attached to a stack of two M3601Q switches with 
>> 24 attached blades
>>
>> The blades connected to o2ib0 each have an MT26428 QDR IB card
>> The blades connected to o2ib1 each have an MT25418 DDR IB card
>
> You may also want to check out the ip2nets option for specifying the Lustre 
> networks.  It is made to handle configuration issues like this where the 
> interface name is not constant across client/server nodes.
>
>>
>> -----Original Message-----
>> From: [email protected] 
>> [mailto:[email protected]] On Behalf Of Nirmal Seenu
>> Sent: Wednesday, March 16, 2011 2:10 PM
>> To: [email protected]
>> Subject: Re: [Lustre-discuss] Lustre over o2ib issue
>>
>> If you are using DDR and QDR, or any 2 different cards, in the same 
>> machine, there is no guarantee that the same IB cards get assigned to ib0 and 
>> ib1.
>>
>> To fix that problem you need to comment out the following 3 lines in 
>> /etc/init.d/openibd:
>>
>>      #for i in `grep "^driver: " /etc/sysconfig/hwconf | sed -e 's/driver: //' | grep -w "ib_mthca\\\|ib_ipath\\\|mlx4_core\\\|cxgb3\\\|iw_nes"`; do
>>      #    load_modules $i
>>      #done
>>
>> and include the following lines instead (we wanted the DDR card to be ib0 and 
>> the QDR card to be ib1):
>>      load_modules ib_mthca
>>      /bin/sleep 10
>>      load_modules mlx4_core
>>
>> and you will need to restart openibd once again (we included it in rc.local) 
>> to make sure that the same IB cards are assigned to the devices ib0 and ib1.
>>
>> Nirmal
>> _______________________________________________
>> Lustre-discuss mailing list
>> [email protected]
>> http://lists.lustre.org/mailman/listinfo/lustre-discuss
>
>
> Cheers, Andreas
> --
> Andreas Dilger
> Principal Engineer
> Whamcloud, Inc.
>
>
>