Hi Diego,

Do you have any other module parameters set for lnet and lnd?

Regards
Liang

On Mar 22, 2011, at 9:26 PM, Diego Moreno wrote:

> Hi,
>
> We are having this problem right now with our Lustre 2.0. We tried the
> proposed solutions but we didn't get it.
>
> We have 2 QDR IB cards on 4 servers and we have to do "lctl ping" from
> each server to every client if we want clients to connect to servers. We
> don't have ib_mthca modules loaded because we don't have DDR cards and
> we configured ip2nets with no result.
>
> Our ip2nets configuration ([7-10] interfaces are in servers, the others
> are in clients):
> o2ib0(ib0) 10.50.0.[7-10] ; o2ib1(ib1) 10.50.1.[7-10] ; o2ib0(ib0)
> 10.50.*.* ; o2ib1(ib0) 10.50.*.*
>
> So the only way of having clients connected to servers is doing
> something like this on every server:
>
> for i in $CLIENT_IB_LIST ; do
>     lctl ping $i@o2ib0
>     lctl ping $i@o2ib1
> done
>
> Before "lctl ping" we get messages like this one:
>
> Lustre: 50389:0:(lib-move.c:1028:lnet_post_send_locked()) Dropping
> message for 12345-10.50.1.7@o2ib1: peer not alive
>
> After "lctl ping" everything works right.
>
> Maybe I'm missing something or this is a known bug in lustre 2.0...
>
>
> On 16/03/2011 22:13, Andreas Dilger wrote:
>> On 2011-03-16, at 3:04 PM, Mike Hanby wrote:
>>> Thanks, I forgot to include the card info:
>>>
>>> The servers each have a single IB card: dual port MT26528 QDR
>>> o2ib0(ib0) on each server is attached to the QLogic switch (with three
>>> attached M3601Q switches 48 attached blades)
>>> o2ib1(ib1) on each server is attached to a stack of two M3601Q switches
>>> with 24 attached blades
>>>
>>> The blades connected to o2ib0 each have an MT26428 QDR IB card
>>> The blades connected to o2ib1 each have an MT25418 DDR IB card
>>
>> You may also want to check out the ip2nets option for specifying the Lustre
>> networks. It is made to handle configuration issues like this where the
>> interface name is not constant across client/server nodes.
>>
>>>
>>> -----Original Message-----
>>> From: [email protected]
>>> [mailto:[email protected]] On Behalf Of Nirmal Seenu
>>> Sent: Wednesday, March 16, 2011 2:10 PM
>>> To: [email protected]
>>> Subject: Re: [Lustre-discuss] Lustre over o2ib issue
>>>
>>> If you are using DDR and QDR or any 2 different cards in the same
>>> machine there is no guarantee that the same IB cards get assigned to ib0
>>> and ib1.
>>>
>>> To fix that problem you need to comment out the following 3 lines in
>>> /etc/init.d/openibd:
>>>
>>> #for i in `grep "^driver: " /etc/sysconfig/hwconf | sed -e 's/driver:
>>> //' | grep -w "ib_mthca\\\|ib_ipath\\\|mlx4_core\\\|cxgb3\\\|iw_nes"`; do
>>> # load_modules $i
>>> #done
>>>
>>> and include the following lines instead (we wanted the DDR card to be ib0
>>> and the QDR card to be ib1):
>>> load_modules ib_mthca
>>> /bin/sleep 10
>>> load_modules mlx4_core
>>>
>>> and you will need to restart openibd once again (we included it in
>>> rc.local) to make sure that the same IB cards are assigned to the devices
>>> ib0 and ib1.
>>>
>>> Nirmal
>>> _______________________________________________
>>> Lustre-discuss mailing list
>>> [email protected]
>>> http://lists.lustre.org/mailman/listinfo/lustre-discuss
>>
>>
>> Cheers, Andreas
>> --
>> Andreas Dilger
>> Principal Engineer
>> Whamcloud, Inc.
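
The per-server workaround described in this thread (running "lctl ping" against every client on both o2ib networks) could be wrapped in a small shell function that also reports which NIDs did not answer. This is only a sketch under stated assumptions: the function name ping_clients, the LCTL and NETWORKS variables, and the example addresses in the usage comment are hypothetical and do not come from the thread; only the "lctl ping <nid>" call itself does. It also assumes that lctl ping exits non-zero when the peer does not reply.

```shell
#!/bin/sh
# Sketch only: ping each client NID on every listed LNet network.
# LCTL and NETWORKS are illustrative knobs (assumptions), overridable
# from the environment.
LCTL="${LCTL:-lctl}"
NETWORKS="${NETWORKS:-o2ib0 o2ib1}"

# ping_clients IP [IP ...]
# Returns 0 if every NID answered, 1 if at least one did not.
ping_clients() {
    rc=0
    for ip in "$@"; do
        for net in $NETWORKS; do
            # Assumes a failed ping yields a non-zero exit status.
            if ! "$LCTL" ping "${ip}@${net}" >/dev/null 2>&1; then
                echo "no reply from ${ip}@${net}" >&2
                rc=1
            fi
        done
    done
    return "$rc"
}

# Example call (addresses are hypothetical):
#   ping_clients 10.50.0.20 10.50.0.21
```

Collecting the failures instead of stopping at the first one keeps the behaviour of the original loop, which pings every client regardless of earlier results.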
