Shouldn't your ip2nets look like this?

o2ib0(ib0) 10.50.0.[7-10] ; o2ib1(ib1) 10.50.1.[7-10] ; o2ib0(ib0) 10.50.0.* ; o2ib1(ib0) 10.50.1.*
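The difference between the two configurations is in which rules a node ends up matching. The bash sketch below is a simplified, hypothetical model of ip2nets matching, not LNet's actual parser. It assumes each rule is `net(iface) pattern`, that each pattern octet is `*`, a literal, or a bracketed list of numbers and `a-b` ranges, and that the first matching rule for a given network name wins (which is what lets the per-server entries take precedence over the wildcard ones). The function names `octet_matches`, `ip_matches` and `resolve_nets`, and the client address 10.50.1.20, are made up for illustration.

```shell
#!/usr/bin/env bash
# Hypothetical, simplified model of LNet ip2nets matching; NOT LNet's parser.
# Assumes: rules are "net(iface) pattern" separated by ';', each pattern octet
# is '*', a literal number, or a bracketed list like [7-10] or [1,3,5-6], and
# the first rule that matches for a given network name wins.

# octet_matches PATTERN_OCTET VALUE: succeed if VALUE matches the octet pattern.
octet_matches() {
    local pat=$1 val=$2 item lo hi
    local -a items
    if [[ $pat == "*" ]]; then
        return 0
    fi
    if [[ $pat == "["*"]" ]]; then
        # Strip the brackets, then try each comma-separated number or a-b range.
        IFS=, read -ra items <<< "${pat:1:${#pat}-2}"
        for item in "${items[@]}"; do
            if [[ $item == *-* ]]; then
                lo=${item%-*} hi=${item#*-}
                if (( val >= lo && val <= hi )); then return 0; fi
            elif (( val == item )); then
                return 0
            fi
        done
        return 1
    fi
    [[ $pat == "$val" ]]   # plain literal octet
}

# ip_matches PATTERN IP: succeed if all four octets of IP match PATTERN.
ip_matches() {
    local -a p v
    local i
    IFS=. read -ra p <<< "$1"
    IFS=. read -ra v <<< "$2"
    (( ${#p[@]} == 4 && ${#v[@]} == 4 )) || return 1
    for i in 0 1 2 3; do
        octet_matches "${p[i]}" "${v[i]}" || return 1
    done
    return 0
}

# resolve_nets RULES IP...: print the net(iface) entries a node with these
# local IPs would end up with, keeping only the first match per network.
resolve_nets() {
    local rules=$1 entry spec pattern net ip seen=" "
    local -a entries out=()
    shift
    IFS=';' read -ra entries <<< "$rules"
    for entry in "${entries[@]}"; do
        read -r spec pattern <<< "$entry"      # also trims surrounding spaces
        if [[ -z $spec ]]; then continue; fi
        net=${spec%%"("*}                      # "o2ib1(ib0)" -> "o2ib1"
        if [[ $seen == *" $net "* ]]; then continue; fi   # network already set
        for ip in "$@"; do
            if ip_matches "$pattern" "$ip"; then
                out+=("$spec")
                seen+="$net "
                break
            fi
        done
    done
    echo "${out[*]}"
}

rules_orig='o2ib0(ib0) 10.50.0.[7-10] ; o2ib1(ib1) 10.50.1.[7-10] ; o2ib0(ib0) 10.50.*.* ; o2ib1(ib0) 10.50.*.*'
rules_new='o2ib0(ib0) 10.50.0.[7-10] ; o2ib1(ib1) 10.50.1.[7-10] ; o2ib0(ib0) 10.50.0.* ; o2ib1(ib0) 10.50.1.*'

resolve_nets "$rules_orig" 10.50.1.20           # prints: o2ib0(ib0) o2ib1(ib0)
resolve_nets "$rules_new"  10.50.1.20           # prints: o2ib1(ib0)
resolve_nets "$rules_new"  10.50.0.7 10.50.1.7  # prints: o2ib0(ib0) o2ib1(ib1)
```

Under this model both rule sets give the servers the same result; the difference is on the clients. With `10.50.*.*` on both wildcard entries, a client sitting on only one fabric would configure both o2ib0 and o2ib1 on its single ib0 port, while the narrower `10.50.0.*` / `10.50.1.*` patterns give it only the network it is actually cabled to. Whether that accounts for the "peer not alive" messages is not established in this thread.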
-----Original Message-----
From: Diego Moreno [mailto:[email protected]]
Sent: Tuesday, March 22, 2011 8:26 AM
To: Andreas Dilger
Cc: Mike Hanby; [email protected]
Subject: Re: [Lustre-discuss] Lustre over o2ib issue

Hi,

We are having this problem right now with our Lustre 2.0. We tried the proposed solutions, but they didn't fix it. We have 2 QDR IB cards on 4 servers, and we have to run "lctl ping" from each server to every client before the clients can connect to the servers. We don't have the ib_mthca module loaded because we don't have DDR cards, and we configured ip2nets with no result.

Our ip2nets configuration (the [7-10] addresses are on the servers, the others are on the clients):

o2ib0(ib0) 10.50.0.[7-10] ; o2ib1(ib1) 10.50.1.[7-10] ; o2ib0(ib0) 10.50.*.* ; o2ib1(ib0) 10.50.*.*

So the only way to get the clients connected to the servers is to run something like this on every server:

for i in $CLIENT_IB_LIST ; do
    lctl ping $i@o2ib0
    lctl ping $i@o2ib1
done

Before "lctl ping" we get messages like this one:

Lustre: 50389:0:(lib-move.c:1028:lnet_post_send_locked()) Dropping message for 12345-10.50.1.7@o2ib1: peer not alive

After "lctl ping" everything works correctly. Maybe I'm missing something, or this is a known bug in Lustre 2.0...

On 16/03/2011 22:13, Andreas Dilger wrote:
> On 2011-03-16, at 3:04 PM, Mike Hanby wrote:
>> Thanks, I forgot to include the card info:
>>
>> The servers each have a single IB card: dual port MT26528 QDR
>> o2ib0(ib0) on each server is attached to the QLogic switch (with three
>> attached M3601Q switches, 48 attached blades)
>> o2ib1(ib1) on each server is attached to a stack of two M3601Q switches with
>> 24 attached blades
>>
>> The blades connected to o2ib0 each have an MT26428 QDR IB card
>> The blades connected to o2ib1 each have an MT25418 DDR IB card
>
> You may also want to check out the ip2nets option for specifying the Lustre
> networks.
> It is made to handle configuration issues like this, where the
> interface name is not constant across client/server nodes.
>
>>
>> -----Original Message-----
>> From: [email protected]
>> [mailto:[email protected]] On Behalf Of Nirmal Seenu
>> Sent: Wednesday, March 16, 2011 2:10 PM
>> To: [email protected]
>> Subject: Re: [Lustre-discuss] Lustre over o2ib issue
>>
>> If you are using DDR and QDR cards, or any 2 different cards, in the same
>> machine, there is no guarantee that the same IB cards get assigned to ib0 and
>> ib1.
>>
>> To fix that problem you need to comment out the following 3 lines in
>> /etc/init.d/openibd:
>>
>> #for i in `grep "^driver: " /etc/sysconfig/hwconf | sed -e 's/driver: //' | grep -w "ib_mthca\\\|ib_ipath\\\|mlx4_core\\\|cxgb3\\\|iw_nes"`; do
>> #    load_modules $i
>> #done
>>
>> and include the following lines instead (we wanted the DDR card to be ib0 and
>> the QDR card to be ib1):
>>
>> load_modules ib_mthca
>> /bin/sleep 10
>> load_modules mlx4_core
>>
>> and you will need to restart openibd once again (we included it in rc.local)
>> to make sure that the same IB cards are assigned to the devices ib0 and ib1.
>>
>> Nirmal
>> _______________________________________________
>> Lustre-discuss mailing list
>> [email protected]
>> http://lists.lustre.org/mailman/listinfo/lustre-discuss
>
>
> Cheers, Andreas
> --
> Andreas Dilger
> Principal Engineer
> Whamcloud, Inc.
