I'm a bit outdated. What is the problem with oob / xoob?

-Pasha

On Nov 14, 2013, at 3:07 PM, "Hjelm, Nathan T" <hje...@lanl.gov> wrote:
> I don't think so. From what I understand the iboffload component may not live
> much longer because of Mellanox's fork of Cheetah. So, it might not matter.
>
> -Nathan
>
> Excuse the *&(#$y Outlook posting-style. OWA sucks.
> ________________________________________
> From: devel [devel-boun...@open-mpi.org] on behalf of Ralph Castain
> [r...@open-mpi.org]
> Sent: Thursday, November 14, 2013 12:58 PM
> To: Open MPI Developers
> Subject: Re: [OMPI devel] [EXTERNAL] Re: [OMPI svn-full] svn:open-mpi
> r29703 - in trunk: contrib/platform/iu/odin ompi/mca/btl/openib
> ompi/mca/btl/openib/connect
>
> The key question, though, is: has anyone checked to see if the ofacm code
> even works any more??
>
> Only oob and xoob components appear to be present - so unless someone fixed
> those since they were originally copied from openib, I doubt ofacm works.
>
>
> On Nov 14, 2013, at 11:08 AM, Shamis, Pavel <sham...@ornl.gov> wrote:
>
>> There is some confusion in the thread. UDCM is just another CPC, like XOOB,
>> OOB, and RDMACM (I think IBCM is officially dead).
>> XOOB and OOB don't use UDCM, they rely on ORTE out-of-band communication.
>>
>> OpenIB/connect supports UDCM, XOOB, OOB, and RDMACM
>> OFACM supports (at least last time when we checked) OOB and XOOB
>>
>> RDMACM was not moved to OFACM, because of iWarp's "first message"
>> requirement that used to break the abstraction.
>> Moreover, RDMACM scalability used to be terrible; as a result no one in the
>> IB community really used it.
>> The situation is a bit different today, since ROCEE relies on RDMACM. It is
>> worth noting that you may set up ROCEE connections with a regular OOB with
>> some restrictions (we did it for mvapich-1).
>>
>> The code between ofacm and openib is similar, but NOT the same. We changed
>> the API in a way that allows us to hide XRC QP management (there is a hash
>> table that manages QP to EP mapping) in OFACM instead of OPENIB.
>> This made openib initialization code a bit cleaner.
>> Here is my old tree with openib btl changes: https://bitbucket.org/pasha/ofacm
>>
>> I hope it helps,
>>
>> Best,
>> Pasha
>>
>> On Nov 14, 2013, at 1:17 PM, Joshua Ladd <josh...@mellanox.com> wrote:
>>
>>> Unless someone went in and "fixed" the code in common (judging by the
>>> comments, fixed seems to imply porting (x)oob to use UDCM, which hasn't
>>> been done at all in the context of xoob and is incompletely patched and
>>> remains unusable as a replacement for oob in 1.7.4), there is no reason to
>>> believe it would work any different than the cpcs under btl/openib/connect.
>>> IIRC, it's the same code - copy/pasted - just moved to a common location so
>>> Cheetah collectives can do their wireup. So, if oob cpc doesn't work, ofacm
>>> oob won't work either and, I guess, by extension, Cheetah IBoffload won't
>>> work. Pasha, correct me if you know different.
>>>
>>>
>>> Josh
>>>
>>>
>>> -----Original Message-----
>>> From: devel [mailto:devel-boun...@open-mpi.org] On Behalf Of Ralph Castain
>>> Sent: Thursday, November 14, 2013 1:05 PM
>>> To: Open MPI Developers
>>> Subject: Re: [OMPI devel] [EXTERNAL] Re: [OMPI svn-full] svn:open-mpi
>>> r29703 - in trunk: contrib/platform/iu/odin ompi/mca/btl/openib
>>> ompi/mca/btl/openib/connect
>>>
>>>
>>> On Nov 14, 2013, at 9:33 AM, Barrett, Brian W <bwba...@sandia.gov> wrote:
>>>
>>>> On 11/14/13 9:51 AM, "Jeff Squyres (jsquyres)" <jsquy...@cisco.com> wrote:
>>>>
>>>>> Does XRC work with the UDCM CPC?
>>>>>
>>>>>
>>>>> On Nov 14, 2013, at 9:35 AM, Ralph Castain <r...@open-mpi.org> wrote:
>>>>>
>>>>>> I think the problems in udcm were fixed by Nathan quite some time
>>>>>> ago, but never moved to 1.7 as everyone was told that the connect
>>>>>> code in openib was already deprecated pending merge with the new
>>>>>> ofacm common code.
>>>>>> Looking over at that area, I see only oob and
>>>>>> xoob - so if the users of the common ofacm code are finding that it
>>>>>> works, the simple answer may just be to finally complete the switchover.
>>>>>>
>>>>>> Meantime, perhaps someone can CMR and review a copying of the udcm
>>>>>> cpc to the 1.7 branch?
>>>>>>
>>>>>>
>>>>>> On Nov 14, 2013, at 5:14 AM, Joshua Ladd <josh...@mellanox.com> wrote:
>>>>>>
>>>>>>> Um, no. It's supposed to work with UDCM which doesn't appear to be
>>>>>>> enabled in 1.7.
>>>>>>>
>>>>>>> Per Ralph's comment to me last night:
>>>>>>>
>>>>>>> "... you cannot use the oob connection manager. It doesn't work and
>>>>>>> was deprecated. You must use udcm, which is why things are supposed
>>>>>>> to be set to do so by default. Please check the openib connect
>>>>>>> priorities and correct them if necessary."
>>>>>>>
>>>>>>> However, it's never been enabled in 1.7 - don't know what "borked"
>>>>>>> means, and from what Devendar tells me, several UDCM commits that
>>>>>>> are in the trunk have not been pushed over to 1.7:
>>>>>>>
>>>>>>> So, as of this moment, OpenIB BTL is essentially dead-in-the-water
>>>>>>> in 1.7.
>>>>>>>
>>>>>>>
>>>>>>>
>>>>
>>>> I'm going to start by admitting that I haven't been paying attention
>>>> to IB the last couple of months, so I'm out of my league a little bit
>>>> here. I remember discussions of UDCM replacing OOB both because the
>>>> OOB CPC had some issues and because it would make it easier to move
>>>> the BTLs to the OPAL layer (i.e., below the OOB). But I also thought
>>>> that was more future work than it clearly was. So can someone let me know:
>>>>
>>>> 1) What the status of UDCM is (does it work reliably, does it support
>>>> XRC, etc.)
>>>
>>> Seems to be working okay on the IB systems at LANL and IU. Don't know about
>>> XRC - I seem to recall the answer is "no"
>>>
>>>> 2) What's the difference between CPCs and OFACM and what are our plans
>>>> w.r.t. 1.7 there?
>>>
>>> Pasha created ofacm because some of the collective components now need to
>>> forge connections. So he created the common/ofacm code to meet those needs,
>>> with the intention of someday replacing the openib cpc's with the new
>>> common code. However, this was stalled by the iWarp issue, and so it fell
>>> off the table.
>>>
>>> We now have two duplicate ways of doing the same thing, but with code in
>>> two different places. :-(
>>>
>>>> 3) Someone mentioned that ofacm oob worked, but cpc oob didn't. Can
>>>> someone explain why?
>>>
>>> I'm not sure that is actually true as there is no indication that anyone is
>>> using or testing the collective components that use ofacm code.
>>>
>>>
>>>>
>>>> Again, sorry for being dense; I've been spending too much time in
>>>> Portals land lately.
>>>>
>>>> Brian
>>>>
>>>> --
>>>> Brian W. Barrett
>>>> Scalable System Software Group
>>>> Sandia National Laboratories
>>>>
>>>> _______________________________________________
>>>> devel mailing list
>>>> de...@open-mpi.org
>>>> http://www.open-mpi.org/mailman/listinfo.cgi/devel