I'm a bit out of date. What is the problem with oob / xoob?
-Pasha

On Nov 14, 2013, at 3:07 PM, "Hjelm, Nathan T" <hje...@lanl.gov> wrote:

> I don't think so. From what I understand the iboffload component may not live 
> much longer because of
> Mellanox's fork of Cheetah. So, it might not matter.
> 
> -Nathan
> 
> Excuse the *&(#$y Outlook posting-style. OWA sucks.
> ________________________________________
> From: devel [devel-boun...@open-mpi.org] on behalf of Ralph Castain 
> [r...@open-mpi.org]
> Sent: Thursday, November 14, 2013 12:58 PM
> To: Open MPI Developers
> Subject: Re: [OMPI devel] [EXTERNAL] Re: [OMPI svn-full] svn:open-mpi 
> r29703 - in trunk: contrib/platform/iu/odin ompi/mca/btl/openib 
> ompi/mca/btl/openib/connect
> 
> The key question, though, is: has anyone checked to see if the ofacm code 
> even works any more??
> 
> Only oob and xoob components appear to be present - so unless someone fixed 
> those since they were originally copied from openib, I doubt ofacm works.
> 
> 
> On Nov 14, 2013, at 11:08 AM, Shamis, Pavel <sham...@ornl.gov> wrote:
> 
>> There is some confusion in the thread. UDCM is just another CPC, like XOOB, 
>> OOB, and RDMACM (I think IBCM is officially dead).
>> XOOB and OOB don't use UDCM; they rely on ORTE out-of-band communication.
>> 
>> OpenIB/connect supports UDCM, XOOB, OOB, and RDMACM
>> OFACM supports (at least the last time we checked) OOB and XOOB
>> 
>> RDMACM was not moved to OFACM because of iWarp's "first message" 
>> requirement, which used to break the abstraction.
>> Moreover, RDMACM scalability used to be terrible; as a result, no one in the IB 
>> community really used it.
>> The situation is a bit different today, since ROCEE relies on RDMACM. It is 
>> worth noting that you may set up
>> ROCEE connections with a regular OOB, with some restrictions (we did it for 
>> mvapich-1).
>> 
>> The code in ofacm and openib is similar, but NOT the same. We changed 
>> the API so that
>> XRC QP management (there is a hash table that manages the QP-to-EP 
>> mapping) is hidden in OFACM instead of OPENIB.
>> This made the openib initialization code a bit cleaner. Here is my old tree with 
>> the openib btl changes: https://bitbucket.org/pasha/ofacm
>> 
>> I hope it helps,
>> 
>> Best,
>> Pasha
>> 
>> On Nov 14, 2013, at 1:17 PM, Joshua Ladd <josh...@mellanox.com> wrote:
>> 
>>> Unless someone went in and "fixed" the code in common (judging by the 
>>> comments, fixed seems to imply porting (x)oob to use UDCM, which hasn't 
>>> been done at all in the context of xoob and is incompletely patched and 
>>> remains unusable as a replacement for oob in 1.7.4), there is no reason to 
>>> believe it would work any different than the cpcs under btl/openib/connect. 
>>> IIRC, it's the same code - copy/pasted - just moved to a common location so 
>>> Cheetah collectives can do their wireup. So, if oob cpc doesn't work, ofacm 
>>> oob won't work either and, I guess, by extension, Cheetah IBoffload won't 
>>> work. Pasha, correct me if you know different.
>>> 
>>> 
>>> Josh
>>> 
>>> 
>>> -----Original Message-----
>>> From: devel [mailto:devel-boun...@open-mpi.org] On Behalf Of Ralph Castain
>>> Sent: Thursday, November 14, 2013 1:05 PM
>>> To: Open MPI Developers
>>> Subject: Re: [OMPI devel] [EXTERNAL] Re: [OMPI svn-full] svn:open-mpi 
>>> r29703 - in trunk: contrib/platform/iu/odin ompi/mca/btl/openib 
>>> ompi/mca/btl/openib/connect
>>> 
>>> 
>>> On Nov 14, 2013, at 9:33 AM, Barrett, Brian W <bwba...@sandia.gov> wrote:
>>> 
>>>> On 11/14/13 9:51 AM, "Jeff Squyres (jsquyres)" <jsquy...@cisco.com> wrote:
>>>> 
>>>>> Does XRC work with the UDCM CPC?
>>>>> 
>>>>> 
>>>>> On Nov 14, 2013, at 9:35 AM, Ralph Castain <r...@open-mpi.org> wrote:
>>>>> 
>>>>>> I think the problems in udcm were fixed by Nathan quite some time
>>>>>> ago, but never moved to 1.7 as everyone was told that the connect
>>>>>> code in openib was already deprecated pending merge with the new
>>>>>> ofacm common code. Looking over at that area, I see only oob and
>>>>>> xoob - so if the users of the common ofacm code are finding that it
>>>>>> works, the simple answer may just be to finally complete the switchover.
>>>>>> 
>>>>>> Meantime, perhaps someone can CMR and review a copying of the udcm
>>>>>> cpc to the 1.7 branch?
>>>>>> 
>>>>>> 
>>>>>> On Nov 14, 2013, at 5:14 AM, Joshua Ladd <josh...@mellanox.com> wrote:
>>>>>> 
>>>>>>> Um, no. It's supposed to work with UDCM which doesn't appear to be
>>>>>>> enabled in 1.7.
>>>>>>> 
>>>>>>> Per Ralph's comment to me last night:
>>>>>>> 
>>>>>>> "... you cannot use the oob connection manager. It doesn't work and
>>>>>>> was deprecated. You must use udcm, which is why things are supposed
>>>>>>> to be set to do so by default. Please check the openib connect
>>>>>>> priorities and correct them if necessary."
>>>>>>> 
>>>>>>> However, it's never been enabled in 1.7 - don't know what "borked"
>>>>>>> means, and from what Devendar tells me, several UDCM commits that
>>>>>>> are in the trunk have not been pushed over to 1.7:
>>>>>>> 
>>>>>>> So, as of this moment, OpenIB BTL is essentially dead-in-the-water
>>>>>>> in 1.7.
>>>>>>> 
>>>>>>> 
>>>>>>> 
>>>> 
>>>> I'm going to start by admitting that I haven't been paying attention
>>>> to IB the last couple of months, so I'm out of my league a little bit
>>>> here.  I remember discussions of UDCM replacing OOB both because the
>>>> OOB CPC had some issues and because it would make it easier to move
>>>> the BTLs to the OPAL layer (i.e., below the OOB).  But I also thought
>>>> that was more future work than it clearly was.  So can someone let me know:
>>>> 
>>>> 1) What the status of UDCM is (does it work reliably, does it support
>>>> XRC, etc.)
>>> 
>>> Seems to be working okay on the IB systems at LANL and IU. Don't know about 
>>> XRC - I seem to recall the answer is "no"
>>> 
>>>> 2) What's the difference between CPCs and OFACM, and what are our plans
>>>> w.r.t. 1.7 there?
>>> 
>>> Pasha created ofacm because some of the collective components now need to 
>>> forge connections. So he created the common/ofacm code to meet those needs, 
>>> with the intention of someday replacing the openib cpc's with the new 
>>> common code. However, this was stalled by the iWarp issue, and so it fell 
>>> off the table.
>>> 
>>> We now have two duplicate ways of doing the same thing, but with code in 
>>> two different places. :-(
>>> 
>>>> 3) Someone mentioned that ofacm oob worked, but cpc oob didn't.  Can
>>>> someone explain why?
>>> 
>>> I'm not sure that is actually true as there is no indication that anyone is 
>>> using or testing the collective components that use ofacm code.
>>> 
>>> 
>>>> 
>>>> Again, sorry for being dense; I've been spending too much time in
>>>> Portals land lately.
>>>> 
>>>> Brian
>>>> 
>>>> --
>>>> Brian W. Barrett
>>>> Scalable System Software Group
>>>> Sandia National Laboratories
>>>> 
>>>> 
>>>> 
>>>> 
>>>> 
>>>> _______________________________________________
>>>> devel mailing list
>>>> de...@open-mpi.org
>>>> http://www.open-mpi.org/mailman/listinfo.cgi/devel
>>> 