On Nov 14, 2013, at 1:03 PM, Ralph Castain <r...@open-mpi.org> wrote:

>> 1) What the status of UDCM is (does it work reliably, does it support
>> XRC, etc.)
> 
> Seems to be working okay on the IB systems at LANL and IU. Don't know about 
> XRC - I seem to recall the answer is "no"

FWIW, I recall that when Cisco was testing UDCM (a long time ago -- before we 
threw away our IB gear...), we found bugs in UDCM that only showed up with 
really large numbers of MTT tests running UDCM (i.e., 10K+ tests a night, 
especially with lots of UDCM-based jobs running concurrently on the same 
cluster).  These types of bugs didn't show up in casual testing.

Has that kind of large-scale testing happened with the new/fixed UDCM?  Cisco 
is no longer in a position to test this.

>> 2) What's the difference between CPCs and OFACM and what's our plans
>> w.r.t 1.7 there?
> 
> Pasha created ofacm because some of the collective components now need to 
> forge connections. So he created the common/ofacm code to meet those needs, 
> with the intention of someday replacing the openib cpc's with the new common 
> code. However, this was stalled by the iWarp issue, and so it fell off the 
> table.
> 
> We now have two duplicate ways of doing the same thing, but with code in two 
> different places. :-(

FWIW, the iWARP vendors have repeatedly been warned that ofacm is going to take 
over, and unless they supply patches, iWARP will stop working in Open MPI.  I 
know for a fact that they are very aware of this.

So my $0.02 is that ofacm should take over -- let's get rid of the CPCs and 
have openib use ofacm.  The iWARP folks can play catch-up if/when they want to.

Of course, I'm not in this part of the code base any more, so it's not really 
my call -- just my $0.02...

-- 
Jeff Squyres
jsquy...@cisco.com
For corporate legal information go to: 
http://www.cisco.com/web/about/doing_business/legal/cri/
