I'll state right up front that I'm in favor of removing MPI from OFED. :-)

On Jun 2, 2009, at 3:30 PM, Tziporet Koren wrote:

Here are some of the reasons for keeping MPI in or out of OFED.

Main reasons to take MPI out of OFED:
- MPI is not developed under OFA
- Need to synchronize between different projects

--> This point alone is hugely difficult. OFED has been held up multiple times because of MPI release delays. I would like to stress that the open source MPI implementations are entirely separate software projects with minimal overlap of personnel between MPI and OFED. Also, OFED is but one of the stacks that the MPIs support.

For example, many Open MPI members don't care about OpenFabrics at all, so it has sometimes been difficult to justify adjusting MPI release schedules to accommodate OFED.

- Some customers prefer to install a different MPI version than the one in OFED

- Also note that OFED only includes the open source MPIs; it's not a level playing field.

- Several of Open MPI's features are disabled in OFED builds (e.g., scheduler / resource manager support), causing customers to re-download / re-build Open MPI anyway. *** This is a fairly important point to us.


Main reasons to keep MPI in OFED:
- All participants test with the same MPI versions, and installing OFED ensures that MPI will work fine with this version.

--> I think we're all convinced that testing OFED with the various MPI implementations is a Good Thing; nobody is suggesting that we remove that. As discussed on the call, we can certainly all test the same version of the open source MPIs during OFED releases and include this information in the release notes ("OFED x.y.z was tested with Open MPI a.b.c.").

The Open MPI project would be happy to contribute the MPI Testing Tool (MTT, which is itself open source) to OFED. The MTT was designed for almost exactly this purpose: a large, distributed set of organizations that all need to test the same versions of MPI and report the results to a central location. The MTT is not specific to Open MPI; it can be used with any MPI implementation. The MTT is essentially an engine to download/install MPI implementations, run a set of MPI tests (e.g., IMB, OSU and others), and then report the results to a central database.

FWIW: several OFA members are using MTT internally for their own MPI testing.

- Customer convenience during installation (no need to visit more sites to get MPI)

--> My $0.02 is that this is a dubious point. In installing a production HPC cluster, you're getting 20 pieces of software and installing them together. So if you have to get OFED *and* MPI (i.e., 21 pieces of software), I think it makes little difference. Indeed, as noted above, MPI implementations move at a different speed than OFED, so many customers go install their own MPIs anyway (or, per an above point, need to re-install MPIs to enable features that the OFED packaging disables).

I think the biggest issue related to this point is going to be customers asking "you *used* to bundle MPI, why don't you anymore?"

- MPI is an important RDMA ULP and although it is not developed in OFA it is widely used by OFED customers

--> This point is actually used as a crutch by the OFA: why develop your own comprehensive performance testing / monitoring tools when you have MPI with all of its tools? Removing MPI may help motivate the OFA to have better networking tools than those provided by a ULP. (Before you flame: I admit that this is a minor point, but it is still worth noting. I've always thought it strange that people use MPI to monitor and test their OF-based networks...)

--
Jeff Squyres
Cisco Systems

_______________________________________________
general mailing list
general@lists.openfabrics.org
http://lists.openfabrics.org/cgi-bin/mailman/listinfo/general

To unsubscribe, please visit http://openib.org/mailman/listinfo/openib-general