Hi Milind Glad to hear of the progress - I recall our earlier conversation. I gather you have completed step 1 (wireup) - have you given any thought to the other two steps? Anything I can do to help?
Ralph On Nov 21, 2011, at 4:47 PM, <milind.bhandar...@emc.com> wrote: > Hi Ralph, > > I spoke with Jeff Squyres at SC11, and updated him on the status of my > OpenMPI port on Hadoop Yarn. > > To update everyone, I have OpenMPI examples running on #Yarn, although it > requires some code cleanup and refactoring, however that can be done as a > later step. > > Currently, the MPI processes come up, get submitting client's IP and port > via environment variables, connect to it, and do a barrier. The result of > this barrier is that everyone in MPI_COMM_WORLD gets each other's > endpoints. > > I am aiming to submit the patch to hadoop by the end of this month. > > I will publish the openmpi patch to github. > > (As I mentioned to Jeff, OpenMPI requires a CCLA for accepting > submissions. That will take some time.) > > - Milind > > --- > Milind Bhandarkar > Greenplum Labs, EMC > (Disclaimer: Opinions expressed in this email are those of the author, and > do not necessarily represent the views of any organization, past or > present, the author might be affiliated with.) > > > >> >> I'm willing to do the integration work, but wanted to check first to see >> if (a) someone in the Hadoop community is already doing so, and (b) if >> you would be interested in seeing such a capability and willing to accept >> the code contribution? >> >> Establishing MPI support requires the following steps: >> >> 1. wireup support. MPI processes need to exchange endpoint info (e.g., >> for TCP connections, IP address and port) so that each process knows how >> to connect to any other process in the application. This is typically >> done in a collective "modex" operation. There are several ways of doing >> it - if we proceed, I will outline those in a separate email to solicit >> your input on the most desirable approach to use. >> >> 2. binding support. One can achieve significant performance improvements >> by binding processes to specific cores, sockets, and/or NUMA regions >> (regardless of using MPI or not, but certainly important for MPI >> applications). This requires not only the binding code, but some logic to >> ensure that one doesn't "overload" specific resources. >> >> 3. process mapping. I haven't verified it yet, but I suspect that Hadoop >> provides each executing instance with an identifier that is unique within >> that job - e.g., we typically assign an integer "rank" that ranges from 0 >> to the number of instances being executed. This identifier is critical >> for MPI applications, and the relative placement of processes within a >> job often dictates overall performance. Thus, we would provide a mapping >> capability that allows users to specify patterns of process placement for >> their job - e.g., "place one process on each socket on every node". >> >> I have written the code to implement the above support on a number of >> systems, and don't foresee major problems doing it for Hadoop (though I >> would welcome a chance to get a brief walk-thru the code from someone). >> Please let me know if this would be of interest to the Hadoop community. >> >> Thanks >> Ralph Castain >> >> >> >