[ https://issues.apache.org/jira/browse/MAPREDUCE-2911?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13157227#comment-13157227 ]
Ralph Castain commented on MAPREDUCE-2911:
------------------------------------------
Let me preface my comment by confessing my current ignorance of Hadoop. I'm
working on rectifying that situation, but won't claim to be anywhere close to
fully understanding it.

That said, I'm wondering if it is possible to simply run the MPI processes as
standard Hadoop processes? I confess this was my initial thought. Rather than
creating a cluster and using mpirun, just have the user start a standard Hadoop
job - but with the processes being part of an overall MPI application. Thus,
the processes would all call MPI_Init, execute as an MPI application, call
MPI_Finalize, and then exit. If a user wants to integrate that application with
MapReduce, more power to them - I can see some cases where that would be of
interest.

My point here is that you don't need mpirun at all, nor do you need all the
overhead of running OMPI daemons. The Hadoop daemons can start and monitor the
state of health of the MPI processes just fine. We might add some capability to
the Hadoop daemons to assist (e.g., binding), but those would be of use
regardless of whether or not the process is part of an MPI application.
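
To make the "start and monitor" idea concrete, here is a minimal sketch of a
generic supervisor loop. It is purely illustrative: it does not use any Hadoop
or Open MPI API, the name launch_and_monitor is hypothetical, and the child
processes are trivial stand-ins for what would be real MPI ranks.

```python
import subprocess
import sys
import time


def launch_and_monitor(commands, poll_interval=0.05):
    """Start one child process per command and wait until all have exited.

    Hypothetical helper for illustration only; returns the exit codes
    in launch order.
    """
    children = [subprocess.Popen(cmd) for cmd in commands]
    exit_codes = [None] * len(children)
    while any(code is None for code in exit_codes):
        for rank, child in enumerate(children):
            if exit_codes[rank] is None:
                # poll() returns None while the child is still running
                exit_codes[rank] = child.poll()
        time.sleep(poll_interval)
    return exit_codes


if __name__ == "__main__":
    # Stand-ins for MPI ranks: each child simply exits with a given status.
    ranks = [
        [sys.executable, "-c", "import sys; sys.exit(%d)" % status]
        for status in (0, 0, 3)
    ]
    codes = launch_and_monitor(ranks)
    failed = [rank for rank, code in enumerate(codes) if code != 0]
    print("exit codes:", codes, "failed ranks:", failed)
```

A daemon that already launches and watches ordinary tasks is doing essentially
this, which is why no separate layer of launcher daemons seems necessary.
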
As I said, please forgive the ignorance if my suggestion makes no sense.

> Hamster: Hadoop And Mpi on the same cluSTER
> -------------------------------------------
>
> Key: MAPREDUCE-2911
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-2911
> Project: Hadoop Map/Reduce
> Issue Type: New Feature
> Components: mrv2
> Affects Versions: 0.23.0
> Environment: All Unix-Environments
> Reporter: Milind Bhandarkar
> Assignee: Milind Bhandarkar
> Fix For: 0.24.0
>
> Original Estimate: 336h
> Remaining Estimate: 336h
>
> MPI is commonly used for many machine-learning applications. OpenMPI
> (http://www.open-mpi.org/) is a popular BSD-licensed version of MPI. In the
> past, running MPI applications on a Hadoop cluster was achieved using Hadoop
> Streaming (http://videolectures.net/nipsworkshops2010_ye_gbd/), but it was
> kludgy. After the resource-manager separation from JobTracker in Hadoop, we
> have all the tools needed to make MPI a first-class citizen on a Hadoop
> cluster. I am currently working on the patch to make MPI an
> application-master. An initial version of this patch will be available soon
> (hopefully before September 10). This jira will track the development of
> Hamster: The application master for MPI.
--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators:
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira