[jira] [Commented] (MAPREDUCE-2911) Hamster: Hadoop And Mpi on the same cluSTER

Ralph Castain (Commented) (JIRA) Fri, 25 Nov 2011 08:34:03 -0800

    [ 
https://issues.apache.org/jira/browse/MAPREDUCE-2911?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13157227#comment-13157227
 ]


Ralph Castain commented on MAPREDUCE-2911:
------------------------------------------

Let me preface my comment by confessing my current ignorance of Hadoop. I'm 
working on rectifying that situation, but won't claim to be anywhere close to 
fully understanding it.

That said, I'm wondering whether it is possible to simply run the MPI processes 
as standard Hadoop processes. This was my initial thought: rather than creating 
a cluster and using mpirun, just have the user start a standard Hadoop job whose 
processes are part of an overall MPI application. The processes would all call 
MPI_Init, execute as an MPI application, call MPI_Finalize, and then exit. If a 
user wants to integrate that application with MapReduce, more power to them; I 
can see some cases where that would be of interest.

My point here is that you don't need mpirun at all, nor do you need the 
overhead of running OMPI daemons. The Hadoop daemons can start the MPI 
processes and monitor their health just fine. We might add some capability to 
the Hadoop daemons to assist (e.g., binding), but that would be useful whether 
or not the process is part of an MPI application.
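
To make the idea concrete, here is a minimal sketch of a daemon that starts the 
processes of a parallel job itself and watches their exit status, with no 
separate launcher in between. It is plain Python and is not Hadoop or Open MPI 
code: the names launch, monitor, healthy, and the SKETCH_* environment variables 
are all hypothetical, and the worker is a stand-in for a real MPI rank.

```python
import os
import subprocess
import sys
import time

# Hypothetical worker standing in for one rank of the application.
# A real MPI rank would call MPI_Init, do its work, call MPI_Finalize, and exit.
WORKER = "import os, sys; sys.exit(int(os.environ['SKETCH_EXIT_CODE']))"


def launch(num_procs, exit_codes=None):
    """Start num_procs workers directly, without a launcher such as mpirun."""
    if exit_codes is None:
        exit_codes = [0] * num_procs
    procs = []
    for rank in range(num_procs):
        # Pass each worker its identity through (made-up) environment variables.
        env = dict(
            os.environ,
            SKETCH_RANK=str(rank),
            SKETCH_SIZE=str(num_procs),
            SKETCH_EXIT_CODE=str(exit_codes[rank]),
        )
        procs.append(subprocess.Popen([sys.executable, "-c", WORKER], env=env))
    return procs


def monitor(procs, poll_interval=0.05):
    """Poll every worker until all have exited; return {rank: exit code}."""
    status = {}
    while len(status) < len(procs):
        for rank, proc in enumerate(procs):
            if rank not in status:
                code = proc.poll()  # None while the process is still running
                if code is not None:
                    status[rank] = code
        time.sleep(poll_interval)
    return status


def healthy(status):
    """Treat the job as healthy only if every worker exited with code 0."""
    return all(code == 0 for code in status.values())


if __name__ == "__main__":
    result = monitor(launch(4))
    print(result, "healthy" if healthy(result) else "failed")
```

A real node daemon would also need to react to a failed rank (for example by 
terminating the remaining ones), which this sketch leaves out.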

As I said, please forgive the ignorance if my suggestion makes no sense.

                
> Hamster: Hadoop And Mpi on the same cluSTER
> -------------------------------------------
>
>                 Key: MAPREDUCE-2911
>                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-2911
>             Project: Hadoop Map/Reduce
>          Issue Type: New Feature
>          Components: mrv2
>    Affects Versions: 0.23.0
>         Environment: All Unix-Environments
>            Reporter: Milind Bhandarkar
>            Assignee: Milind Bhandarkar
>             Fix For: 0.24.0
>
>   Original Estimate: 336h
>  Remaining Estimate: 336h
>
> MPI is commonly used for many machine-learning applications. OpenMPI 
> (http://www.open-mpi.org/) is a popular BSD-licensed implementation of MPI. In 
> the past, running an MPI application on a Hadoop cluster was achieved using 
> Hadoop Streaming (http://videolectures.net/nipsworkshops2010_ye_gbd/), but it 
> was kludgy. After the separation of the resource manager from the JobTracker 
> in Hadoop, we have all the tools needed to make MPI a first-class citizen on 
> a Hadoop cluster. I am currently working on the patch to make MPI an 
> application master. An initial version of this patch will be available soon 
> (hopefully before September 10). This JIRA will track the development of 
> Hamster: the application master for MPI.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira
