As part of Hamster, I have a prototype of MPI on YARN running locally that I plan to contribute as a patch sometime soon. I hope it gets in as part of official Hadoop 0.23. The Open MPI part of it will be published as a patch on GitHub (while I sort out the legal requirements) that will have to be downloaded and applied to the Open MPI trunk.
- Milind

---
Milind Bhandarkar
Greenplum Labs, EMC
(Disclaimer: Opinions expressed in this email are those of the author, and
do not necessarily represent the views of any organization, past or
present, the author might be affiliated with.)

On 9/14/11 3:22 AM, "Robert Evans" <[email protected]> wrote:

>Another option to think about is the Hamster project (MAPREDUCE-2911
><https://issues.apache.org/jira/browse/MAPREDUCE-2911>), which will allow
>OpenMPI to run on a Hadoop cluster. It is still very preliminary and will
>probably not be ready until Hadoop 0.23 or 0.24.
>
>There are other processing frameworks being developed to run on top of
>YARN (the resource scheduler introduced as part of Hadoop 0.23):
>http://wiki.apache.org/hadoop/PoweredByYarn
>
>So there are even more choices coming, depending on your problem.
>
>--Bobby Evans
>
>On 9/13/11 12:54 PM, "Parker Jones" <[email protected]> wrote:
>
>Thank you for the explanations, Bobby. That helps significantly.
>
>I also read the article below, which gave me a better understanding of
>the relative merits of MapReduce/Hadoop vs MPI. Alberto, you might find
>it useful too.
>http://grids.ucs.indiana.edu/ptliupages/publications/CloudsandMR.pdf
>
>There is even a MapReduce API built on top of MPI, developed at Sandia.
>
>So many options to choose from :-)
>
>Cheers,
>Parker
>
>> From: [email protected]
>> To: [email protected]
>> Date: Mon, 12 Sep 2011 14:02:44 -0700
>> Subject: Re: Is Hadoop the right platform for my HPC application?
>>
>> Parker,
>>
>> The hadoop command itself is just a shell script that sets up your
>> classpath and some environment variables for a JVM. Hadoop provides a
>> Java API that you should be able to use to write your application
>> without dealing with the command line (see the driver sketch at the
>> end of this message). That being said, there is no Map/Reduce C/C++
>> API. There is libhdfs.so, which will allow you to read/write HDFS
>> files from a C/C++ program, but it actually launches a JVM behind the
>> scenes to handle the actual requests.
>>
>> As for a way to avoid writing your input data into files, the data has
>> to be distributed to the compute nodes somehow. You could write a
>> custom input format that does not use any input files, and then have
>> it load the data a different way (see the sketch below this thread). I
>> believe that some people do this to load data from MySQL or some other
>> DB for processing. Similarly, you could do something with the output
>> format to put the data someplace else.
>>
>> It is hard to say whether Hadoop is the right platform without more
>> information about what you are doing. Hadoop has been used for lots of
>> embarrassingly parallel problems. The processing is easy; the real
>> question is where your data is coming from and where the results are
>> going. Map/Reduce is fast in part because it tries to reduce data
>> movement and move the computation to the data, not the other way
>> round. Without knowing the expected size of your data or the amount of
>> processing that it will do, it is hard to say.
>>
>> --Bobby Evans
>>
>> On 9/12/11 5:09 AM, "Parker Jones" <[email protected]> wrote:
>>
>> Hello all,
>>
>> I have Hadoop up and running and an embarrassingly parallel problem,
>> but I can't figure out how to arrange the problem. My apologies in
>> advance if this is obvious and I'm not getting it.
>>
>> My HPC application isn't a batch program, but runs in a continuous
>> loop (like a server) *outside* of the Hadoop machines, and it should
>> occasionally farm out a large computation to Hadoop and use the
>> results.
>> However, all the examples I have come across interact with Hadoop via
>> files and the command line. (Perhaps I am looking in the wrong
>> places?)
>>
>> So:
>> * Is Hadoop the right platform for this kind of problem?
>> * Is it possible to use Hadoop without going through the command line
>>   and without writing all input data to files?
>>
>> If so, could someone point me to some examples and documentation? I am
>> coding in C/C++ in case that is relevant, but examples in any language
>> should be helpful.
>>
>> Thanks for any suggestions,
>> Parker
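
Since the no-input-files question comes up above, here is a minimal,
untested sketch of the custom-input-format idea Bobby describes: the
splits are fabricated in memory, and each record reader generates its own
records (the nextKeyValue() body is where a MySQL or socket fetch would
go). It targets the new org.apache.hadoop.mapreduce API; the names
SyntheticInputFormat and RangeSplit and the hard-coded split sizes are
made up for illustration.

import java.io.DataInput;
import java.io.DataOutput;
import java.io.IOException;
import java.util.ArrayList;
import java.util.List;

import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.NullWritable;
import org.apache.hadoop.io.Writable;
import org.apache.hadoop.mapreduce.InputFormat;
import org.apache.hadoop.mapreduce.InputSplit;
import org.apache.hadoop.mapreduce.JobContext;
import org.apache.hadoop.mapreduce.RecordReader;
import org.apache.hadoop.mapreduce.TaskAttemptContext;

// Hypothetical input format: no files, just synthetic ranges of work.
public class SyntheticInputFormat
    extends InputFormat<LongWritable, NullWritable> {

  // A split that carries a range of record ids instead of a file path.
  public static class RangeSplit extends InputSplit implements Writable {
    long start, count;
    public RangeSplit() {}  // required for deserialization
    RangeSplit(long start, long count) { this.start = start; this.count = count; }
    @Override public long getLength() { return count; }
    @Override public String[] getLocations() { return new String[0]; } // no locality hint
    @Override public void write(DataOutput out) throws IOException {
      out.writeLong(start);
      out.writeLong(count);
    }
    @Override public void readFields(DataInput in) throws IOException {
      start = in.readLong();
      count = in.readLong();
    }
  }

  @Override
  public List<InputSplit> getSplits(JobContext job) {
    // Fabricate 4 splits of 1000 records each; real code would take
    // these numbers from the job configuration.
    List<InputSplit> splits = new ArrayList<InputSplit>();
    for (int i = 0; i < 4; i++) {
      splits.add(new RangeSplit(i * 1000L, 1000L));
    }
    return splits;
  }

  @Override
  public RecordReader<LongWritable, NullWritable> createRecordReader(
      InputSplit split, TaskAttemptContext ctx) {
    return new RecordReader<LongWritable, NullWritable>() {
      private RangeSplit range;
      private long pos = 0;
      @Override public void initialize(InputSplit s, TaskAttemptContext c) {
        range = (RangeSplit) s;
      }
      // This is where a fetch from MySQL, a socket, etc. would go.
      @Override public boolean nextKeyValue() { return pos++ < range.count; }
      @Override public LongWritable getCurrentKey() {
        return new LongWritable(range.start + pos - 1);
      }
      @Override public NullWritable getCurrentValue() { return NullWritable.get(); }
      @Override public float getProgress() { return (float) pos / range.count; }
      @Override public void close() {}
    };
  }
}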

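And a minimal sketch of driving a job from Java rather than the hadoop
shell script, per Bobby's point that the command is only a wrapper around
a JVM. It wires the hypothetical SyntheticInputFormat above to the
identity Mapper in a map-only job; a long-running server process could
call something like this each time it wants to farm out a computation.
The output path is arbitrary, and on newer releases
Job.getInstance(conf, name) replaces the deprecated constructor.

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.NullWritable;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;
import org.apache.hadoop.mapreduce.lib.output.TextOutputFormat;

public class EmbeddedDriver {
  public static void main(String[] args) throws Exception {
    // Picks up core-site.xml/mapred-site.xml from the classpath, so this
    // JVM talks to the cluster directly; no hadoop shell script involved.
    Configuration conf = new Configuration();
    Job job = new Job(conf, "embedded-example");
    job.setJarByClass(EmbeddedDriver.class);
    job.setInputFormatClass(SyntheticInputFormat.class); // hypothetical, see above
    job.setMapperClass(Mapper.class);                    // identity mapper
    job.setNumReduceTasks(0);                            // map-only job
    job.setOutputKeyClass(LongWritable.class);
    job.setOutputValueClass(NullWritable.class);
    job.setOutputFormatClass(TextOutputFormat.class);
    FileOutputFormat.setOutputPath(job, new Path("/tmp/embedded-example-out"));
    // waitForCompletion blocks; a server loop could use job.submit() and
    // poll job.isComplete() instead, then read the results back.
    System.exit(job.waitForCompletion(true) ? 0 : 1);
  }
}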