As part of Hamster, I have a prototype of MPI on YARN running locally that I plan to contribute as a patch sometime soon. I hope it gets in as part of official Hadoop 0.23. The Open MPI part of it will be published as a patch on GitHub (while I sort out the legal requirements) that will have to be downloaded and applied to the Open MPI trunk.
- Milind

---
Milind Bhandarkar
Greenplum Labs, EMC
(Disclaimer: Opinions expressed in this email are those of the author, and
do not necessarily represent the views of any organization, past or
present, the author might be affiliated with.)

On 9/14/11 3:22 AM, "Robert Evans" <[email protected]> wrote:

>Another option to think about is the Hamster project (MAPREDUCE-2911
><https://issues.apache.org/jira/browse/MAPREDUCE-2911>), which will allow
>OpenMPI to run on a Hadoop cluster. It is still very preliminary and will
>probably not be ready until Hadoop 0.23 or 0.24.
>
>There are other processing frameworks being developed to run on top of
>YARN (the resource scheduler introduced as part of Hadoop 0.23):
>http://wiki.apache.org/hadoop/PoweredByYarn
>
>So there are even more choices coming, depending on your problem.
>
>--Bobby Evans
>
>On 9/13/11 12:54 PM, "Parker Jones" <[email protected]> wrote:
>
>Thank you for the explanations, Bobby. That helps significantly.
>
>I also read the article below, which gave me a better understanding of
>the relative merits of MapReduce/Hadoop vs MPI. Alberto, you might find
>it useful too.
>http://grids.ucs.indiana.edu/ptliupages/publications/CloudsandMR.pdf
>
>There is even a MapReduce API built on top of MPI, developed at Sandia.
>
>So many options to choose from :-)
>
>Cheers,
>Parker
>
>> From: [email protected]
>> To: [email protected]
>> Date: Mon, 12 Sep 2011 14:02:44 -0700
>> Subject: Re: Is Hadoop the right platform for my HPC application?
>>
>> Parker,
>>
>> The hadoop command itself is just a shell script that sets up your
>> classpath and some environment variables for a JVM. Hadoop provides a
>> Java API that you should be able to use to write your application
>> without dealing with the command line (see the driver sketch at the
>> end of this message). That being said, there is no Map/Reduce C/C++
>> API. There is libhdfs.so, which will allow you to read/write HDFS
>> files from a C/C++ program, but it actually launches a JVM behind the
>> scenes to handle the actual requests.
>>
>> As for a way to avoid writing your input data into files, the data has
>> to be distributed to the compute nodes somehow. You could write a
>> custom input format that does not use any input files, and then have
>> it load the data a different way (see the sketch below this thread). I
>> believe that some people do this to load data from MySQL or some other
>> DB for processing. Similarly, you could do something with the output
>> format to put the data someplace else.
>>
>> It is hard to say whether Hadoop is the right platform without more
>> information about what you are doing. Hadoop has been used for lots of
>> embarrassingly parallel problems. The processing is easy; the real
>> question is where your data is coming from and where the results are
>> going. Map/Reduce is fast in part because it tries to reduce data
>> movement and move the computation to the data, not the other way
>> round. Without knowing the expected size of your data or the amount of
>> processing that it will do, it is hard to say.
>>
>> --Bobby Evans
>>
>> On 9/12/11 5:09 AM, "Parker Jones" <[email protected]> wrote:
>>
>> Hello all,
>>
>> I have Hadoop up and running and an embarrassingly parallel problem,
>> but I can't figure out how to arrange the problem. My apologies in
>> advance if this is obvious and I'm not getting it.
>>
>> My HPC application isn't a batch program, but runs in a continuous
>> loop (like a server) *outside* of the Hadoop machines, and it should
>> occasionally farm out a large computation to Hadoop and use the
>> results.
>> However, all the examples I have come across interact with Hadoop via
>> files and the command line. (Perhaps I am looking in the wrong
>> places?)
>>
>> So:
>> * Is Hadoop the right platform for this kind of problem?
>> * Is it possible to use Hadoop without going through the command line
>>   and without writing all input data to files?
>>
>> If so, could someone point me to some examples and documentation? I am
>> coding in C/C++ in case that is relevant, but examples in any language
>> should be helpful.
>>
>> Thanks for any suggestions,
>> Parker
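
Since the no-input-files question comes up above, here is a minimal,
untested sketch of the custom-input-format idea Bobby describes: the
splits are fabricated in memory, and each record reader generates its own
records (the nextKeyValue() body is where a MySQL or socket fetch would
go). It targets the new org.apache.hadoop.mapreduce API; the names
SyntheticInputFormat and RangeSplit and the hard-coded split sizes are
made up for illustration.

import java.io.DataInput;
import java.io.DataOutput;
import java.io.IOException;
import java.util.ArrayList;
import java.util.List;

import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.NullWritable;
import org.apache.hadoop.io.Writable;
import org.apache.hadoop.mapreduce.InputFormat;
import org.apache.hadoop.mapreduce.InputSplit;
import org.apache.hadoop.mapreduce.JobContext;
import org.apache.hadoop.mapreduce.RecordReader;
import org.apache.hadoop.mapreduce.TaskAttemptContext;

// Hypothetical input format: no files, just synthetic ranges of work.
public class SyntheticInputFormat
    extends InputFormat<LongWritable, NullWritable> {

  // A split that carries a range of record ids instead of a file path.
  public static class RangeSplit extends InputSplit implements Writable {
    long start, count;
    public RangeSplit() {}  // required for deserialization
    RangeSplit(long start, long count) { this.start = start; this.count = count; }
    @Override public long getLength() { return count; }
    @Override public String[] getLocations() { return new String[0]; } // no locality hint
    @Override public void write(DataOutput out) throws IOException {
      out.writeLong(start);
      out.writeLong(count);
    }
    @Override public void readFields(DataInput in) throws IOException {
      start = in.readLong();
      count = in.readLong();
    }
  }

  @Override
  public List<InputSplit> getSplits(JobContext job) {
    // Fabricate 4 splits of 1000 records each; real code would take
    // these numbers from the job configuration.
    List<InputSplit> splits = new ArrayList<InputSplit>();
    for (int i = 0; i < 4; i++) {
      splits.add(new RangeSplit(i * 1000L, 1000L));
    }
    return splits;
  }

  @Override
  public RecordReader<LongWritable, NullWritable> createRecordReader(
      InputSplit split, TaskAttemptContext ctx) {
    return new RecordReader<LongWritable, NullWritable>() {
      private RangeSplit range;
      private long pos = 0;
      @Override public void initialize(InputSplit s, TaskAttemptContext c) {
        range = (RangeSplit) s;
      }
      // This is where a fetch from MySQL, a socket, etc. would go.
      @Override public boolean nextKeyValue() { return pos++ < range.count; }
      @Override public LongWritable getCurrentKey() {
        return new LongWritable(range.start + pos - 1);
      }
      @Override public NullWritable getCurrentValue() { return NullWritable.get(); }
      @Override public float getProgress() { return (float) pos / range.count; }
      @Override public void close() {}
    };
  }
}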

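And a minimal sketch of driving a job from Java rather than the hadoop
shell script, per Bobby's point that the command is only a wrapper around
a JVM. It wires the hypothetical SyntheticInputFormat above to the
identity Mapper in a map-only job; a long-running server process could
call something like this each time it wants to farm out a computation.
The output path is arbitrary, and on newer releases
Job.getInstance(conf, name) replaces the deprecated constructor.

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.NullWritable;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;
import org.apache.hadoop.mapreduce.lib.output.TextOutputFormat;

public class EmbeddedDriver {
  public static void main(String[] args) throws Exception {
    // Picks up core-site.xml/mapred-site.xml from the classpath, so this
    // JVM talks to the cluster directly; no hadoop shell script involved.
    Configuration conf = new Configuration();
    Job job = new Job(conf, "embedded-example");
    job.setJarByClass(EmbeddedDriver.class);
    job.setInputFormatClass(SyntheticInputFormat.class); // hypothetical, see above
    job.setMapperClass(Mapper.class);                    // identity mapper
    job.setNumReduceTasks(0);                            // map-only job
    job.setOutputKeyClass(LongWritable.class);
    job.setOutputValueClass(NullWritable.class);
    job.setOutputFormatClass(TextOutputFormat.class);
    FileOutputFormat.setOutputPath(job, new Path("/tmp/embedded-example-out"));
    // waitForCompletion blocks; a server loop could use job.submit() and
    // poll job.isComplete() instead, then read the results back.
    System.exit(job.waitForCompletion(true) ? 0 : 1);
  }
}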