Ralph,

Do you have any YARN or Mesos performance comparisons against HOD? I suppose, since it was a customer requirement, you might not have explored them. MPI support seems to be an active issue for Mesos now.

Charles
On May 21, 2012, at 10:36 AM, Ralph Castain <r...@open-mpi.org> wrote:

> Not quite yet, though we are working on it (some descriptive stuff is around, but needs to be consolidated). Several of us started working together a couple of months ago to support the MapReduce programming model on HPC clusters using Open MPI as the platform. In working with our customers and OMPI's wide community of users, we found that people were interested in this capability, wanted to integrate MPI support into their MapReduce jobs, and didn't want to migrate their clusters to YARN for various reasons.
>
> We have released initial versions of two new tools in the OMPI developer's trunk, scheduled for inclusion in the upcoming 1.7.0 release:
>
> 1. "mr+" - executes the MapReduce programming paradigm. Currently, we only support streaming data, though we will extend that support shortly. All HPC environments (rsh, SLURM, Torque, Alps, LSF, Windows, etc.) are supported. Both mappers and reducers can utilize MPI (independently or in combination) if they so choose. Mappers and reducers can be written in any of the typical HPC languages (C, C++, and Fortran) as well as Java (note: OMPI now comes with Java MPI bindings).
>
> 2. "hdfsalloc" - takes a list of files and obtains a resource allocation for the nodes upon which those files reside. SLURM and Moab/Maui are currently supported, with Grid Engine coming soon.
>
> There will be a public announcement of this in the near future, and we expect to integrate the Hadoop 1.0 and Hadoop 2.0 MR classes over the next couple of months. By the end of this summer, we should have a full-featured public release.
>
> On May 20, 2012, at 2:10 PM, Brian Bockelman wrote:
>
>> Hi Ralph,
>>
>> I admit - I've only been half-following the OpenMPI progress. Do you have a technical write-up of what has been done?
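[Editor's note: the "streaming data" mode Ralph mentions follows the usual streaming MapReduce contract: a mapper reads records on stdin and emits tab-separated key/value pairs, and a reducer reads the sorted pairs and aggregates runs of identical keys. A minimal word-count pair in Python is sketched below; the file name and the use of Python are illustrative, since the thread does not show mr+'s exact invocation.]

```python
#!/usr/bin/env python
# wordcount.py - minimal streaming-style mapper and reducer (illustrative).
# Streaming contract: the mapper reads lines from stdin and emits
# "key\tvalue" lines; the framework sorts them between the stages; the
# reducer reads the sorted lines and aggregates runs of identical keys.
import sys
from itertools import groupby


def mapper(lines):
    """Emit 'word\t1' for every word in the input lines."""
    for line in lines:
        for word in line.split():
            yield "%s\t1" % word


def reducer(sorted_lines):
    """Sum the counts for each run of identical keys."""
    pairs = (line.rstrip("\n").split("\t", 1) for line in sorted_lines)
    for word, group in groupby(pairs, key=lambda kv: kv[0]):
        yield "%s\t%d" % (word, sum(int(v) for _, v in group))


if __name__ == "__main__":
    role = sys.argv[1] if len(sys.argv) > 1 else "map"
    stage = mapper if role == "map" else reducer
    for out in stage(sys.stdin):
        print(out)
```

[Outside any framework this can be exercised as `wordcount.py map < input | sort | wordcount.py reduce`; under a streaming framework the sort between the two stages is performed by the shuffle.]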
>>
>> Thanks,
>>
>> Brian
>>
>> On May 20, 2012, at 9:31 AM, Ralph Castain wrote:
>>
>>> FWIW: Open MPI now has an initial cut at "MR+" that runs map-reduce under any HPC environment. We don't have the Java integration yet to support the Hadoop MR class, but you can write a mapper/reducer and execute that programming paradigm. We plan to integrate the Hadoop MR class soon.
>>>
>>> If you already have that integration, we'd love to help port it over. We already have the MPI support completed, so any mapper/reducer could use it.
>>>
>>> On May 20, 2012, at 7:12 AM, Pierre Antoine DuBoDeNa wrote:
>>>
>>>> We run similar infrastructure in a university project. We plan to install Hadoop, and we are looking for Hadoop-based "alternatives" in case pure Hadoop does not work as expected.
>>>>
>>>> Keep us updated on the code release.
>>>>
>>>> Best,
>>>> PA
>>>>
>>>> 2012/5/20 Stijn De Weirdt <stijn.dewei...@ugent.be>
>>>>
>>>>> hi all,
>>>>>
>>>>> I'm part of an HPC group at a university. We have some users who are interested in Hadoop, to see if it can be useful in their research, and we also have researchers who are already using Hadoop on their own infrastructure, but that is not enough reason for us to start with dedicated Hadoop infrastructure (we currently run only Torque-based clusters, with and without shared storage; setting up and properly maintaining Hadoop infrastructure requires quite some understanding of new software).
>>>>>
>>>>> To support these needs we wanted to do just this: use the current HPC infrastructure to create private Hadoop clusters so people can do some work. If we attract enough interest, we will probably set up dedicated infrastructure, but by that time we (the admins) will also have a better understanding of what is required.
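[Editor's note: the "hdfsalloc" idea from earlier in the thread (allocate compute where the file blocks live) can be sketched independently of any scheduler: given a map from HDFS files to the hosts holding their blocks, the allocation request is just the union of those hosts. The function and data layout below are illustrative assumptions, not hdfsalloc's actual interface.]

```python
def hosts_for_files(block_map, files):
    """Union of hosts holding any block of the requested files.

    block_map maps a file path to a list of per-block replica lists,
    e.g. {"/data/a": [["n1", "n2"], ["n2", "n3"]]}.  The result is the
    sorted set of nodes an hdfsalloc-style tool would then request
    from the resource manager (SLURM, Moab/Maui, ...).
    """
    hosts = set()
    for path in files:
        for replicas in block_map.get(path, []):
            hosts.update(replicas)
    return sorted(hosts)
```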
>>>>>
>>>>> So we used to look at HOD for testing/running Hadoop on existing infrastructure (we never really looked at myHadoop, though). But (IMHO) the current HOD code base is not in a good state. We did some work to get it working and added some features, only to conclude that it was not sufficient (and not maintainable).
>>>>>
>>>>> So we wrote something from scratch with the same functionality as HOD, and much more (e.g. HBase is now possible, with or without MR1; some default tuning; easy to add support for YARN instead of MR1). It has some support for Torque, but my laptop is also sufficient (the Torque support is a wrapper to submit the job). We gave a workshop on Hadoop using it (25 people, each with their own 5-node Hadoop cluster) and it went rather well.
>>>>>
>>>>> It's not in a public repo yet, but we could do that. If you're interested, let me know and I'll see what can be done. (Releasing the code is on our todo list, but if there is some demand, we can do it sooner.)
>>>>>
>>>>> stijn
>>>>>
>>>>> On 05/18/2012 05:07 PM, Pierre Antoine DuBoDeNa wrote:
>>>>>
>>>>>> I am also interested to learn about myHadoop, as I use a shared storage system and everything runs on VMs, not actual dedicated servers.
>>>>>>
>>>>>> In an Amazon EC2-like environment, where you just have VMs and huge central storage, is it helpful to use Hadoop to distribute jobs and maybe parallelize algorithms, or is it better to go with other technologies?
>>>>>>
>>>>>> 2012/5/18 Manu S <manupk...@gmail.com>
>>>>>>
>>>>>>> Hi All,
>>>>>>>
>>>>>>> I guess HOD could be useful for an existing HPC cluster with the Torque scheduler that needs to run map-reduce jobs.
>>>>>>>
>>>>>>> I also read about *myHadoop - Hadoop on demand on traditional HPC resources*, which will support many HPC schedulers like SGE, PBS, etc. to overcome the integration of the shared architecture (HPC) and the shared-nothing architecture (Hadoop).
>>>>>>>
>>>>>>> Are there any real use-case scenarios for integrating Hadoop map/reduce into an existing HPC cluster, and what are the advantages of using Hadoop features in an HPC cluster?
>>>>>>>
>>>>>>> I appreciate your comments on the same.
>>>>>>>
>>>>>>> Thanks,
>>>>>>> Manu S
>>>>>>>
>>>>>>> On Fri, May 18, 2012 at 12:41 AM, Merto Mertek <masmer...@gmail.com> wrote:
>>>>>>>
>>>>>>>> If I understand it right, HOD is intended mainly for merging existing HPC clusters with Hadoop and for testing purposes.
>>>>>>>>
>>>>>>>> I cannot find what the role of Torque is here (just initial node allocation?) or which is the default scheduler of HOD. Probably the scheduler from the Hadoop distribution?
>>>>>>>>
>>>>>>>> The docs mention a Maui scheduler, but presumably if there were an integration with Hadoop, there would be some document on it.
>>>>>>>>
>>>>>>>> Thanks.
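[Editor's note: on Merto's question, in HOD Torque's role is indeed the initial node allocation: `hod allocate` submits a regular Torque job that reserves the nodes and brings up a private Hadoop instance on them, and map/reduce tasks inside that allocation are then scheduled by Hadoop's own JobTracker (FIFO by default), not by Torque/Maui. The basic session below follows the HOD user guide; the cluster directory, node count, and jar name are examples.]

```shell
# Reserve 4 nodes through Torque and bring up a private Hadoop cluster
# on them; the client-side config is written into the cluster directory.
hod allocate -d ~/hod-clusters/test -n 4

# Run MapReduce jobs against the provisioned cluster using that config.
hadoop --config ~/hod-clusters/test jar \
    hadoop-examples.jar wordcount /input /output

# Return the nodes to Torque when done.
hod deallocate -d ~/hod-clusters/test
```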