Ralph,

Do you have any YARN or Mesos performance comparisons against HOD? I suppose, since it was a customer requirement, you might not have explored them. MPI support seems to be an active issue for Mesos now.

Charles
On May 21, 2012, at 10:36 AM, Ralph Castain <r...@open-mpi.org> wrote:

> Not quite yet, though we are working on it (some descriptive stuff is around, but needs to be consolidated). Several of us started working together a couple of months ago to support the MapReduce programming model on HPC clusters using Open MPI as the platform. In working with our customers and OMPI's wide community of users, we found that people were interested in this capability, wanted to integrate MPI support into their MapReduce jobs, and didn't want to migrate their clusters to YARN for various reasons.
>
> We have released initial versions of two new tools in the OMPI developer's trunk, scheduled for inclusion in the upcoming 1.7.0 release:
>
> 1. "mr+" - executes the MapReduce programming paradigm. Currently, we only support streaming data, though we will extend that support shortly. All HPC environments (rsh, SLURM, Torque, Alps, LSF, Windows, etc.) are supported. Both mappers and reducers can utilize MPI (independently or in combination) if they so choose. Mappers and reducers can be written in any of the typical HPC languages (C, C++, and Fortran) as well as Java (note: OMPI now comes with Java MPI bindings).
>
> 2. "hdfsalloc" - takes a list of files and obtains a resource allocation for the nodes upon which those files reside. SLURM and Moab/Maui are currently supported, with Grid Engine coming soon.
>
> There will be a public announcement of this in the near future, and we expect to integrate the Hadoop 1.0 and Hadoop 2.0 MR classes over the next couple of months. By the end of this summer, we should have a full-featured public release.
>
> On May 20, 2012, at 2:10 PM, Brian Bockelman wrote:
>
>> Hi Ralph,
>>
>> I admit - I've only been half-following the OpenMPI progress. Do you have a technical write-up of what has been done?
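[Editor's note: the "streaming data" mode Ralph mentions follows the usual streaming MapReduce contract: a mapper reads records on stdin and emits tab-separated key/value pairs, and a reducer reads the sorted pairs and aggregates runs of identical keys. A minimal word-count pair in Python is sketched below; the file name and the use of Python are illustrative, since the thread does not show mr+'s exact invocation.]

```python
#!/usr/bin/env python
# wordcount.py - minimal streaming-style mapper and reducer (illustrative).
# Streaming contract: the mapper reads lines from stdin and emits
# "key\tvalue" lines; the framework sorts them between the stages; the
# reducer reads the sorted lines and aggregates runs of identical keys.
import sys
from itertools import groupby


def mapper(lines):
    """Emit 'word\t1' for every word in the input lines."""
    for line in lines:
        for word in line.split():
            yield "%s\t1" % word


def reducer(sorted_lines):
    """Sum the counts for each run of identical keys."""
    pairs = (line.rstrip("\n").split("\t", 1) for line in sorted_lines)
    for word, group in groupby(pairs, key=lambda kv: kv[0]):
        yield "%s\t%d" % (word, sum(int(v) for _, v in group))


if __name__ == "__main__":
    role = sys.argv[1] if len(sys.argv) > 1 else "map"
    stage = mapper if role == "map" else reducer
    for out in stage(sys.stdin):
        print(out)
```

[Outside any framework this can be exercised as `wordcount.py map < input | sort | wordcount.py reduce`; under a streaming framework the sort between the two stages is performed by the shuffle.]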
>>
>> Thanks,
>>
>> Brian
>>
>> On May 20, 2012, at 9:31 AM, Ralph Castain wrote:
>>
>>> FWIW: Open MPI now has an initial cut at "MR+" that runs map-reduce under any HPC environment. We don't have the Java integration yet to support the Hadoop MR class, but you can write a mapper/reducer and execute that programming paradigm. We plan to integrate the Hadoop MR class soon.
>>>
>>> If you already have that integration, we'd love to help port it over. We already have the MPI support completed, so any mapper/reducer could use it.
>>>
>>> On May 20, 2012, at 7:12 AM, Pierre Antoine DuBoDeNa wrote:
>>>
>>>> We run similar infrastructure in a university project. We plan to install Hadoop, and we are looking for Hadoop-based "alternatives" in case pure Hadoop does not work as expected.
>>>>
>>>> Keep us updated on the code release.
>>>>
>>>> Best,
>>>> PA
>>>>
>>>> 2012/5/20 Stijn De Weirdt <stijn.dewei...@ugent.be>
>>>>
>>>>> hi all,
>>>>>
>>>>> I'm part of an HPC group at a university. We have some users who are interested in Hadoop, to see if it can be useful in their research, and we also have researchers who are already using Hadoop on their own infrastructure, but that is not enough reason for us to start with dedicated Hadoop infrastructure (we currently run only Torque-based clusters, with and without shared storage; setting up and properly maintaining Hadoop infrastructure requires quite some understanding of new software).
>>>>>
>>>>> To support these needs we wanted to do just this: use the current HPC infrastructure to create private Hadoop clusters so people can do some work. If we attract enough interest, we will probably set up dedicated infrastructure, but by that time we (the admins) will also have a better understanding of what is required.
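[Editor's note: the "hdfsalloc" idea from earlier in the thread (allocate compute where the file blocks live) can be sketched independently of any scheduler: given a map from HDFS files to the hosts holding their blocks, the allocation request is just the union of those hosts. The function and data layout below are illustrative assumptions, not hdfsalloc's actual interface.]

```python
def hosts_for_files(block_map, files):
    """Union of hosts holding any block of the requested files.

    block_map maps a file path to a list of per-block replica lists,
    e.g. {"/data/a": [["n1", "n2"], ["n2", "n3"]]}.  The result is the
    sorted set of nodes an hdfsalloc-style tool would then request
    from the resource manager (SLURM, Moab/Maui, ...).
    """
    hosts = set()
    for path in files:
        for replicas in block_map.get(path, []):
            hosts.update(replicas)
    return sorted(hosts)
```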
>>>>>
>>>>> So we used to look at HOD for testing/running Hadoop on existing infrastructure (we never really looked at myHadoop, though). But (IMHO) the current HOD code base is not in a good state. We did some work to get it working and added some features, only to conclude that it was not sufficient (and not maintainable).
>>>>>
>>>>> So we wrote something from scratch with the same functionality as HOD, and much more (e.g. HBase is now possible, with or without MR1; some default tuning; easy to add support for YARN instead of MR1). It has some support for Torque, but my laptop is also sufficient (the Torque support is a wrapper to submit the job). We gave a workshop on Hadoop using it (25 people, each with their own 5-node Hadoop cluster) and it went rather well.
>>>>>
>>>>> It's not in a public repo yet, but we could do that. If you're interested, let me know and I'll see what can be done. (Releasing the code is on our todo list, but if there is some demand, we can do it sooner.)
>>>>>
>>>>> stijn
>>>>>
>>>>> On 05/18/2012 05:07 PM, Pierre Antoine DuBoDeNa wrote:
>>>>>
>>>>>> I am also interested to learn about myHadoop, as I use a shared storage system and everything runs on VMs, not actual dedicated servers.
>>>>>>
>>>>>> In an Amazon EC2-like environment, where you just have VMs and huge central storage, is it helpful to use Hadoop to distribute jobs and maybe parallelize algorithms, or is it better to go with other technologies?
>>>>>>
>>>>>> 2012/5/18 Manu S <manupk...@gmail.com>
>>>>>>
>>>>>>> Hi All,
>>>>>>>
>>>>>>> I guess HOD could be useful for an existing HPC cluster with the Torque scheduler that needs to run map-reduce jobs.
>>>>>>>
>>>>>>> I also read about *myHadoop - Hadoop on demand on traditional HPC resources*, which will support many HPC schedulers like SGE, PBS, etc. to overcome the integration of the shared architecture (HPC) and the shared-nothing architecture (Hadoop).
>>>>>>>
>>>>>>> Are there any real use-case scenarios for integrating Hadoop map/reduce into an existing HPC cluster, and what are the advantages of using Hadoop features in an HPC cluster?
>>>>>>>
>>>>>>> I appreciate your comments on the same.
>>>>>>>
>>>>>>> Thanks,
>>>>>>> Manu S
>>>>>>>
>>>>>>> On Fri, May 18, 2012 at 12:41 AM, Merto Mertek <masmer...@gmail.com> wrote:
>>>>>>>
>>>>>>>> If I understand it right, HOD is intended mainly for merging existing HPC clusters with Hadoop and for testing purposes.
>>>>>>>>
>>>>>>>> I cannot find what the role of Torque is here (just initial node allocation?) or which is the default scheduler of HOD. Probably the scheduler from the Hadoop distribution?
>>>>>>>>
>>>>>>>> The docs mention a Maui scheduler, but presumably if there were an integration with Hadoop, there would be some document on it.
>>>>>>>>
>>>>>>>> Thanks.
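[Editor's note: on Merto's question, in HOD Torque's role is indeed the initial node allocation: `hod allocate` submits a regular Torque job that reserves the nodes and brings up a private Hadoop instance on them, and map/reduce tasks inside that allocation are then scheduled by Hadoop's own JobTracker (FIFO by default), not by Torque/Maui. The basic session below follows the HOD user guide; the cluster directory, node count, and jar name are examples.]

```shell
# Reserve 4 nodes through Torque and bring up a private Hadoop cluster
# on them; the client-side config is written into the cluster directory.
hod allocate -d ~/hod-clusters/test -n 4

# Run MapReduce jobs against the provisioned cluster using that config.
hadoop --config ~/hod-clusters/test jar \
    hadoop-examples.jar wordcount /input /output

# Return the nodes to Torque when done.
hod deallocate -d ~/hod-clusters/test
```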