Re: Hadoop-on-demand and torque

Ralph Castain Sun, 20 May 2012 06:32:27 -0700

FWIW: Open MPI now has an initial cut at "MR+" that runs map-reduce under any 
HPC environment. We don't have the Java integration yet to support the Hadoop 
MR class, but you can write a mapper/reducer and execute that programming 
paradigm. We plan to integrate the Hadoop MR class soon.


If you already have that integration, we'd love to help port it over. We 
already have the MPI support completed, so any mapper/reducer could use it.
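
For readers unfamiliar with the mapper/reducer paradigm mentioned above, here is a minimal single-process sketch in plain Python. It is only an illustration of the programming model: it does not use the Open MPI "MR+" interface or the Hadoop MR class, and the names `mapper`, `reducer`, and `run_mapreduce` are hypothetical.

```python
from collections import defaultdict
from typing import Dict, Iterable, Iterator, List, Tuple


def mapper(line: str) -> Iterator[Tuple[str, int]]:
    """Emit a (word, 1) pair for every word in one input line."""
    for word in line.lower().split():
        yield word, 1


def reducer(key: str, values: Iterable[int]) -> Tuple[str, int]:
    """Combine all counts emitted for one key into a single total."""
    return key, sum(values)


def run_mapreduce(lines: Iterable[str]) -> Dict[str, int]:
    """Run map, shuffle (group by key), and reduce in a single process."""
    groups: Dict[str, List[int]] = defaultdict(list)
    for line in lines:
        for key, value in mapper(line):
            groups[key].append(value)
    return dict(reducer(key, values) for key, values in groups.items())


if __name__ == "__main__":
    print(run_mapreduce(["hadoop on torque", "hadoop on demand"]))
```

In a real deployment the map and reduce steps would run on different nodes and the grouping step would move data between them; here everything stays in one process to keep the sketch short.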


On May 20, 2012, at 7:12 AM, Pierre Antoine DuBoDeNa wrote:

> We run a similar infrastructure in a university project. We plan to install
> Hadoop and are looking for "alternatives" based on Hadoop in case pure
> Hadoop does not work as expected.
> 
> Keep us updated on the code release.
> 
> Best,
> PA
> 
> 2012/5/20 Stijn De Weirdt <stijn.dewei...@ugent.be>
> 
>> hi all,
>> 
>> i'm part of an HPC group of a university, and we have some users that are
>> interested in Hadoop to see if it can be useful in their research. we
>> also have researchers that are using hadoop already on their own
>> infrastructure, but that is not enough reason for us to start with
>> dedicated Hadoop infrastructure (we are now only running torque
>> based clusters with and without shared storage; setting up and properly
>> maintaining Hadoop infrastructure requires quite some understanding of new
>> software).
>> 
>> to be able to support these needs we wanted to do just this: use current
>> HPC infrastructure to make private hadoop clusters so people can do some
>> work. if we attract enough interest, we will probably set up dedicated
>> infrastructure, but by that time we (the admins) will also have a better
>> understanding of what is required.
>> 
>> so we used to look at HOD for testing/running hadoop on existing
>> infrastructure (never really looked at myhadoop though).
>> but (imho) the current HOD code base is not in such a good state. we did
>> some work to get it working and added some features, to come to the
>> conclusion that it was not sufficient (and not maintainable).
>> 
>> so we wrote something from scratch with the same functionality as HOD, and
>> much more (eg HBase is now possible, with or without MR1; some default
>> tuning; easy to add support for yarn instead of MR1).
>> it has some support for torque, but my laptop is also sufficient. (the
>> torque support is a wrapper to submit the job)
>> we gave a workshop on hadoop using it (25 people, and each with their own
>> 5 node hadoop cluster) and it went rather well.
>> 
>> it's not in a public repo yet, but we could do that. if interested, let me
>> know, and i will see what can be done. (releasing the code is on our todo list,
>> but if there is some demand, we can do it sooner)
>> 
>> 
>> stijn
>> 
>> 
>> 
>> On 05/18/2012 05:07 PM, Pierre Antoine DuBoDeNa wrote:
>> 
>>> I am also interested to learn about myHadoop as I use a shared storage
>>> system and everything runs on VMs and not actual dedicated servers.
>>> 
>>> In an Amazon EC2-like environment, where you just have VMs and huge central
>>> storage, is it helpful to use Hadoop to distribute jobs and maybe
>>> parallelize algorithms, or is it better to go with other technologies?
>>> 
>>> 2012/5/18 Manu S<manupk...@gmail.com>
>>> 
>>>> Hi All,
>>>> 
>>>> I guess HOD could be useful for an existing HPC cluster with the Torque
>>>> scheduler that needs to run map-reduce jobs.
>>>> 
>>>> I also read about *myHadoop - Hadoop on demand on traditional HPC
>>>> resources*, which will support many HPC schedulers such as SGE, PBS, etc.
>>>> to overcome the integration problems between shared architecture (HPC)
>>>> and shared-nothing architecture (Hadoop).
>>>> 
>>>> Are there any real use-case scenarios for integrating Hadoop map/reduce
>>>> into an existing HPC cluster, and what are the advantages of using Hadoop
>>>> features in an HPC cluster?
>>>> 
>>>> Appreciate your comments on the same.
>>>> 
>>>> Thanks,
>>>> Manu S
>>>> 
>>>> 
>>>> 
>>>> On Fri, May 18, 2012 at 12:41 AM, Merto Mertek<masmer...@gmail.com>
>>>> wrote:
>>>> 
>>>>> If I understand it right, HOD is mentioned mainly for merging existing
>>>>> HPC clusters with Hadoop and for testing purposes.
>>>>> 
>>>>> I cannot find what the role of Torque is here (just initial node
>>>>> allocation?) and which is the default scheduler of HOD. Probably the
>>>>> scheduler from the Hadoop distribution?
>>>>> 
>>>>> The doc mentions a MAUI scheduler, but if there were an integration
>>>>> with Hadoop, there would probably be some document on it.
>>>>> 
>>>>> thanks..
>>>>> 
>>>>> 
>>>> 
>>> 
>>
