hi all,
i'm part of an HPC group at a university, and we have some users who are
interested in Hadoop to see if it can be useful in their research. we
also have researchers who are already using hadoop on their own
infrastructure, but that is not enough reason for us to start with
dedicated Hadoop infrastructure (we currently only run torque-based
clusters, with and without shared storage; setting up and properly
maintaining Hadoop infrastructure requires quite some understanding of
new software).
to be able to support these needs, we wanted to do just that: use our
current HPC infrastructure to create private hadoop clusters so people
can do some work. if we attract enough interest, we will probably set up
dedicated infrastructure, but by then we (the admins) will also have a
better understanding of what is required.
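to make that concrete, here is a rough sketch of what such a torque job
can look like. the paths, the config-generation helper and the hadoop
1.x style commands are assumptions for illustration, not our actual
code:

  #!/bin/bash
  #PBS -l nodes=5
  #PBS -l walltime=04:00:00
  # the first node of the allocation becomes namenode + jobtracker
  MASTER=$(head -n 1 "$PBS_NODEFILE")
  CONF="$HOME/hadoop-conf-$PBS_JOBID"
  mkdir -p "$CONF"
  sort -u "$PBS_NODEFILE" > "$CONF/slaves"
  # generate_configs is a hypothetical helper: it should write
  # core-site.xml/mapred-site.xml pointing fs.default.name and
  # mapred.job.tracker at $MASTER, with data dirs on node-local scratch
  generate_configs "$MASTER" "$CONF"
  # format a throwaway HDFS and start the master daemons
  ssh "$MASTER" "hadoop --config '$CONF' namenode -format"
  ssh "$MASTER" "hadoop-daemon.sh --config '$CONF' start namenode"
  ssh "$MASTER" "hadoop-daemon.sh --config '$CONF' start jobtracker"
  # start a datanode + tasktracker on every allocated node
  while read node; do
    ssh "$node" "hadoop-daemon.sh --config '$CONF' start datanode"
    ssh "$node" "hadoop-daemon.sh --config '$CONF' start tasktracker"
  done < "$CONF/slaves"
  # the private cluster now lives until the walltime runs out; the
  # user runs jobs against it with: hadoop --config "$CONF" jar ...

the point is that the cluster is just another batch job: when the
walltime expires, torque kills the daemons and the nodes go back into
the pool.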
we initially looked at HOD for testing/running hadoop on existing
infrastructure (we never really looked at myhadoop, though). but (imho)
the current HOD code base is not in a good state. we did some work to
get it running and added some features, only to conclude that it was
not sufficient (and not maintainable).
so we wrote something from scratch with the same functionality as HOD,
and much more (e.g. HBase is now possible, with or without MR1; some
default tuning; easy to add support for yarn instead of MR1).
it has some support for torque, but my laptop is also sufficient (the
torque support is a wrapper to submit the job).
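since the code isn't public yet, the interface below is made up purely
for illustration (tool and option names are hypothetical), but it shows
the idea behind the torque wrapper:

  # hypothetical command, not the real interface:
  # on a torque cluster, ask for a 5-node hadoop + HBase cluster ...
  mytool create --scheduler torque --nodes 5 --with-hbase
  # ... which essentially boils down to wrapping the cluster
  # startup script in a submission:
  #   qsub -l nodes=5 start_cluster.sh
  # on a laptop, the same startup script just runs locally:
  mytool create --scheduler local --with-hbase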
we gave a workshop on hadoop using it (25 people, each with their own
5-node hadoop cluster) and it went rather well.
it's not in a public repo yet, but we could do that. if you're
interested, let me know and i'll see what can be done. (releasing the
code is on our todo list, but if there is some demand, we can do it
sooner.)
stijn
On 05/18/2012 05:07 PM, Pierre Antoine DuBoDeNa wrote:
I am also interested in learning about myHadoop, as I use a shared
storage system and everything runs on VMs rather than actual dedicated
servers.
In an Amazon EC2-like environment, where you just have VMs and huge
central storage, is it at all helpful to use hadoop to distribute jobs
and maybe parallelize algorithms, or is it better to go with other
technologies?
2012/5/18 Manu S <manupk...@gmail.com>
Hi All,
Guess HOD could be useful for an existing HPC cluster with the Torque
scheduler that needs to run map-reduce jobs.
Also read about *myHadoop - Hadoop on demand on traditional HPC
resources*, which supports many HPC schedulers like SGE, PBS etc. to
overcome the gap between the shared architecture (HPC) and the
shared-nothing architecture (Hadoop).
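For reference, the basic HOD workflow on a Torque cluster looks roughly
like this (following the HOD user guide; the cluster directory and the
example jar are placeholders):

  # allocate a 5-node hadoop cluster via Torque; HOD writes the
  # generated client configs into the cluster directory
  hod allocate -d ~/hod-clusters/test -n 5
  # run map-reduce jobs against the provisioned cluster
  hadoop --config ~/hod-clusters/test jar hadoop-examples.jar wordcount input output
  # return the nodes to the resource manager when done
  hod deallocate -d ~/hod-clusters/test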
Are there any real use case scenarios for integrating hadoop map/reduce
into an existing HPC cluster, and what are the advantages of using
hadoop features in an HPC cluster?
Appreciate your comments on the same.
Thanks,
Manu S
On Fri, May 18, 2012 at 12:41 AM, Merto Mertek <masmer...@gmail.com>
wrote:
If I understand it right, HOD is meant mainly for merging existing HPC
clusters with hadoop and for testing purposes..
I cannot find what the role of Torque is here (just initial node
allocation?) or what the default scheduler of HOD is. Probably the
scheduler from the hadoop distribution?
The docs mention a MAUI scheduler, but if there were an integration
with hadoop, there would probably be some document on it..
thanks..