On Feb 18, 2009, at 10:37 AM, Amin Astaneh wrote:
Lukáš-
Well, we have a graduate student who is using our facilities for a Master's thesis on Map/Reduce. You guys are generating topics in computer science research.
What do we need to do in order to get our documentation on the Hadoop pages?

-Amin
You have a couple of options:
a) Put it on the Hadoop wiki (http://wiki.apache.org/hadoop/); for example, look at the pages which document using Hadoop on EC2/S3.
b) Open a jira (Create New Issue at https://issues.apache.org/jira/browse/HADOOP) and attach forrest-based documentation.
Arun
Thanks guys, it is good to hear that Hadoop is spreading... :-)
Regards,
Lukas
On Wed, Feb 18, 2009 at 5:24 PM, Steve Loughran <[email protected]> wrote:
Amin Astaneh wrote:
Lukáš-
Hi Amin,
I am not familiar with SGE; do you think you could tell me what you got from this combination? What is the benefit of running Hadoop on SGE?
Sun Grid Engine is a distributed resource management platform for supercomputing centers. We use it to allocate resources to a supercomputing task, such as requesting 32 processors to run a particular simulation. This mechanism is analogous to the scheduler on a multi-user OS.

What I was able to accomplish was to turn Hadoop into an as-needed service. When you submit a job request to run Hadoop as the documentation describes, a Hadoop cluster of the requested size is instantiated by generating a cluster configuration specific to that job request. This allows the Hadoop cluster to be deployed within the context of Gridengine, as well as to coexist with other simulations running on the cluster.
To the researcher or user needing to run MapReduce code, all they need to worry about is telling Hadoop to execute it and deciding how many machines should be dedicated to the task. This makes Hadoop very accessible, since people don't need to worry about configuring a cluster; SGE and its helper scripts do it for them.
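For illustration, a submission under this kind of setup might look roughly like the following. This is only a sketch: the parallel environment name "hadoop", the wrapper script, and the example jar are placeholders, not the actual helper scripts described above.

    # ask SGE for a 32-node Hadoop cluster, then run the job script
    $ qsub -pe hadoop 32 run-wordcount.sh

    # run-wordcount.sh -- runs once the cluster is up; the generated,
    # job-specific configuration is assumed to land in SGE's per-job $TMPDIR
    #!/bin/sh
    hadoop --config $TMPDIR/conf jar hadoop-examples.jar wordcount in/ out/

qsub's -pe flag requests a number of slots from a named parallel environment, so the slot count is the only knob the user has to turn.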
As Steve Loughran accurately commented, as of now we can only run one set of Hadoop slave processes per machine, due to the network binding issue. That problem is mitigated by configuring SGE to automatically spread the slaves one per machine, so they never contend for the same ports.
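That spreading can be expressed in the parallel environment definition itself. A sketch of what the relevant part might look like (the PE name and the start/stop script paths are assumptions; a real setup would point start_proc_args/stop_proc_args at the site's helper scripts):

    # viewed with "qconf -sp hadoop", edited with "qconf -mp hadoop"
    pe_name            hadoop
    slots              999
    start_proc_args    /opt/sge-hadoop/start-cluster.sh
    stop_proc_args     /opt/sge-hadoop/stop-cluster.sh
    allocation_rule    1
    control_slaves     TRUE
    job_is_first_task  FALSE

An allocation_rule of 1 tells the scheduler to grant a job at most one slot on any given host, which gives exactly the one-slave-per-machine placement described above.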
Only the NameNode and JobTracker need hard-coded/well-known port numbers; the rest could all be done dynamically.
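Sketching that idea against the 0.19-era configuration names (the hostname is a placeholder): pin the two well-known endpoints, and let the slave daemons bind ephemeral ports by asking for port 0.

    <!-- hadoop-site.xml (illustrative) -->
    <property>
      <name>fs.default.name</name>
      <!-- NameNode endpoint: fixed, well-known -->
      <value>hdfs://master.example.org:9000</value>
    </property>
    <property>
      <name>mapred.job.tracker</name>
      <!-- JobTracker endpoint: fixed, well-known -->
      <value>master.example.org:9001</value>
    </property>
    <property>
      <name>dfs.datanode.address</name>
      <!-- port 0: the DataNode picks any free port at startup -->
      <value>0.0.0.0:0</value>
    </property>

The same port-0 trick applies to the other slave-side addresses (e.g. dfs.datanode.http.address), which is what would let multiple sets of slave daemons coexist on one machine.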
One thing SGE does offer over Xen-hosted images is better performance, for both CPU and storage: virtualised disk performance can be awful, and even on the latest x86 parts there is a measurable hit from VM overheads.