This is cool work! A convenient place to document this information is in the hadoop wiki:
http://wiki.apache.org/hadoop/

At the bottom of that page there is a section titled "Related Projects".
You might want to insert a link in that section.

thanks,
dhruba

On Wed, Feb 18, 2009 at 10:37 AM, Amin Astaneh <[email protected]> wrote:
> Lukáš-
>
> Well, we have a graduate student that is using our facilities for a
> Master's thesis in Map/Reduce. You guys are generating topics in
> computer science research.
>
> What do we need to do in order to get our documentation on the Hadoop
> pages?
>
> -Amin
>
>> Thanks guys, it is good to hear that Hadoop is spreading... :-)
>> Regards,
>> Lukas
>>
>> On Wed, Feb 18, 2009 at 5:24 PM, Steve Loughran <[email protected]>
>> wrote:
>>
>>> Amin Astaneh wrote:
>>>
>>>> Lukáš-
>>>>
>>>>> Hi Amin,
>>>>> I am not familiar with SGE. Do you think you could tell me what you
>>>>> got from this combination? What is the benefit of running Hadoop
>>>>> on SGE?
>>>>
>>>> Sun Grid Engine is a distributed resource management platform for
>>>> supercomputing centers. We use it to allocate resources to a
>>>> supercomputing task, such as requesting 32 processors to run a
>>>> particular simulation. This mechanism is analogous to the scheduler
>>>> on a multi-user OS. What I was able to accomplish was to turn Hadoop
>>>> into an as-needed service. When you submit a job request to run
>>>> Hadoop as the documentation describes, a Hadoop cluster of arbitrary
>>>> size is instantiated, depending on how many nodes were requested, by
>>>> generating a cluster configuration specific to that job request.
>>>> This allows the Hadoop cluster to be deployed within the context of
>>>> Gridengine, and to coexist with other running simulations on the
>>>> cluster.
>>>>
>>>> To the researcher or user needing to run a MapReduce code, all they
>>>> need to worry about is telling Hadoop to execute it and deciding how
>>>> many machines should be dedicated to the task. This makes Hadoop
>>>> very accessible, since people don't need to worry about configuring
>>>> a cluster; SGE and its helper scripts do it for them.
>>>>
>>>> As Steve Loughran accurately commented, as of now we can only run
>>>> one set of Hadoop slave processes per machine, due to the network
>>>> binding issue. That problem is mitigated by configuring SGE to
>>>> spread the slaves one per machine automatically, to avoid failures.
>>>
>>> Only the Namenode and JobTracker need hard-coded/well-known port
>>> numbers; the rest could all be done dynamically.
>>>
>>> One thing SGE does offer over Xen-hosted images is better performance
>>> than virtual machines, for both CPU and storage, as virtualised disk
>>> performance can be awful, and even on the latest x86 parts there is
>>> a measurable hit from VM overheads.
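
To make the mechanism Amin describes concrete: for parallel jobs, SGE
publishes the allocated node list in the file named by $PE_HOSTFILE, and
a per-job Hadoop configuration can be generated from it. The Python
sketch below is a minimal illustration under stated assumptions, not
Amin's actual helper script: the conf directory layout, the ports, and
the master/slave split are invented here; fs.default.name and
mapred.job.tracker are the 0.19-era property names.

#!/usr/bin/env python
# Sketch: build a per-job Hadoop configuration from the node list that
# Sun Grid Engine hands a parallel job. Hypothetical, for illustration;
# needs Python 3 (for makedirs exist_ok) and SGE's $PE_HOSTFILE, which
# holds one "host slots queue processor-range" line per allocated node.
import os

pe_hostfile = os.environ["PE_HOSTFILE"]       # set by SGE inside a PE job
conf_dir = os.path.join(os.environ.get("TMPDIR", "/tmp"), "hadoop-conf")

# Collect the hostnames SGE allocated to this job request.
hosts = []
with open(pe_hostfile) as f:
    for line in f:
        if line.strip():
            hosts.append(line.split()[0])     # first field is the host

# First allocated node runs the NameNode and JobTracker; the rest run
# the slave daemons (a single-node allocation falls back to one host).
master, slaves = hosts[0], hosts[1:] or hosts[:1]

os.makedirs(conf_dir, exist_ok=True)
with open(os.path.join(conf_dir, "masters"), "w") as f:
    f.write(master + "\n")
with open(os.path.join(conf_dir, "slaves"), "w") as f:
    f.writelines(h + "\n" for h in slaves)

# Write a hadoop-site.xml pointing every daemon at this job's master
# node. The port numbers are arbitrary examples.
props = {
    "fs.default.name": "hdfs://%s:9000" % master,
    "mapred.job.tracker": "%s:9001" % master,
}
with open(os.path.join(conf_dir, "hadoop-site.xml"), "w") as f:
    f.write('<?xml version="1.0"?>\n<configuration>\n')
    for name, value in sorted(props.items()):
        f.write("  <property><name>%s</name><value>%s</value>"
                "</property>\n" % (name, value))
    f.write("</configuration>\n")

print("Wrote config for %d node(s) to %s" % (len(hosts), conf_dir))

A submit script could then start the daemons against that directory with
something like bin/start-all.sh --config "$TMPDIR/hadoop-conf", after
requesting the slots via qsub -pe <pe_name> 32 (the parallel environment
name is site-specific).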
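
Steve's point about ports can be illustrated the same way: a Hadoop
daemon bound to port 0 asks the kernel for a free ephemeral port, so
only the NameNode and JobTracker addresses above need stay pinned to
well-known ports. Assuming the 0.19-era DataNode/TaskTracker property
names, the per-job config could add entries like these (a sketch of the
idea, not a tested recipe) so that, in principle, several sets of slave
daemons could coexist on one machine:

# Sketch: ephemeral ports for the slave daemons. Port 0 means "pick any
# free port"; the web UI ports then vary per instance, which is the
# trade-off for avoiding bind collisions.
dynamic_slave_ports = {
    "dfs.datanode.address": "0.0.0.0:0",       # HDFS data transfer
    "dfs.datanode.ipc.address": "0.0.0.0:0",   # block-ops IPC
    "dfs.datanode.http.address": "0.0.0.0:0",  # DataNode web UI
    "mapred.task.tracker.http.address": "0.0.0.0:0",  # TaskTracker UI
}
props.update(dynamic_slave_ports)   # extend the props dict sketched above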
