Hello Himanshu. Could you please describe your use case in more detail? There are two basic Grid Engine integration schemes:
1/ Native integration in Grid Engine

This is the one referred to in Dan Templeton's blog. It assumes that HDFS is always running, and it essentially schedules your map-reduce job (including a per-job jobtracker) as a parallel environment 'as-close-as-possible' to your HDFS data. It will be in GE 6.2u5, which is currently in beta and should be out any moment now. It is possible to back-port it to 6.2u4 and probably to 6.2u3.

2/ HOD integration

What HOD does, in a nutshell, is allocate a group of machines as a parallel environment within GE and run a jobtracker and a namenode that control the allocated machines. Users then submit their jobs to the jobtracker and use the HDFS controlled by the namenode. The resulting Hadoop environment is transient since, as far as GE is concerned, it is simply a parallel job. Of course, what 'transient' means depends on how you set up your queues.

We have developed a patch to add Grid Engine support to Hadoop HOD:
http://issues.apache.org/jira/browse/HADOOP-6369

This is pretty undemanding on the GE version, but it is not very efficient HDFS-wise, since Grid Engine is ignorant of HDFS data locality. In practice, either you ask HOD to use an independent HDFS that is always up (but then there is no guarantee that the tasktracker nodes will be close to the data), or you upload your data to a new HDFS that will be created by HOD.

Thus, 1/ is definitely more efficient and 'cluster-wide', while 2/ is more like a sort of cluster partitioning.

--gianluigi

On Wed, 2009-12-09 at 12:43 -0800, himanshu chandola wrote:
> Hi all,
> We are integrating the hadoop jobs with the sun grid engine. Most of
> the map reduce jobs that start on our cluster are sequential map and
> reduce. I also found integration guidelines
> here :http://blogs.sun.com/templedf/entry/beta_testing_the_sun_grid
> and http://blogs.sun.com/ravee/entry/creating_hadoop_pe_under_sge .
>
> I wanted to know whether every sequential map-reduce job would be counted as
> a separate job to sun sge. That's necessary because in total the sequential
> map-reduce runs for days.
>
> Thanks
> H
>
> Morpheus: Do you believe in fate, Neo?
> Neo: No.
> Morpheus: Why Not?
> Neo: Because I don't like the idea that I'm not in control of my life.