Hi. Running without a JobTracker (i.e. in local mode) makes the job start almost instantly, so I suspect it is something with the classloader. I use a huge number of jar files, jobConf.set("tmpjars", "jar1.jar,jar2.jar")... which I guess need to be shipped and loaded every time.
By issuing conf.setNumTasksToExecutePerJvm(-1), will the TaskTracker child JVM then live forever?

Cheers
//Marcus

On Sun, Jun 28, 2009 at 9:54 PM, tim robertson <timrobertson...@gmail.com> wrote:
> How long does it take to start the code locally in a single thread?
>
> Can you reuse the JVM so it only starts once per node per job?
> conf.setNumTasksToExecutePerJvm(-1)
>
> Cheers,
> Tim
>
> On Sun, Jun 28, 2009 at 9:43 PM, Marcus Herou <marcus.he...@tailsweep.com> wrote:
> > Hi.
> >
> > I wonder how one should improve the startup time of a Hadoop job. Some of my
> > jobs, which have a lot of dependencies in terms of many jar files, take a
> > long time to start in Hadoop, sometimes up to 2 minutes.
> > The data input in these cases is negligible, so it seems that Hadoop
> > has a really high setup cost. I can live with some overhead, but this seems
> > too much.
> >
> > Let's say a job takes 10 minutes to complete; then it is bad if it takes 2
> > minutes to set up. 20-30 seconds max would be a lot more reasonable.
> >
> > Hints?
> >
> > //Marcus
> >
> > --
> > Marcus Herou CTO and co-founder Tailsweep AB
> > +46702561312
> > marcus.he...@tailsweep.com
> > http://www.tailsweep.com/

--
Marcus Herou CTO and co-founder Tailsweep AB
+46702561312
marcus.he...@tailsweep.com
http://www.tailsweep.com/
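[Editor's note] The two tunings discussed in this thread (avoiding re-shipping jars via "tmpjars" on every submission, and reusing task JVMs) can be sketched in a job driver roughly as below. This is a minimal sketch, not code from the thread: it assumes the classic org.apache.hadoop.mapred API of that era, and the job name and HDFS jar paths are placeholders.

```java
import org.apache.hadoop.filecache.DistributedCache;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.mapred.JobClient;
import org.apache.hadoop.mapred.JobConf;

public class FastStartJob {
    public static void main(String[] args) throws Exception {
        JobConf conf = new JobConf(FastStartJob.class);
        conf.setJobName("fast-start-example"); // placeholder name

        // Reuse one child JVM for an unlimited number of this job's tasks.
        // -1 means no limit; equivalent to mapred.job.reuse.jvm.num.tasks=-1.
        // Note: reuse is per job, per node. The child JVM does not live
        // "forever"; the TaskTracker tears it down when the job finishes.
        conf.setNumTasksToExecutePerJvm(-1);

        // Instead of uploading many local jars via "tmpjars" on every
        // submission, put them in HDFS once and add them to the task
        // classpath through the distributed cache. The paths below are
        // hypothetical HDFS locations.
        DistributedCache.addFileToClassPath(new Path("/libs/jar1.jar"), conf);
        DistributedCache.addFileToClassPath(new Path("/libs/jar2.jar"), conf);

        JobClient.runJob(conf);
    }
}
```

On the question in the reply: as Tim's message indicates ("once per node per job"), JVM reuse keeps the child alive only for the duration of a single job, so per-job setup costs are amortized across that job's tasks but not across separate job submissions.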