Hi.

Running without a JobTracker makes the job start almost instantly.
I think it is due to something with the classloader. I use a huge number of
jar files (jobConf.set("tmpjars", "jar1.jar,jar2.jar"), ...) which presumably
need to be loaded every time.
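
To be concrete, the setup looks roughly like this (a sketch only -- the jar
names, paths and the MyJob driver class below are placeholders, not my real
dependency list):

import org.apache.hadoop.mapred.JobClient;
import org.apache.hadoop.mapred.JobConf;

JobConf jobConf = new JobConf(MyJob.class);   // MyJob is a placeholder driver class
jobConf.setJobName("startup-time-test");
// Every jar listed in tmpjars gets shipped to the cluster and put on the
// task classpath for each submission, which is where I suspect the time goes.
jobConf.set("tmpjars",
    "/opt/lib/jar1.jar,"
  + "/opt/lib/jar2.jar,"
  + "/opt/lib/jar3.jar");
JobClient.runJob(jobConf);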

If I issue conf.setNumTasksToExecutePerJvm(-1), will the TaskTracker child JVM
live forever then?
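
For reference, this is how I read Tim's suggestion (a minimal sketch against
the old mapred API; MyJob is again a placeholder, and the meaning of -1 as
"no per-JVM task limit" is my understanding of the docs):

import org.apache.hadoop.mapred.JobConf;

JobConf conf = new JobConf(MyJob.class);
// Equivalent to setting mapred.job.reuse.jvm.num.tasks = -1: the child JVM
// is reused for an unlimited number of tasks of this job instead of being
// forked once per task.
conf.setNumTasksToExecutePerJvm(-1);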

Cheers

//Marcus

On Sun, Jun 28, 2009 at 9:54 PM, tim robertson <timrobertson...@gmail.com> wrote:

> How long does it take to start the code locally in a single thread?
>
> Can you reuse the JVM so it only starts once per node per job?
> conf.setNumTasksToExecutePerJvm(-1)
>
> Cheers,
> Tim
>
>
>
> On Sun, Jun 28, 2009 at 9:43 PM, Marcus Herou <marcus.he...@tailsweep.com>
> wrote:
> > Hi.
> >
> > I wonder how one should improve the startup times of a Hadoop job. Some of
> > my jobs, which have a lot of dependencies in terms of many jar files, take
> > a long time to start in Hadoop, up to 2 minutes sometimes.
> > The data input amounts in these cases are negligible, so it seems that
> > Hadoop has a really high setup cost, which I can live with, but this seems
> > too much.
> >
> > Let's say a job takes 10 minutes to complete; then it is bad if it takes 2
> > minutes to set it up... 20-30 seconds max would be a lot more reasonable.
> >
> > Hints ?
> >
> > //Marcus
> >
> >
> > --
> > Marcus Herou CTO and co-founder Tailsweep AB
> > +46702561312
> > marcus.he...@tailsweep.com
> > http://www.tailsweep.com/
> >
>



-- 
Marcus Herou CTO and co-founder Tailsweep AB
+46702561312
marcus.he...@tailsweep.com
http://www.tailsweep.com/
