Makes sense... I will try both rsync and NFS, but I think rsync will beat NFS since NFS can be slow as hell sometimes. But what the heck, we already have our maven2 repo on NFS, so why not :)
Are you saying that this patch makes it possible for the client to configure which "extra" local jar files to add to the classpath when firing up the TaskTracker child?

To be explicit: do you confirm that using tmpjars the way I do is a costly, slow operation? And which branch do you apply the patch to (we use 0.18.3)?

Cheers

//Marcus

On Sun, Jun 28, 2009 at 11:26 PM, Mikhail Bautin <mbau...@gmail.com> wrote:

> This is the way we deal with this problem, too. We put our jar files on
> NFS, and the attached patch makes it possible to add those jar files to
> the tasktracker classpath through a configuration property.
>
> Thanks,
> Mikhail
>
> On Sun, Jun 28, 2009 at 5:21 PM, Stuart White <stuart.whi...@gmail.com> wrote:
>
>> Although I've never done it, I believe you could manually copy your jar
>> files out to your cluster somewhere in Hadoop's classpath, and that
>> would remove the need for you to copy them to your cluster at the start
>> of each job.
>>
>> On Sun, Jun 28, 2009 at 4:08 PM, Marcus Herou <marcus.he...@tailsweep.com> wrote:
>>
>>> Hi.
>>>
>>> Running without a jobtracker makes the job start almost instantly.
>>> I think it is due to something with the classloader. I use a huge
>>> number of jar files, jobConf.set("tmpjars", "jar1.jar,jar2.jar")...,
>>> which need to be loaded every time, I guess.
>>>
>>> By issuing conf.setNumTasksToExecutePerJvm(-1), will the TaskTracker
>>> child live forever then?
>>>
>>> Cheers
>>>
>>> //Marcus
>>>
>>> On Sun, Jun 28, 2009 at 9:54 PM, tim robertson <timrobertson...@gmail.com> wrote:
>>>
>>>> How long does it take to start the code locally in a single thread?
>>>>
>>>> Can you reuse the JVM so it only starts once per node per job?
>>>> conf.setNumTasksToExecutePerJvm(-1)
>>>>
>>>> Cheers,
>>>> Tim
>>>>
>>>> On Sun, Jun 28, 2009 at 9:43 PM, Marcus Herou <marcus.he...@tailsweep.com> wrote:
>>>>
>>>>> Hi.
>>>>>
>>>>> I wonder how one should improve the startup times of a Hadoop job.
>>>>> Some of my jobs, which have a lot of dependencies in terms of many
>>>>> jar files, take a long time to start in Hadoop, up to 2 minutes
>>>>> sometimes.
>>>>> The data input amounts in these cases are negligible, so it seems
>>>>> that Hadoop has a really high setup cost. I can live with some
>>>>> overhead, but this seems too much.
>>>>>
>>>>> Let's say a job takes 10 minutes to complete; then it is bad if it
>>>>> takes 2 mins to set it up... 20-30 sec max would be a lot more
>>>>> reasonable.
>>>>>
>>>>> Hints?
>>>>>
>>>>> //Marcus

--
Marcus Herou CTO and co-founder Tailsweep AB
+46702561312
marcus.he...@tailsweep.com
http://www.tailsweep.com/
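[Editor's note on the tmpjars discussion above: the "tmpjars" property is just a comma-separated list of jar paths, and every jar in that list is re-shipped to the cluster on each job submission, which is the per-job startup cost being debated in the thread. A minimal, self-contained sketch of what that implies; the jar names and the helper methods are illustrative, not from the thread:]

```java
import java.io.File;
import java.util.Arrays;
import java.util.List;

public class TmpJarsCost {
    // "tmpjars" expects a comma-separated list of jar paths; every jar
    // listed is shipped to the cluster again on each job submission.
    static String buildTmpJars(List<String> jars) {
        return String.join(",", jars);
    }

    // Rough per-submission upload cost: the sum of the local jar sizes.
    // (File.length() returns 0 for a missing file.)
    static long bytesShippedPerJob(List<String> jars) {
        long total = 0;
        for (String path : jars) {
            total += new File(path).length();
        }
        return total;
    }

    public static void main(String[] args) {
        // Placeholder jar names, not Marcus's actual dependencies.
        List<String> jars = Arrays.asList("lib/jar1.jar", "lib/jar2.jar");
        System.out.println(buildTmpJars(jars)); // prints "lib/jar1.jar,lib/jar2.jar"
        // With a real JobConf the value would be applied as in the thread:
        //   jobConf.set("tmpjars", buildTmpJars(jars));
    }
}
```

[With dozens of jars this upload and the subsequent classloading on every task repeat per job, which is why pre-deploying the jars to a shared location, as Stuart and Mikhail suggest, avoids the cost entirely.]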
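[Editor's note on Tim's JVM-reuse suggestion: `conf.setNumTasksToExecutePerJvm(-1)` corresponds, to the best of my knowledge, to the `mapred.job.reuse.jvm.num.tasks` property that arrived with JVM reuse in Hadoop 0.19, so it would not be available on the 0.18.3 branch Marcus mentions. A sketch of the equivalent hadoop-site.xml entry, with the property name assumed from that release line:]

```xml
<!-- hadoop-site.xml sketch: -1 means no limit on the number of tasks a
     task JVM may run for one job, so classpath setup happens once per
     JVM rather than once per task. Assumed available from Hadoop 0.19. -->
<property>
  <name>mapred.job.reuse.jvm.num.tasks</name>
  <value>-1</value>
</property>
```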