Of course... Thanks for the help!

Cheers

//Marcus

On Mon, Jun 29, 2009 at 12:32 AM, Mikhail Bautin <mbau...@gmail.com> wrote:

> Marcus,
>
> The code that needs to be patched is in the tasktracker, because the
> tasktracker is what starts the child JVM that runs user code.
>
> Thanks,
> Mikhail
>
> On Sun, Jun 28, 2009 at 6:14 PM, Marcus Herou <marcus.he...@tailsweep.com> wrote:
>
> > Hi.
> >
> > Just to be clear: is it the jobtracker that needs the patched code,
> > or is it the tasktrackers?
> >
> > Kindly
> >
> > //Marcus
> >
> > On Mon, Jun 29, 2009 at 12:08 AM, Mikhail Bautin <mbau...@gmail.com> wrote:
> >
> > > Marcus,
> > >
> > > We currently use 0.20.0, but this patch just inserts 8 lines of code
> > > into TaskRunner.java, which could certainly be done with 0.18.3.
> > >
> > > Yes, this patch just appends additional jars to the child JVM
> > > classpath.
> > >
> > > I've never really used tmpjars myself, but if it involves uploading
> > > multiple jar files into HDFS every time a job is started, I see how
> > > it can be really slow. On our ~80-job workflow this would have
> > > really slowed things down.
> > >
> > > Thanks,
> > > Mikhail
> > >
> > > On Sun, Jun 28, 2009 at 5:40 PM, Marcus Herou <marcus.he...@tailsweep.com> wrote:
> > >
> > > > Makes sense... I will try both rsync and NFS, but I think rsync
> > > > will beat NFS since NFS can be slow as hell sometimes. But what
> > > > the heck, we already have our maven2 repo on NFS, so why not :)
> > > >
> > > > Are you saying that this patch makes the client able to configure
> > > > which "extra" local jar files to add to the classpath when firing
> > > > up the TaskTrackerChild?
> > > >
> > > > To be explicit: do you confirm that using tmpjars like I do is a
> > > > costly, slow operation?
> > > >
> > > > To what branch do you apply the patch (we use 0.18.3)?
> > > >
> > > > Cheers
> > > >
> > > > //Marcus
> > > >
> > > > On Sun, Jun 28, 2009 at 11:26 PM, Mikhail Bautin <mbau...@gmail.com> wrote:
> > > >
> > > > > This is the way we deal with this problem, too. We put our jar
> > > > > files on NFS, and the attached patch makes it possible to add
> > > > > those jar files to the tasktracker classpath through a
> > > > > configuration property.
> > > > >
> > > > > Thanks,
> > > > > Mikhail
> > > > >
> > > > > On Sun, Jun 28, 2009 at 5:21 PM, Stuart White <stuart.whi...@gmail.com> wrote:
> > > > >
> > > > >> Although I've never done it, I believe you could manually copy
> > > > >> your jar files out to your cluster somewhere in hadoop's
> > > > >> classpath, and that would remove the need for you to copy them
> > > > >> to your cluster at the start of each job.
> > > > >>
> > > > >> On Sun, Jun 28, 2009 at 4:08 PM, Marcus Herou <marcus.he...@tailsweep.com> wrote:
> > > > >>
> > > > >> > Hi.
> > > > >> >
> > > > >> > Running without a jobtracker makes the job start almost
> > > > >> > instantly. I think it is due to something with the
> > > > >> > classloader. I use a huge amount of jar files:
> > > > >> > jobConf.set("tmpjars", "jar1.jar,jar2.jar")... which need to be
> > > > >> > loaded every time, I guess.
> > > > >> >
> > > > >> > By issuing conf.setNumTasksToExecutePerJvm(-1); will the
> > > > >> > TaskTracker child live forever then?
> > > > >> >
> > > > >> > Cheers
> > > > >> >
> > > > >> > //Marcus
> > > > >> >
> > > > >> > On Sun, Jun 28, 2009 at 9:54 PM, tim robertson <timrobertson...@gmail.com> wrote:
> > > > >> >
> > > > >> > > How long does it take to start the code locally in a single
> > > > >> > > thread?
> > > > >> > >
> > > > >> > > Can you reuse the JVM so it only starts once per node per job?
> > > > >> > > conf.setNumTasksToExecutePerJvm(-1)
> > > > >> > >
> > > > >> > > Cheers,
> > > > >> > > Tim
> > > > >> > >
> > > > >> > > On Sun, Jun 28, 2009 at 9:43 PM, Marcus Herou <marcus.he...@tailsweep.com> wrote:
> > > > >> > > > Hi.
> > > > >> > > >
> > > > >> > > > I wonder how one should improve the startup times of a hadoop
> > > > >> > > > job. Some of my jobs, which have a lot of dependencies in
> > > > >> > > > terms of many jar files, take a long time to start in
> > > > >> > > > hadoop, up to 2 minutes sometimes.
> > > > >> > > > The data input amounts in these cases are negligible, so it
> > > > >> > > > seems that Hadoop has a really high setup cost, which I can
> > > > >> > > > live with, but this seems too much.
> > > > >> > > >
> > > > >> > > > Let's say a job takes 10 minutes to complete; then it is bad
> > > > >> > > > if it takes 2 mins to set it up... 20-30 sec max would be a
> > > > >> > > > lot more reasonable.
> > > > >> > > >
> > > > >> > > > Hints?
> > > > >> > > >
> > > > >> > > > //Marcus
> > > > >> > > >
> > > > >> > > > --
> > > > >> > > > Marcus Herou CTO and co-founder Tailsweep AB
> > > > >> > > > +46702561312
> > > > >> > > > marcus.he...@tailsweep.com
> > > > >> > > > http://www.tailsweep.com/
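
For readers following the thread, here is a minimal sketch of the idea Mikhail describes: take extra local jar paths from a configuration property and append them to the child JVM classpath. This is not the actual patch, which is not included in the thread. The property name example.child.extra.jars, the class and method names, and the use of java.util.Properties as a stand-in for Hadoop's job configuration are all assumptions made for illustration.

```java
import java.io.File;
import java.util.ArrayList;
import java.util.List;
import java.util.Properties;

// Illustration only: not the TaskRunner.java patch discussed in the thread.
final class ChildClasspathSketch {
    // Hypothetical property name; the thread does not name the real one.
    static final String EXTRA_JARS_KEY = "example.child.extra.jars";

    // Appends the comma-separated jar paths found under EXTRA_JARS_KEY
    // to an existing classpath string, skipping empty entries.
    static String appendExtraJars(String baseClasspath, Properties conf, String separator) {
        List<String> parts = new ArrayList<>();
        if (!baseClasspath.isEmpty()) {
            parts.add(baseClasspath);
        }
        for (String jar : conf.getProperty(EXTRA_JARS_KEY, "").split(",")) {
            String trimmed = jar.trim();
            if (!trimmed.isEmpty()) {
                parts.add(trimmed);
            }
        }
        return String.join(separator, parts);
    }

    public static void main(String[] args) {
        Properties conf = new Properties();
        // Example paths on a shared mount; these are made up.
        conf.setProperty(EXTRA_JARS_KEY, "/mnt/nfs/lib/a.jar, /mnt/nfs/lib/b.jar");
        System.out.println(appendExtraJars("job.jar", conf, File.pathSeparator));
    }
}
```

The point of this approach, as discussed above, is that the jars already sit on every node (via NFS or rsync), so nothing has to be uploaded to HDFS at job start the way tmpjars requires.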