This is how we deal with this problem, too. We put our jar files on NFS,
and the attached patch makes it possible to add those jar files to the
classpath of the child task JVMs launched by the tasktracker, through the
mapred.additional.class.path configuration property.
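
To illustrate the idea, here is a minimal standalone sketch of the logic the
patch adds. It does not use the Hadoop API: a plain Map stands in for the job
configuration, and the class name, method name, and NFS paths are all
hypothetical. Only the property name comes from the patch itself.

```java
import java.io.File;
import java.util.HashMap;
import java.util.Map;

public class AdditionalClassPathSketch {
    // Property name taken from the attached patch.
    static final String KEY = "mapred.additional.class.path";

    // Mirrors the patched logic: the extra entries are appended to the
    // classpath only when the property is set.
    static String buildClassPath(String base, Map<String, String> conf, String sep) {
        StringBuilder classPath = new StringBuilder(base);
        String additionalClassPath = conf.get(KEY);
        if (additionalClassPath != null) {
            classPath.append(sep);
            classPath.append(additionalClassPath);
        }
        return classPath.toString();
    }

    public static void main(String[] args) {
        // Stand-in for the job configuration (assumption, not JobConf).
        Map<String, String> conf = new HashMap<>();
        String sep = File.pathSeparator;
        // Hypothetical jar locations on an NFS mount visible to every node.
        conf.put(KEY, "/mnt/nfs/lib/jar1.jar" + sep + "/mnt/nfs/lib/jar2.jar");
        System.out.println(buildClassPath("/tmp/workdir", conf, sep));
    }
}
```

Since the value is appended verbatim, it should already be a list of entries
joined with the platform's path separator, and the paths have to resolve on
every node that runs tasks.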

Thanks,
Mikhail

On Sun, Jun 28, 2009 at 5:21 PM, Stuart White <stuart.whi...@gmail.com> wrote:

> Although I've never done it, I believe you could manually copy your jar
> files out to your cluster somewhere in hadoop's classpath, and that would
> remove the need for you to copy them to your cluster at the start of each
> job.
>
> On Sun, Jun 28, 2009 at 4:08 PM, Marcus Herou <marcus.he...@tailsweep.com> wrote:
>
> > Hi.
> >
> > Running without a jobtracker makes the job start almost instantly.
> > I think it is due to something with the classloader. I use a huge number of
> > jar files, jobConf.set("tmpjars", "jar1.jar,jar2.jar")..., which need to be
> > loaded every time, I guess.
> >
> > If I call conf.setNumTasksToExecutePerJvm(-1), will the TaskTracker child
> > live forever then?
> >
> > Cheers
> >
> > //Marcus
> >
> > On Sun, Jun 28, 2009 at 9:54 PM, tim robertson <timrobertson...@gmail.com> wrote:
> >
> > > How long does it take to start the code locally in a single thread?
> > >
> > > Can you reuse the JVM so it only starts once per node per job?
> > > conf.setNumTasksToExecutePerJvm(-1)
> > >
> > > Cheers,
> > > Tim
> > >
> > >
> > >
> > > On Sun, Jun 28, 2009 at 9:43 PM, Marcus Herou <marcus.he...@tailsweep.com> wrote:
> > > > Hi.
> > > >
> > > > I wonder how one should improve the startup time of a Hadoop job. Some of
> > > > my jobs, which have a lot of dependencies in the form of many jar files,
> > > > take a long time to start in Hadoop, sometimes up to 2 minutes.
> > > > The input data volumes in these cases are negligible, so it seems that
> > > > Hadoop has a really high setup cost, which I can live with, but this seems
> > > > too much.
> > > >
> > > > Let's say a job takes 10 minutes to complete; then it is bad if it takes
> > > > 2 minutes to set it up... 20-30 seconds max would be a lot more reasonable.
> > > >
> > > > Hints ?
> > > >
> > > > //Marcus
> > > >
> > > >
> > > > --
> > > > Marcus Herou CTO and co-founder Tailsweep AB
> > > > +46702561312
> > > > marcus.he...@tailsweep.com
> > > > http://www.tailsweep.com/
> > > >
> > >
> >
> >
> >
> > --
> > Marcus Herou CTO and co-founder Tailsweep AB
> > +46702561312
> > marcus.he...@tailsweep.com
> > http://www.tailsweep.com/
> >
>
diff -rc java_original/org/apache/hadoop/mapred/TaskRunner.java java/org/apache/hadoop/mapred/TaskRunner.java
*** mapred_original/org/apache/hadoop/mapred/TaskRunner.java	2008-04-19 17:45:50.730243865 -0400
--- mapred/org/apache/hadoop/mapred/TaskRunner.java	2008-04-19 17:48:47.240302624 -0400
***************
*** 262,267 ****
--- 262,279 ----
  
        classPath.append(sep);
        classPath.append(workDir);
+       
+       // Additional classpath specified by client (e.g. Jar libraries
+       // stored in NFS).
+       {
+         String additionalClassPath = 
+                 conf.get("mapred.additional.class.path");
+         if (additionalClassPath != null) {
+           classPath.append(sep);
+           classPath.append(additionalClassPath);
+         }
+       }
+       
        //  Build exec child jmv args.
        Vector<String> vargs = new Vector<String>(8);
        File jvm =                                  // use same jvm as parent
