Makes sense... I will try both rsync and NFS, but I think rsync will beat NFS,
since NFS can be slow as hell sometimes. But what the heck, we already have
our maven2 repo on NFS, so why not :)

Are you saying that this patch makes the client able to configure which
"extra" local jar files to add to the classpath when firing up the
TaskTracker child?

To be explicit: do you confirm that using tmpjars the way I do is a costly,
slow operation?

To which branch do you apply the patch? (We use 0.18.3.)
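A side note on the JVM-reuse suggestion: the same setting can also go in the config file. A sketch, assuming Hadoop 0.19+ (to my knowledge the property below does not exist in 0.18.3, and the reused JVM serves tasks only within a single job, not across jobs):

```xml
<!-- hadoop-site.xml: let one child JVM run an unlimited number of tasks
     of the same job (equivalent to conf.setNumTasksToExecutePerJvm(-1)).
     Assumes Hadoop 0.19+. -->
<property>
  <name>mapred.job.reuse.jvm.num.tasks</name>
  <value>-1</value>
</property>
```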

Cheers

//Marcus


On Sun, Jun 28, 2009 at 11:26 PM, Mikhail Bautin <mbau...@gmail.com> wrote:

> This is the way we deal with this problem, too. We put our jar files on
> NFS, and the attached patch makes it possible to add those jar files to the
> tasktracker classpath through a configuration property.
>
> Thanks,
> Mikhail
>
> On Sun, Jun 28, 2009 at 5:21 PM, Stuart White <stuart.whi...@gmail.com> wrote:
>
>> Although I've never done it, I believe you could manually copy your jar
>> files out to your cluster somewhere in hadoop's classpath, and that would
>> remove the need for you to copy them to your cluster at the start of each
>> job.
>>
>> On Sun, Jun 28, 2009 at 4:08 PM, Marcus Herou <marcus.he...@tailsweep.com> wrote:
>>
>> > Hi.
>> >
>> > Running without a jobtracker makes the job start almost instantly.
>> > I think it is due to something with the classloader. I use a huge number
>> > of jar files (jobConf.set("tmpjars", "jar1.jar,jar2.jar")...) which need
>> > to be loaded every time, I guess.
>> >
>> > By issuing conf.setNumTasksToExecutePerJvm(-1), will the TaskTracker
>> > child live forever then?
>> >
>> > Cheers
>> >
>> > //Marcus
>> >
>> > On Sun, Jun 28, 2009 at 9:54 PM, tim robertson <timrobertson...@gmail.com> wrote:
>> >
>> > > How long does it take to start the code locally in a single thread?
>> > >
>> > > Can you reuse the JVM so it only starts once per node per job?
>> > > conf.setNumTasksToExecutePerJvm(-1)
>> > >
>> > > Cheers,
>> > > Tim
>> > >
>> > >
>> > >
>> > > On Sun, Jun 28, 2009 at 9:43 PM, Marcus Herou <marcus.he...@tailsweep.com> wrote:
>> > > > Hi.
>> > > >
>> > > > Wonder how one should improve the startup times of a hadoop job.
>> > > > Some of my jobs, which have a lot of dependencies in terms of many
>> > > > jar files, take a long time to start in hadoop, up to 2 minutes
>> > > > sometimes. The data input amounts in these cases are negligible, so
>> > > > it seems that Hadoop has a really high setup cost, which I can live
>> > > > with, but this seems too much.
>> > > >
>> > > > Let's say a job takes 10 minutes to complete; then it is bad if it
>> > > > takes 2 mins to set it up... 20-30 sec max would be a lot more
>> > > > reasonable.
>> > > >
>> > > > Hints ?
>> > > >
>> > > > //Marcus
>> > > >
>> > > >
>> > > > --
>> > > > Marcus Herou CTO and co-founder Tailsweep AB
>> > > > +46702561312
>> > > > marcus.he...@tailsweep.com
>> > > > http://www.tailsweep.com/
>> > > >
>> > >
>> >
>> >
>> >
>> > --
>> > Marcus Herou CTO and co-founder Tailsweep AB
>> > +46702561312
>> > marcus.he...@tailsweep.com
>> > http://www.tailsweep.com/
>> >
>>
>
>


-- 
Marcus Herou CTO and co-founder Tailsweep AB
+46702561312
marcus.he...@tailsweep.com
http://www.tailsweep.com/
