Re: Caching frequently map input files

Ted Dunning Mon, 11 Feb 2008 08:27:45 -0800

Actually, no.

You need to write something that never exits, much like a web server never
exits: a long-running process that handles many requests without restarting.
If you want <100ms response times, that is going to be the only way.
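Here is a minimal sketch of that resident-process idea, written in Python for brevity (Hadoop itself is Java). Everything in it is hypothetical. `ResidentLookupServer` and the tab-separated record format are assumptions made for illustration and are not part of Hadoop. The data is parsed once when the server starts. After that, each request is just a queue hand-off and a dictionary lookup, with no code shipping and no JVM launch.

```python
import queue
import threading
import time


class ResidentLookupServer:
    """Hypothetical long-running worker: pays the load cost once at startup,
    then answers many requests from memory without ever restarting."""

    def __init__(self, records):
        # Stand-in for loading a map input file. `records` is assumed to be a
        # list of "key<TAB>value" lines (illustrative format only).
        self.table = {}
        for line in records:
            key, value = line.split("\t", 1)
            self.table[key] = value
        self.requests = queue.Queue()
        # Daemon thread so the demo process can still exit; a real service
        # would run this loop for the lifetime of the process.
        self._thread = threading.Thread(target=self._serve_forever, daemon=True)
        self._thread.start()

    def _serve_forever(self):
        # Never exits on its own; each request is a (key, reply_queue) pair.
        while True:
            key, reply = self.requests.get()
            reply.put(self.table.get(key))

    def lookup(self, key, timeout=1.0):
        # Send one request and wait for the answer (raises queue.Empty on timeout).
        reply = queue.Queue(maxsize=1)
        self.requests.put((key, reply))
        return reply.get(timeout=timeout)


if __name__ == "__main__":
    server = ResidentLookupServer(["apple\t3", "pear\t7"])
    for key in ["apple", "pear", "plum"]:
        t0 = time.perf_counter()
        value = server.lookup(key)
        print(key, value, f"{(time.perf_counter() - t0) * 1000:.3f} ms")
```

The design choice is simply to move all of the expensive work (shipping code, starting the runtime, reading the file) into a startup phase that happens once. Per-request cost is then bounded by the lookup itself.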

There are many reasons for slow startup of map-reduce jobs.  One prominent
reason is that the executable code for each program has to be copied into
HDFS and then each task node has to copy the executable code out of HDFS.
Then the tasktrackers in the cluster have to launch multiple JVMs.  This
all takes time.  The amount of time is not very important if you are running
a job even a few minutes long, but is a complete show-stopper for your
application where, as you say, milliseconds matter.

If, however, your jobs are already launched and already have a file loaded,
then your requests should have less than 1 ms of overhead and will get the
parallelism and redundancy that you desire.
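To get parallelism and redundancy from workers that are already running, a request can be fanned out to every shard at once, with each shard held by more than one replica. The sketch below simulates that inside one process using a thread pool. `Replica`, `query_shard`, and `fan_out` are hypothetical names, and a real deployment would put the replicas on separate machines behind some RPC layer.

```python
from concurrent.futures import ThreadPoolExecutor


class Replica:
    """Hypothetical resident worker holding one shard of the data in memory."""

    def __init__(self, name, table, alive=True):
        self.name = name
        self.table = table
        self.alive = alive

    def count_matching(self, predicate):
        if not self.alive:
            raise ConnectionError(f"{self.name} is down")
        return sum(1 for k, v in self.table.items() if predicate(k, v))


def query_shard(replicas, predicate):
    # Redundancy: try each replica of this shard until one answers.
    last_error = None
    for replica in replicas:
        try:
            return replica.count_matching(predicate)
        except ConnectionError as err:
            last_error = err
    raise RuntimeError("all replicas failed") from last_error


def fan_out(shards, predicate, pool):
    # Parallelism: query every shard concurrently on an already-running pool,
    # then combine the per-shard counts.
    futures = [pool.submit(query_shard, replicas, predicate) for replicas in shards]
    return sum(f.result() for f in futures)


if __name__ == "__main__":
    shard0 = {"a": 1, "b": 5}
    shard1 = {"c": 8, "d": 2}
    shards = [
        [Replica("s0-r0", shard0, alive=False), Replica("s0-r1", shard0)],
        [Replica("s1-r0", shard1), Replica("s1-r1", shard1)],
    ]
    with ThreadPoolExecutor(max_workers=4) as pool:
        print(fan_out(shards, lambda k, v: v > 1, pool))  # prints 3
```

Here the first replica of shard 0 is down, so `query_shard` falls back to the second one and the answer is still 3.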

On 2/11/08 6:31 AM, "Shimi K" <[EMAIL PROTECTED]> wrote:

> To do such a thing I will need to implement something which is
> very similar to Hadoop map reduce but with faster job startup time. Why does
> it take Hadoop so long to start the job?
