Actually, no. You need to write something that never exits, more like a web server: something that handles many requests without exiting. If you want <100 ms response times, that is going to be the only way.
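To make the "never exits" idea concrete, here is a minimal sketch in Python. It is not Hadoop code, and everything in it is an assumption for illustration: the tab-separated `key<TAB>value` data, the names `load_index` and `make_server`, and port 8080. The point is only that the data is loaded once at startup and every later request is answered from memory.

```python
from http.server import BaseHTTPRequestHandler, ThreadingHTTPServer


def load_index(lines):
    """Parse 'key<TAB>value' lines into a dict. Done once, at startup."""
    index = {}
    for line in lines:
        key, sep, value = line.rstrip("\n").partition("\t")
        if sep:  # skip lines that have no tab separator
            index[key] = value
    return index


def make_server(index, host="127.0.0.1", port=0):
    """Build an HTTP server that answers GET /<key> from the preloaded index.

    port=0 asks the OS for any free port (handy for testing).
    """

    class LookupHandler(BaseHTTPRequestHandler):
        def do_GET(self):
            key = self.path.lstrip("/")
            value = index.get(key)
            found = value is not None
            body = (value if found else "not found").encode("utf-8")
            self.send_response(200 if found else 404)
            self.send_header("Content-Type", "text/plain; charset=utf-8")
            self.send_header("Content-Length", str(len(body)))
            self.end_headers()
            self.wfile.write(body)

        def log_message(self, fmt, *args):
            pass  # keep the sketch quiet; a real service would log

    return ThreadingHTTPServer((host, port), LookupHandler)


if __name__ == "__main__":
    # Hypothetical data; a real service would read its file here, once.
    index = load_index(["apple\t1\n", "banana\t2\n"])
    server = make_server(index, port=8080)
    server.serve_forever()  # never exits; requests pay no startup cost
```

Run as a script, this stays up until killed, and a request such as GET /apple is served by a dictionary lookup in a process that is already running. Parallelism and redundancy would come from running several such processes on different nodes, which this sketch does not attempt.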
There are many reasons for the slow startup of map-reduce jobs. One prominent reason is that the executable code for each program has to be copied into HDFS, and then each task node has to copy that code back out of HDFS. Then the tasktrackers in the cluster have to launch multiple JVMs. This all takes time. The amount of time is not very important if you are running a job that lasts even a few minutes, but it is a complete show-stopper for your application where, as you say, milliseconds matter.

If, however, your jobs are already launched and already have a file loaded, then your requests should have less than 1 ms of overhead and will get the parallelism and redundancy that you desire.

On 2/11/08 6:31 AM, "Shimi K" <[EMAIL PROTECTED]> wrote:

> To do such a thing I will need to implement something which is
> very similar to Hadoop map reduce but with faster startup job time. Why does
> it takes Hadoop so long to start the job?
