Hard-coded delays used to make a protocol work are almost never correct in the long run. This isn't a question of real-time versus batch; it's simply that hard-coded delays don't scale correctly as problem sizes and durations change. *Adaptive* delays such as progressive back-off can remain correct under scale changes, but *fixed* delays almost never do.
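To illustrate (this is a hypothetical sketch, not actual Hadoop code): instead of a fixed Thread.sleep(5000), a poll loop can start with a small delay and double it up to a cap, so short jobs see sub-second latency while long jobs converge to the same worst-case wait.

```java
// Sketch of a capped exponential (progressive) back-off poll loop.
// workIsReady() is a stand-in for a real readiness check, e.g. polling
// for map outputs; the 50 ms / 5000 ms constants are illustrative.
public class BackoffPoll {

    static boolean workIsReady(int attempt) {
        return attempt >= 3; // pretend the work completes on the 4th poll
    }

    public static void main(String[] args) throws InterruptedException {
        long delay = 50;            // start small, so short jobs finish fast
        final long maxDelay = 5000; // cap, so long jobs don't thrash the poller
        for (int attempt = 0; attempt < 20; attempt++) {
            if (workIsReady(attempt)) {
                System.out.println("ready after attempt " + attempt);
                return;
            }
            Thread.sleep(delay);
            delay = Math.min(delay * 2, maxDelay); // progressive back-off
        }
        System.out.println("gave up");
    }
}
```

With these illustrative constants, a sub-second job pays at most 50+100+200 ms of sleeping rather than a mandatory 5 s, while the cap keeps the steady-state polling rate for long jobs identical to the fixed-delay case.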
Delays may work as a band-aid in the short run, but eventually you have to take the band-aid off.

On 3/3/08 8:46 AM, "Amar Kamat" <[EMAIL PROTECTED]> wrote:

> HADOOP is not meant for real-time applications. It's more or less designed
> for long-running applications like crawlers/indexers.
> Amar
>
> On Mon, 3 Mar 2008, Spiros Papadimitriou wrote:
>
>> Hi,
>>
>> I'd be interested to know if you've tried to use Hadoop for a large number
>> of short jobs. Perhaps I am missing something, but I've found that the
>> hardcoded Thread.sleep() calls, esp. those for 5 seconds in
>> mapred.ReduceTask (primarily) and mapred.JobClient, cause more of a problem
>> than the 0.3 sec or so that it takes to fire up a JVM.
>>
>> Agreed that for long-running jobs that is not a concern, but *if* speeding
>> things up for shorter-running jobs (say < 1 min) is a goal, then JVM reuse
>> would seem to be a lower priority? Would doing something about those
>> sleep()s seem worthwhile?
