Hard-coded delays to make a protocol work are almost never correct in the
long run.  This isn't a question of real-time vs. batch; it's simply that
hard-coded delays don't scale correctly as problem sizes and durations
change.  *Adaptive* delays such as progressive back-off can work correctly
under scale changes, but *fixed* delays are almost never correct.

Delays may work as a band-aid in the short run, but eventually you have to
take the band-aid off.
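For illustration, here is a minimal sketch of what a progressive (exponential)
back-off delay might look like, in place of a fixed sleep.  The class and
method names are hypothetical, not anything in Hadoop; the point is only that
the polling interval adapts to how long the job actually runs:

```java
// Hypothetical sketch: an adaptive polling delay that doubles on each
// attempt up to a cap, instead of a fixed Thread.sleep(5000).
public class Backoff {
    // Delay before the given retry attempt: base * 2^attempt, capped.
    // Short jobs see short delays; long jobs converge to the cap.
    static long delayMillis(int attempt, long baseMillis, long capMillis) {
        long d = baseMillis << Math.min(attempt, 20); // clamp shift to avoid overflow
        return Math.min(d, capMillis);
    }

    public static void main(String[] args) throws InterruptedException {
        for (int attempt = 0; attempt < 8; attempt++) {
            long d = delayMillis(attempt, 100, 5000);
            System.out.println("attempt " + attempt + ": sleep " + d + " ms");
            // Thread.sleep(d);  // poll job status here
        }
    }
}
```

A 1-minute job would start polling at 100 ms and only ramp up to the 5 s cap
after several attempts, instead of paying the full 5 s on every check.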


On 3/3/08 8:46 AM, "Amar Kamat" <[EMAIL PROTECTED]> wrote:

> Hadoop is not meant for real-time applications. It's more or less designed
> for long-running applications like crawlers/indexers.
> Amar
> On Mon, 3 Mar 2008, Spiros Papadimitriou wrote:
> 
>> Hi
>> 
>> I'd be interested to know if you've tried to use Hadoop for a large number
>> of short jobs.  Perhaps I am missing something, but I've found that the
>> hardcoded Thread.sleep() calls, esp. those for 5 seconds in
>> mapred.ReduceTask (primarily) and mapred.JobClient, cause more of a problem
>> than the 0.3 sec or so that it takes to fire up a JVM.
>> 
>> Agreed that for long-running jobs that is not a concern, but *if* speeding
>> things up for shorter-running jobs (say, < 1 min) is a goal, then JVM reuse
>> would seem to be a lower priority?  Would doing something about those
>> sleep()s seem worthwhile?
