Hi Matei,

Thanks for your feedback. I am trying to verify/debug whether the failures are actually due to speculative execution. I will send an update once I have more info on this.
-Shrinivas

On Thu, Jun 2, 2011 at 12:40 AM, Matei Zaharia <[email protected]> wrote:

> Usually the number of speculatively executed tasks is equal to the number
> of "killed" tasks in the UI (as opposed to "failed"). When Hadoop runs a
> speculative task, it ends up killing either the original or the speculative
> task, depending on which one finishes first.
>
> I don't think OOM errors would be caused by not having speculation though;
> there must be another problem causing that.
>
> Matei
>
> On Jun 1, 2011, at 12:42 PM, Shrinivas Joshi wrote:
>
> > To find out whether it had any positive performance impact, I am trying with
> > turning OFF speculative execution. Surprisingly, the job starts to fail in
> > reduce phase with OOM errors when I disable speculative execution for both
> > map and reduce tasks. Has anybody noticed similar behavior? Is there a way
> > to find out how many tasks were speculatively executed when speculative
> > execution is enabled?
> >
> > I am trying with a 64GB Terasort run with 8-node cluster.
> >
> > Thanks,
> > -Shrinivas
