Hi,

Just wanted to post an update on this issue. I didn't spend a lot of time verifying exactly what was going wrong, but speculative execution was definitely not the cause of the problem here: I was seeing job failures even with speculative execution set to ON.
By recreating the HDFS environment and tweaking the map-side spill threshold percent, I was able to see successful job executions.

-Shrinivas

On Thu, Jun 2, 2011 at 12:59 PM, Shrinivas Joshi <[email protected]> wrote:

> Hi Matei,
>
> Thanks for your feedback. I am trying to verify/debug whether the failures
> are actually due to speculative execution. I will send an update once I have
> more info on this.
>
> -Shrinivas
>
>
> On Thu, Jun 2, 2011 at 12:40 AM, Matei Zaharia <[email protected]> wrote:
>
>> Usually the number of speculatively executed tasks is equal to the number
>> of "killed" tasks in the UI (as opposed to "failed"). When Hadoop runs a
>> speculative task, it ends up killing either the original or the speculative
>> task, depending on which one finishes first.
>>
>> I don't think OOM errors would be caused by not having speculation though;
>> there must be another problem causing that.
>>
>> Matei
>>
>> On Jun 1, 2011, at 12:42 PM, Shrinivas Joshi wrote:
>>
>> > To find out whether it had any positive performance impact, I am trying
>> > with turning OFF speculative execution. Surprisingly, the job starts to
>> > fail in reduce phase with OOM errors when I disable speculative execution
>> > for both map and reduce tasks. Has anybody noticed similar behavior? Is
>> > there a way to find out how many tasks were speculatively executed when
>> > speculative execution is enabled?
>> >
>> > I am trying with a 64GB Terasort run with 8-node cluster.
>> >
>> > Thanks,
>> > -Shrinivas
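
For anyone trying to reproduce this, the settings discussed in the thread could be passed as `-D` generic options along the lines of the sketch below. This is only an illustration, not the exact setup used: it assumes the Hadoop 0.20-era property names (`mapred.map.tasks.speculative.execution`, `mapred.reduce.tasks.speculative.execution`, `io.sort.spill.percent`), the helper `generic_options` is hypothetical, and the spill value shown is a placeholder because the thread does not say which threshold was used.

```python
# Sketch only: builds -D generic options for a Hadoop job started via ToolRunner.
# Property names are the 0.20-era (mapred.* / io.sort.*) names; this is an assumption
# about the Hadoop version, which the thread does not state.

def generic_options(speculative, spill_percent):
    """Return a list of -Dkey=value options (hypothetical helper)."""
    flag = "true" if speculative else "false"
    props = {
        # Speculative execution is toggled separately for map and reduce tasks.
        "mapred.map.tasks.speculative.execution": flag,
        "mapred.reduce.tasks.speculative.execution": flag,
        # Fraction of the map-side sort buffer that triggers a spill to disk.
        "io.sort.spill.percent": str(spill_percent),
    }
    return ["-D{}={}".format(key, value) for key, value in props.items()]


# Placeholder values: the thread does not give the threshold that was used.
opts = generic_options(speculative=True, spill_percent=0.9)
print(" ".join(opts))
```

The printed options would go right after the job's main class or program name on the `hadoop jar` command line, before the input and output paths.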
