Hi,

Just wanted to post an update on this issue. I did not spend much time
pinning down exactly what was going wrong, but speculative execution was
definitely not the cause of the problem: I was seeing job failures even with
speculative execution set to ON.

By recreating the HDFS environment and tweaking the map-side spill threshold
percentage, I was able to get successful job executions.
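
For anyone wanting to try the same knobs, here is a minimal sketch (not the
exact commands used in these runs) that builds the -D options for a job
command line. It assumes the Hadoop 0.20-era property names
mapred.map.tasks.speculative.execution,
mapred.reduce.tasks.speculative.execution and io.sort.spill.percent; the
helper name hadoop_d_options and the 0.9 value are purely illustrative.

```python
def hadoop_d_options(speculative, spill_percent):
    """Build -D generic options for a Hadoop job command line.

    Property names are assumed from Hadoop 0.20-era defaults; check the
    mapred-default.xml shipped with your version before relying on them.
    """
    if not 0.0 < spill_percent <= 1.0:
        raise ValueError("spill_percent must be in (0, 1]")
    flag = "true" if speculative else "false"
    props = {
        # Toggle speculation for map and reduce tasks together.
        "mapred.map.tasks.speculative.execution": flag,
        "mapred.reduce.tasks.speculative.execution": flag,
        # Fraction of the map-side sort buffer that triggers a spill.
        "io.sort.spill.percent": str(spill_percent),
    }
    return [f"-D{name}={value}" for name, value in props.items()]


if __name__ == "__main__":
    # Hypothetical values, for illustration only.
    print(" ".join(hadoop_d_options(speculative=False, spill_percent=0.9)))
```

The printed options would go right after the program name on the job
command line, provided the job parses generic options.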

-Shrinivas

On Thu, Jun 2, 2011 at 12:59 PM, Shrinivas Joshi <[email protected]> wrote:

> Hi Matei,
>
> Thanks for your feedback. I am trying to verify/debug whether the failures
> are actually due to speculative execution. I will send an update once I
> have more info on this.
>
> -Shrinivas
>
>
> On Thu, Jun 2, 2011 at 12:40 AM, Matei Zaharia <[email protected]> wrote:
>
>> Usually the number of speculatively executed tasks is equal to the number
>> of "killed" tasks in the UI (as opposed to "failed"). When Hadoop runs a
>> speculative task, it ends up killing either the original or the speculative
>> task, depending on which one finishes first.
>>
>> I don't think OOM errors would be caused by not having speculation though;
>> there must be another problem causing that.
>>
>> Matei
>>
>> On Jun 1, 2011, at 12:42 PM, Shrinivas Joshi wrote:
>>
>> > To find out whether it had any positive performance impact, I am trying
>> > with turning OFF speculative execution. Surprisingly, the job starts to
>> > fail in reduce phase with OOM errors when I disable speculative execution
>> > for both map and reduce tasks. Has anybody noticed similar behavior? Is
>> > there a way to find out how many tasks were speculatively executed when
>> > speculative execution is enabled?
>> >
>> > I am trying this with a 64GB Terasort run on an 8-node cluster.
>> >
>> > Thanks,
>> > -Shrinivas
>>
>>
>
