[jira] Commented: (MAPREDUCE-2187) map tasks timeout during sorting

Luke Lu (JIRA) Mon, 15 Nov 2010 14:52:37 -0800

    [ https://issues.apache.org/jira/browse/MAPREDUCE-2187?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12932246#action_12932246 ]


Luke Lu commented on MAPREDUCE-2187:
------------------------------------

Do you have a combiner configured? How many reduces?

The reason I'm asking is that if you don't have a combiner set, progress 
is reported at least once per partition and once every 10000 
(mapred.merge.recordsBeforeProgress) records.
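
To illustrate the "once every N records" part, here is a minimal sketch in plain Java. It is not the actual mapred.Merger code: the class and method names are hypothetical, the Runnable stands in for a call such as Reporter.progress(), and the record iterator stands in for the merged stream. The only detail taken from the comment above is the threshold of 10000 records.

```java
import java.util.Collections;
import java.util.Iterator;

// Hypothetical illustration only; not part of Hadoop.
class MergeProgressSketch {

    /**
     * Drains the iterator and invokes the progress callback once after every
     * recordsBeforeProgress records. Returns the number of records consumed.
     */
    static <T> long drainWithProgress(Iterator<T> records,
                                      long recordsBeforeProgress,
                                      Runnable progress) {
        if (recordsBeforeProgress <= 0) {
            throw new IllegalArgumentException("recordsBeforeProgress must be positive");
        }
        long seen = 0;
        while (records.hasNext()) {
            records.next();              // a real merge would write the record out here
            seen++;
            if (seen % recordsBeforeProgress == 0) {
                progress.run();          // stand-in for Reporter.progress()
            }
        }
        return seen;
    }

    public static void main(String[] args) {
        int[] calls = {0};               // mutable counter captured by the lambda
        Iterator<String> records = Collections.nCopies(25000, "record").iterator();
        long seen = drainWithProgress(records, 10000L, () -> calls[0]++);
        System.out.println(seen + " records, " + calls[0] + " progress calls");
    }
}
```

With a scheme like this, the gap between two progress reports depends on how long it takes to process one batch of records, so very slow per-record work could still let a long interval pass without a report.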

> map tasks timeout during sorting
> --------------------------------
>
>                 Key: MAPREDUCE-2187
>                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-2187
>             Project: Hadoop Map/Reduce
>          Issue Type: Bug
>    Affects Versions: 0.20.2
>            Reporter: Gianmarco De Francisci Morales
>
> During the execution of a large job, the map tasks timeout:
> {code}
> INFO mapred.JobClient: Task Id : attempt_201010290414_60974_m_000057_1, 
> Status : FAILED
> Task attempt_201010290414_60974_m_000057_1 failed to report status for 609 
> seconds. Killing!
> {code}
> The bug is in the fact that the mapper has already finished, and, according 
> to the logs, the timeout occurs during the merge sort phase.
> The intermediate data generated by the map task is quite large. So I think 
> this is the problem.
> The logs show that the merge-sort was running for 10 minutes when the task 
> was killed.
> I think the mapred.Merger should call Reporter.progress() somewhere.

