[ https://issues.apache.org/jira/browse/MAPREDUCE-2187?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12932462#action_12932462 ]
Gianmarco De Francisci Morales commented on MAPREDUCE-2187:
-----------------------------------------------------------
I forgot I actually do have a combiner.
In the combiner, I call reporter.progress() every 1000 values.
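For readers following the thread, the pattern described above looks roughly like the sketch below. It is not the actual combiner from this job: the `Progress` interface is a local stand-in for Hadoop's Reporter (only progress() is modeled), and summing the values is a placeholder for whatever the real combiner computes.

```java
import java.util.Collections;
import java.util.Iterator;
import java.util.List;

public class CombinerProgressSketch {
    /** Hypothetical stand-in for Hadoop's Reporter; only progress() is modeled. */
    interface Progress {
        void progress();
    }

    /** Report once per this many values, as described in the comment. */
    static final int REPORT_INTERVAL = 1000;

    /** Combines the values of one key (here: a plain sum), pinging the reporter periodically. */
    static double combine(Iterator<Double> values, Progress reporter) {
        double sum = 0.0;
        long seen = 0;
        while (values.hasNext()) {
            sum += values.next();
            // Tell the framework the task is alive every REPORT_INTERVAL values.
            if (++seen % REPORT_INTERVAL == 0) {
                reporter.progress();
            }
        }
        return sum;
    }

    public static void main(String[] args) {
        int[] pings = {0};
        List<Double> values = Collections.nCopies(2500, 1.0);
        double sum = combine(values.iterator(), () -> pings[0]++);
        System.out.println("sum=" + sum + " pings=" + pings[0]);
    }
}
```

Note that this only keeps the task alive while the combiner itself is iterating; it does nothing for time spent inside the framework's own sort and merge code.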
The counters are high for the number of mappers and reducers I am using, but
this was an exploratory job:
Map input records                        454,219
Map output records                29,528,547,433
Map output bytes                 503,179,031,513
Combine output records (map)      56,287,259,615
Combine output records (reduce)   13,553,888,779
Reduce input records              15,567,573,707
Reduce output records                  2,509,983
Reduce shuffle bytes             337,876,374,027
The first time, I ran with 400 mappers and 100 reducers and got the timeouts.
The job completes with 2000 mappers and 200 reducers.
I then tried a larger input and got the same timeouts.
The individual records should not be that large.
The key is always an int pair.
The value is either an int-float pair (most of them, 29,528,011,793) or an
array of long-double pairs (535,640 records, for a total size of 649,693,592
bytes). I am using MultipleInputs and GenericWritable to shuffle them together.
Could this be the culprit?
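As a rough sanity check on the record sizes, the averages can be derived from the figures quoted in this comment. The sketch below assumes that the "Map output bytes" counter reflects the serialized key and value bytes; the class and method names are only illustrative.

```java
public class RecordSizeCheck {
    /** Average size in bytes per record. */
    static double avgBytes(long totalBytes, long records) {
        return (double) totalBytes / records;
    }

    public static void main(String[] args) {
        // Figures copied from the job counters and the description above.
        long mapOutputRecords = 29_528_547_433L;
        long mapOutputBytes = 503_179_031_513L;
        long pairRecords = 29_528_011_793L;   // int-float pair values
        long arrayRecords = 535_640L;         // arrays of long-double pairs
        long arrayBytes = 649_693_592L;

        // The two value kinds should account for every map output record.
        System.out.println("counts add up: "
                + (pairRecords + arrayRecords == mapOutputRecords));
        System.out.printf("avg bytes per map output record: %.2f%n",
                avgBytes(mapOutputBytes, mapOutputRecords));
        System.out.printf("avg bytes per array record: %.2f%n",
                avgBytes(arrayBytes, arrayRecords));
    }
}
```

This works out to roughly 17 bytes per map output record overall and roughly 1.2 KB per array record, so no single record is large; the volume comes from the record count.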
> map tasks timeout during sorting
> --------------------------------
>
> Key: MAPREDUCE-2187
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-2187
> Project: Hadoop Map/Reduce
> Issue Type: Bug
> Affects Versions: 0.20.2
> Reporter: Gianmarco De Francisci Morales
>
> During the execution of a large job, the map tasks time out:
> {code}
> INFO mapred.JobClient: Task Id : attempt_201010290414_60974_m_000057_1,
> Status : FAILED
> Task attempt_201010290414_60974_m_000057_1 failed to report status for 609
> seconds. Killing!
> {code}
> The problem is that the mapper has already finished and, according to the
> logs, the timeout occurs during the merge sort phase.
> The intermediate data generated by the map task is quite large, which I
> think is the cause.
> The logs show that the merge-sort was running for 10 minutes when the task
> was killed.
> I think the mapred.Merger should call Reporter.progress() somewhere.
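To illustrate the suggestion in the description, here is a minimal sketch of a k-way merge that reports progress every fixed number of records. It is not Hadoop's mapred.Merger code: the `Progress` interface, the `Run` class, and the `interval` parameter are all hypothetical names chosen for the example.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Iterator;
import java.util.List;
import java.util.PriorityQueue;

public class MergeProgressSketch {
    /** Hypothetical stand-in for Hadoop's Reporter; only progress() is modeled. */
    interface Progress {
        void progress();
    }

    /** One sorted run together with its current head element. */
    static final class Run {
        final Iterator<Integer> it;
        int head;

        Run(Iterator<Integer> it) {
            this.it = it;
            this.head = it.next();
        }
    }

    /** Merges already-sorted runs, calling reporter.progress() every `interval` records. */
    static List<Integer> merge(List<List<Integer>> sortedRuns, Progress reporter, int interval) {
        PriorityQueue<Run> heap =
                new PriorityQueue<>((a, b) -> Integer.compare(a.head, b.head));
        for (List<Integer> run : sortedRuns) {
            if (!run.isEmpty()) {
                heap.add(new Run(run.iterator()));
            }
        }
        List<Integer> out = new ArrayList<>();
        long emitted = 0;
        while (!heap.isEmpty()) {
            Run smallest = heap.poll();
            out.add(smallest.head);
            if (smallest.it.hasNext()) {
                smallest.head = smallest.it.next();
                heap.add(smallest); // re-insert with its new head
            }
            // Periodic liveness signal so a long merge is not mistaken for a hang.
            if (++emitted % interval == 0) {
                reporter.progress();
            }
        }
        return out;
    }

    public static void main(String[] args) {
        int[] pings = {0};
        List<List<Integer>> runs = Arrays.asList(
                Arrays.asList(1, 4, 7), Arrays.asList(2, 5), Arrays.asList(3, 6));
        System.out.println(merge(runs, () -> pings[0]++, 3) + " pings=" + pings[0]);
    }
}
```

Reporting by record count keeps the overhead small, though a merge dominated by disk I/O might be better served by a time-based interval.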