[ 
https://issues.apache.org/jira/browse/HADOOP-3229?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12588105#action_12588105
 ] 

Doug Cutting commented on HADOOP-3229:
--------------------------------------

> Do we need to call it for each record emitted?

Yes.  The goal is to note progress whenever user code calls collect().  If a 
slow-running mapper or reducer only outputs one tiny record every minute or so, 
and the task timeout is ten minutes, the task should not time out.  User code 
should only have to report progress explicitly when it consumes inputs or 
generates outputs less often than once per task timeout.
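
To make the contract concrete, here is a minimal sketch of a collector that 
notes progress on every collect() call.  The names below (Collector, 
ProgressNotingCollector, progressFlag) are hypothetical stand-ins for 
illustration, not the actual Hadoop classes or the code in the attached 
patches; the flag is assumed to be polled and cleared by some separate 
reporting thread.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.atomic.AtomicBoolean;

final class ProgressSketch {

    /** Hypothetical stand-in for an output collector interface. */
    interface Collector<K, V> {
        void collect(K key, V value);
    }

    /** Wraps another collector and sets a progress flag on every write. */
    static final class ProgressNotingCollector<K, V> implements Collector<K, V> {
        private final Collector<K, V> delegate;
        private final AtomicBoolean progressFlag;

        ProgressNotingCollector(Collector<K, V> delegate, AtomicBoolean progressFlag) {
            this.delegate = delegate;
            this.progressFlag = progressFlag;
        }

        @Override
        public void collect(K key, V value) {
            // One cheap flag write per emitted record; a reporter thread
            // (assumed, not shown) would read and clear the flag periodically.
            progressFlag.set(true);
            delegate.collect(key, value);
        }
    }

    public static void main(String[] args) {
        AtomicBoolean progressFlag = new AtomicBoolean(false);
        List<String> sink = new ArrayList<>();
        Collector<String, Integer> collector =
            new ProgressNotingCollector<>((k, v) -> sink.add(k + "=" + v), progressFlag);

        collector.collect("word", 1);

        // getAndSet(false) mimics what a reporter thread would do on each poll.
        System.out.println("progress since last poll: " + progressFlag.getAndSet(false));
        System.out.println("records written: " + sink.size());
    }
}
```

With a wrapper along these lines, user code that emits a record at least once 
per timeout interval never has to report progress itself.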

> As demonstrated in HADOOP-2284, the overhead of setting this flag- as you 
> assert- is slight, but not free.

That case was worse, since sorting calls compare() about log(N) times per 
entry, not just once, but it's still a valid point.  If we find that setting 
the flag here significantly impacts performance, then we should explore 
changing the above contract.  But that's the contract we've advertised in the 
past: either consuming or emitting an entry counts as task progress, no?
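
As a rough illustration of why the sort path is the more expensive place to 
set a flag, the sketch below counts comparator calls while sorting N shuffled 
entries and prints that next to N, the number of collect() calls for the same 
entries.  It uses the JDK's List.sort rather than Hadoop's sorter, and the 
class and method names are hypothetical, so treat the numbers as indicative 
only.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Random;
import java.util.concurrent.atomic.AtomicLong;

final class CompareCountSketch {

    /** Sorts n shuffled integers and returns how often the comparator ran. */
    static long countComparisons(int n, long seed) {
        List<Integer> entries = new ArrayList<>(n);
        for (int i = 0; i < n; i++) {
            entries.add(i);
        }
        Collections.shuffle(entries, new Random(seed));

        AtomicLong calls = new AtomicLong();
        entries.sort((a, b) -> {
            // A per-compare progress flag would be set here, once per call.
            calls.incrementAndGet();
            return Integer.compare(a, b);
        });
        return calls.get();
    }

    public static void main(String[] args) {
        int n = 100_000;
        long compares = countComparisons(n, 42L);
        System.out.println("collect() calls: " + n);
        System.out.println("compare() calls: " + compares);
    }
}
```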

> Map OutputCollector does not report progress on writes
> ------------------------------------------------------
>
>                 Key: HADOOP-3229
>                 URL: https://issues.apache.org/jira/browse/HADOOP-3229
>             Project: Hadoop Core
>          Issue Type: Bug
>          Components: mapred
>         Environment: all
>            Reporter: Alejandro Abdelnur
>             Fix For: 0.17.0
>
>         Attachments: 3229-0.patch, HADOOP-3229.patch
>
>
> It seems that the collector implementation used during the map phase does not 
> report progress on writing.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
