[ 
https://issues.apache.org/jira/browse/HADOOP-1201?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12500489
 ] 

Owen O'Malley commented on HADOOP-1201:
---------------------------------------

But once progress reporting runs in its own thread, the ping provides little 
value. The interface will look like:

{code}
boolean updateState(String taskid, int progressCount, float progress,
                    String state, TaskStatus.Phase phase, Counters count);
{code}

I don't see the point of having two threads that both call up to the task 
tracker every second, especially since the ping thread is so trivial.
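
To make the single-thread idea concrete, here is a rough sketch rather than a patch. It assumes the task only writes to a few shared fields and that one reporter thread makes the updateState call about once a second, so the call doubles as the ping. The names Umbilical, TaskProgress and Reporter are made up for illustration, the TaskStatus.Phase and Counters arguments are left out to keep it self-contained, and treating a false return as "the tracker no longer knows this task" and progressCount as "number of updates so far" are both assumptions.

```java
import java.util.concurrent.atomic.AtomicInteger;

class ProgressReporterSketch {

    /** Hypothetical stand-in for the task tracker side of the proposed call. */
    interface Umbilical {
        // Assumption: returning false means the tracker no longer knows the task.
        boolean updateState(String taskId, int progressCount, float progress, String state);
    }

    /** Fields the task updates inline; no RPC happens on this path. */
    static class TaskProgress {
        final AtomicInteger progressCount = new AtomicInteger();
        volatile float progress;
        volatile String state = "";

        void set(float newProgress, String newState) {
            progress = newProgress;
            state = newState;
            // Assumption: progressCount counts updates made by the task.
            progressCount.incrementAndGet();
        }
    }

    /** The only thread that talks to the tracker; its report also acts as the ping. */
    static class Reporter implements Runnable {
        private final String taskId;
        private final TaskProgress status;
        private final Umbilical umbilical;
        private final long intervalMillis;

        Reporter(String taskId, TaskProgress status, Umbilical umbilical, long intervalMillis) {
            this.taskId = taskId;
            this.status = status;
            this.umbilical = umbilical;
            this.intervalMillis = intervalMillis;
        }

        /** Sends the current snapshot of the progress fields in a single call. */
        boolean reportOnce() {
            return umbilical.updateState(
                taskId, status.progressCount.get(), status.progress, status.state);
        }

        @Override
        public void run() {
            try {
                // Stop when interrupted or when the tracker rejects the task.
                while (!Thread.currentThread().isInterrupted() && reportOnce()) {
                    Thread.sleep(intervalMillis);
                }
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        }
    }

    public static void main(String[] args) throws InterruptedException {
        TaskProgress status = new TaskProgress();
        Umbilical tracker = (id, count, progress, state) -> {
            System.out.println(id + " count=" + count + " progress=" + progress + " " + state);
            return true;
        };
        Thread reporter = new Thread(new Reporter("task_0001", status, tracker, 1000L));
        reporter.setDaemon(true);
        reporter.start();

        // The task itself only touches the shared fields.
        for (int i = 1; i <= 3; i++) {
            status.set(i / 3.0f, "reduce");
            Thread.sleep(500L);
        }
        reporter.interrupt();
        reporter.join();
    }
}
```

With this shape the reducer loop pays only for a few field writes per record, and the tracker still hears from the task every interval even when progress has not changed.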

> Progress reporting can be improved for both Map/Reduce tasks
> ------------------------------------------------------------
>
>                 Key: HADOOP-1201
>                 URL: https://issues.apache.org/jira/browse/HADOOP-1201
>             Project: Hadoop
>          Issue Type: Improvement
>          Components: mapred
>            Reporter: Devaraj Das
>
> Both the map and reduce tasks do progress reporting in separate threads. 
> However, in the ReduceTask, after the sort phase, the progress reporting 
> happens inline with the reducer invocations. This slows down the Reduce phase 
> since RPC is involved for every progress report. The better thing to do would 
> be to do progress reporting for all phases in separate threads and have the 
> tasks just update the progress fields.
> One proposal is to extract out the reporting stuff that is there in 
> MapTask/ReduceTask and put it in the Task superclass as a new class, and have 
> methods in the new class that control what/when progress is reported. 
> Thoughts?

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
