[ 
https://issues.apache.org/jira/browse/HADOOP-1900?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12535894
 ] 

Devaraj Das commented on HADOOP-1900:
-------------------------------------

bq. So, one way to take this into account might be to maintain an average 
time-to-complete for all tasks in the system (of current jobs) and factor that 
into the scaling of the intervals.

The TaskTracker currently pings the JobTracker for a new task as soon as it 
finishes executing one. I think we should keep that behavior, since it keeps 
the tasktrackers well utilized (of course, in general we could do better by 
handing out a bunch of tasks each time a tasktracker asks for one, but that's 
the subject of another JIRA).

bq. Also, while we are at this, I say we should start to consider busy-ness of 
JobTracker too, along with the cluster-size. So, for e.g., if the individual 
tasks are taking in the order of minutes, then it might not matter much if we 
send one every 20s or so, in some cases it might. I know that the sort's map 
tasks take around 40s each... 

I propose a change to the status message in the heartbeat: the tasktracker can 
compare each task's current status with the one it sent previously and, if the 
two are the same, send just a flag marking the status as a duplicate instead of 
the complete status object. That will considerably reduce the data per RPC for 
long-running tasks whose statuses don't change frequently, and it will also 
reduce the processing load on the JobTracker.
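
To make the idea concrete, here is a rough sketch in Java. All the names in it 
(HeartbeatStatusSketch, Status, Report, buildReport) are made up for 
illustration and are not the existing TaskTracker or TaskStatus code; it only 
shows the compare-and-send-a-flag logic under the assumption that a task's 
status can be compared for equality.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Objects;

/** Hypothetical sketch only; not Hadoop's actual TaskTracker code. */
final class HeartbeatStatusSketch {

    /** Minimal stand-in for a per-task status (assumed fields). */
    static final class Status {
        final String state;
        final float progress;

        Status(String state, float progress) {
            this.state = state;
            this.progress = progress;
        }

        @Override
        public boolean equals(Object o) {
            if (!(o instanceof Status)) {
                return false;
            }
            Status s = (Status) o;
            return state.equals(s.state)
                && Float.compare(progress, s.progress) == 0;
        }

        @Override
        public int hashCode() {
            return Objects.hash(state, progress);
        }
    }

    /** What is sent for one task: the full status, or just a flag. */
    static final class Report {
        final String taskId;
        final boolean unchanged;
        final Status status; // null when unchanged is true

        Report(String taskId, boolean unchanged, Status status) {
            this.taskId = taskId;
            this.unchanged = unchanged;
            this.status = status;
        }
    }

    // Last status that was sent in full, keyed by task id.
    private final Map<String, Status> lastSent = new HashMap<>();

    /** Builds the report for one task in the next heartbeat. */
    Report buildReport(String taskId, Status current) {
        Status previous = lastSent.get(taskId);
        if (current.equals(previous)) {
            // Same as last time: send only the flag, not the status object.
            return new Report(taskId, true, null);
        }
        lastSent.put(taskId, current);
        return new Report(taskId, false, current);
    }
}
```

A real version would also have to fall back to sending the full status when 
the JobTracker may not have seen the previous one (for example after a lost 
heartbeat or a JobTracker restart); the sketch leaves that out.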

Thoughts?

> the heartbeat and task event queries interval should be set dynamically by 
> the JobTracker
> -----------------------------------------------------------------------------------------
>
>                 Key: HADOOP-1900
>                 URL: https://issues.apache.org/jira/browse/HADOOP-1900
>             Project: Hadoop
>          Issue Type: Improvement
>          Components: mapred
>            Reporter: Owen O'Malley
>            Assignee: Amareshwari Sri Ramadasu
>
> The JobTracker should scale the intervals that the TaskTrackers use to 
> contact it dynamically, based on how busy it is and the size of the 
> cluster.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
