[ 
https://issues.apache.org/jira/browse/MAPREDUCE-1906?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12884354#action_12884354
 ] 

Scott Carey commented on MAPREDUCE-1906:
----------------------------------------

JobTracker.java has this code:

(0.21 branch, line 2497)
{code}
public int getNextHeartbeatInterval() {
  // get the no of task trackers
  int clusterSize = getClusterStatus().getTaskTrackers();
  int heartbeatInterval = Math.max(
      (int) (1000 * HEARTBEATS_SCALING_FACTOR *
             Math.ceil((double) clusterSize / NUM_HEARTBEATS_IN_SECOND)),
      HEARTBEAT_INTERVAL_MIN);
  return heartbeatInterval;
}
{code}

HEARTBEAT_INTERVAL_MIN is 3000 (milliseconds).  With every tasktracker sending 
one heartbeat per 3 seconds, the jobtracker only receives 100 heartbeats / 
second once the cluster has reached 300 nodes.

This throttle is far too large in my experience.  I have a development cluster 
with 10 nodes; each node can handle 10 maps and 10 reduces concurrently.  With 
0.20, the most the scheduler will do is one map and one reduce per heartbeat.  
The result is a cluster that is always underutilized whenever anything but 
very large jobs is running.  Many of our data flows start out large, then end 
with a couple dozen smaller jobs that are mostly chained together.
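
To put a number on that, here is a rough sketch (hypothetical class and method 
names, not Hadoop code) of how long one tasktracker needs to fill its map slots 
when it receives at most one new map per heartbeat.  It assumes every slot 
starts empty, a full interval elapses before each assignment, and no task 
finishes in the meantime.

```java
// Sketch only: SlotFillEstimate and millisToFill are illustrative names,
// not part of Hadoop.
final class SlotFillEstimate {

    /**
     * Milliseconds for one tasktracker to fill `slots` empty slots when the
     * scheduler hands out at most `tasksPerHeartbeat` tasks per heartbeat.
     */
    static long millisToFill(int slots, int tasksPerHeartbeat, long heartbeatIntervalMs) {
        // Ceiling division: number of heartbeats needed to cover all slots.
        int heartbeats = (slots + tasksPerHeartbeat - 1) / tasksPerHeartbeat;
        return heartbeats * heartbeatIntervalMs;
    }

    public static void main(String[] args) {
        // 10 map slots, one map per heartbeat, 3000 ms between heartbeats.
        System.out.println(millisToFill(10, 1, 3000)); // prints 30000
    }
}
```

Under those assumptions a node needs about 30 seconds to reach full map 
occupancy, which is longer than many small jobs should take to run.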

I have been running in production and development with a patch to 
MRConstants.java that improves cluster utilization significantly by changing 
HEARTBEAT_INTERVAL_MIN to 300 ms.  In small clusters, a heartbeat every 
300 ms is not an issue.  The above code already throttles the system; the 
floor of 3000 ms is simply too large.  Even with a 300 ms floor, it still 
takes a cluster of 30 machines to reach the 100 heartbeats / second threshold.
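
The arithmetic behind these figures can be checked with a small sketch.  The 
class and method names are made up for illustration, and it assumes every 
tasktracker heartbeats at exactly the floor interval.

```java
// Sketch only: HeartbeatRate is an illustrative helper, not a Hadoop class.
final class HeartbeatRate {

    /** Heartbeats per second reaching the jobtracker if all nodes use intervalMs. */
    static double perSecond(int nodes, long intervalMs) {
        return nodes * 1000.0 / intervalMs;
    }

    /** Smallest cluster size whose aggregate rate reaches targetPerSecond. */
    static int nodesToReach(double targetPerSecond, long intervalMs) {
        return (int) Math.ceil(targetPerSecond * intervalMs / 1000.0);
    }

    public static void main(String[] args) {
        long[] floorsMs = {3000, 300};
        for (long floorMs : floorsMs) {
            System.out.printf("floor %d ms: 10 nodes -> %.1f heartbeats/s, "
                    + "100 heartbeats/s needs %d nodes%n",
                    floorMs, perSecond(10, floorMs), nodesToReach(100, floorMs));
        }
    }
}
```

With the 3000 ms floor, a 10-node cluster produces about 3.3 heartbeats per 
second and 100 per second needs 300 nodes; with a 300 ms floor the same 
cluster produces about 33.3 per second and 100 per second needs 30 nodes.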

I also could not find an explanation of why the minimum was increased from 
2000 to 3000 between 0.19 and 0.20.




> Lower minimum heartbeat interval for tasktracker > Jobtracker
> -------------------------------------------------------------
>
>                 Key: MAPREDUCE-1906
>                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-1906
>             Project: Hadoop Map/Reduce
>          Issue Type: Improvement
>    Affects Versions: 0.20.1, 0.20.2
>            Reporter: Scott Carey
>
> I get a 0% to 15% performance increase for smaller clusters by making the 
> heartbeat throttle stop penalizing clusters with fewer than 300 nodes.
> Between 0.19 and 0.20, the default minimum heartbeat interval increased from 
> 2s to 3s.   If a JobTracker is throttled at 100 heartbeats / sec for large 
> clusters, why should a cluster with 10 nodes be throttled to 3.3 heartbeats 
> per second?  

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
