[
https://issues.apache.org/jira/browse/MAPREDUCE-2634?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13058228#comment-13058228
]
amit hadke commented on MAPREDUCE-2634:
---------------------------------------
Regarding Proposal #2: the JobTracker can probably run job setup/cleanup itself
if the job is using the default output committer (FileOutputCommitter, in either
the old (mapred) or new (mapreduce) API).
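
A rough sketch of that check, only to illustrate the idea. SetupCleanupPolicy
and canRunOnJobTracker are hypothetical names, not existing Hadoop code, and
the sketch assumes the committer's class name has already been read from the
job configuration:

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.Set;

// Hypothetical helper, not part of Hadoop: decides whether job setup/cleanup
// could run inside the JobTracker instead of in a separate task JVM.
final class SetupCleanupPolicy {
    // Default committers of the old (mapred) and new (mapreduce) APIs.
    private static final Set<String> DEFAULT_COMMITTERS = new HashSet<>(Arrays.asList(
        "org.apache.hadoop.mapred.FileOutputCommitter",
        "org.apache.hadoop.mapreduce.lib.output.FileOutputCommitter"));

    private SetupCleanupPolicy() {}

    /** Returns true only when the configured committer is one of the defaults. */
    static boolean canRunOnJobTracker(String committerClassName) {
        return committerClassName != null
            && DEFAULT_COMMITTERS.contains(committerClassName);
    }
}
```

A user-supplied committer can run arbitrary code in its setup and cleanup, so
in this sketch anything other than the two default classes falls back to the
existing behaviour of running setup/cleanup as a task.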
> MapReduce Performance Improvements using forced heartbeat
> ----------------------------------------------------------
>
> Key: MAPREDUCE-2634
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-2634
> Project: Hadoop Map/Reduce
> Issue Type: Improvement
> Reporter: Abhijit Suresh Shingate
> Priority: Minor
> Original Estimate: 168h
> Remaining Estimate: 168h
>
> The following proposals are intended to improve MapReduce performance:
> *1.Notify TaskTracker to send heartbeat when a new Job is submitted*
> a) Presently when new Job is submitted to JobTracker, the tasks are
> assigned to TaskTracker only when the TaskTracker sends heartbeat to
> JobTracker
> b) Proposal:
> - JobTracker will notify all TaskTrackers to send a heartbeat to
> JobTracker whenever a new Job is submitted to JobTracker, so that the Tasks
> of the new Job can be assigned to all TaskTrackers immediately.
> *2. Execute Job Setup and Cleanup on JobTracker JVM*
> a) Presently Job Setup and Cleanup are carried out as a separate task on
> TaskTracker
> b) Launching a new JVM for Setup and Cleanup of the Job introduces some
> overhead. It generally takes about 0.7 - 1.5 seconds.
> c) Proposal:
> - JobTracker will execute the Job Setup and Cleanup tasks on the
> JobTracker JVM only.
> *3. Request TaskTracker to send heartbeat when the Map Task is completed.*
> a) Presently TaskTracker reports status of completed Map Tasks as part of
> heartbeat at a regular interval.
> b) Proposal:
> - Map Task requests TaskTracker to send a heartbeat to JobTracker when
> the Map Task is completed, so that Reduce tasks can quickly learn which map
> tasks have finished and copy their outputs to local disk.
> *4. Request JobTracker to trigger committing of Reduce output when Reduce
> Task has finished.*
> a) Presently JobTracker will ask the Reduce Task to commit its output to
> HDFS through heartbeat response.
> b) Proposal:
> - Reduce Task requests TaskTracker to send heartbeat to JobTracker
> whenever Reduce Task is completed.
> These optimizations might work on small clusters, but on large clusters they
> may add overhead.
> Please let us know your views.
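
Proposals 1, 3 and 4 in the description above all rely on the same mechanism:
a heartbeat that normally fires at a fixed interval but can be triggered early
on request. A minimal sketch of such a wait follows. HeartbeatTrigger and its
methods are hypothetical names for illustration, not existing TaskTracker
code, and the sketch says nothing about how the JobTracker would reach the
TaskTracker to make the request:

```java
import java.util.concurrent.TimeUnit;
import java.util.concurrent.locks.Condition;
import java.util.concurrent.locks.ReentrantLock;

// Hypothetical sketch, not TaskTracker code: a wait that ends either when the
// regular interval elapses or as soon as an immediate heartbeat is requested.
final class HeartbeatTrigger {
    private final ReentrantLock lock = new ReentrantLock();
    private final Condition requested = lock.newCondition();
    private boolean forced = false;

    /** Called, for example, when a task completes or a new job is announced. */
    void requestHeartbeat() {
        lock.lock();
        try {
            forced = true;
            requested.signalAll();
        } finally {
            lock.unlock();
        }
    }

    /**
     * Blocks until the interval elapses or a heartbeat is requested.
     * Returns true if the wait ended because of a request, false on timeout
     * or if the waiting thread was interrupted.
     */
    boolean awaitNextHeartbeat(long intervalMillis) {
        lock.lock();
        try {
            long remainingNanos = TimeUnit.MILLISECONDS.toNanos(intervalMillis);
            // Loop to guard against spurious wakeups.
            while (!forced && remainingNanos > 0) {
                remainingNanos = requested.awaitNanos(remainingNanos);
            }
            boolean wasForced = forced;
            forced = false; // consume the request
            return wasForced;
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt(); // preserve interrupt status
            return false;
        } finally {
            lock.unlock();
        }
    }
}
```

The heartbeat loop would call awaitNextHeartbeat(interval) between heartbeats
and send one whenever it returns. Requests that arrive while no one is waiting
are remembered and collapse into a single early heartbeat, which limits the
extra load on the JobTracker to some degree, though the large-cluster concern
raised in the description still applies.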