[
https://issues.apache.org/jira/browse/MAPREDUCE-2634?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13058203#comment-13058203
]
Todd Lipcon commented on MAPREDUCE-2634:
----------------------------------------
Proposal #1 seems like an interesting idea, but I'm skeptical that it will make
a big difference, since we've already lowered the minimum heartbeat interval to
300ms in MAPREDUCE-1906.
Proposal #2 seems scary since setup and cleanup may run user code, and running
user code in the JobTracker JVM is insecure. Piggybacking those with other map
tasks, though, is probably a good idea (for some reason I don't think we do
this with JVM reuse today).
Your proposals #3 and #4 are already implemented by MAPREDUCE-270, if I
understand you correctly.
> MapReduce Performance Improvements using forced heartbeat
> ----------------------------------------------------------
>
> Key: MAPREDUCE-2634
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-2634
> Project: Hadoop Map/Reduce
> Issue Type: Improvement
> Reporter: Abhijit Suresh Shingate
> Priority: Minor
> Original Estimate: 168h
> Remaining Estimate: 168h
>
> The following proposals could yield some performance improvements in
> MapReduce:
> *1. Notify TaskTracker to send heartbeat when a new Job is submitted*
> a) Presently when new Job is submitted to JobTracker, the tasks are
> assigned to TaskTracker only when the TaskTracker sends heartbeat to
> JobTracker
> b) Proposal:
> - JobTracker will notify all TaskTrackers to send a heartbeat to
> JobTracker whenever a new Job is submitted, so that the Tasks of the new
> Job can be assigned to the TaskTrackers immediately.
> *2. Execute Job Setup and Cleanup on JobTracker JVM*
> a) Presently Job Setup and Cleanup are carried out as separate tasks on
> TaskTracker
> b) Launching a new JVM for Setup and Cleanup of the Job introduces some
> overhead, generally about 0.7 to 1.5 seconds.
> c) Proposal:
> - JobTracker will execute the Job Setup and Cleanup tasks on the
> JobTracker JVM only.
> *3. Request TaskTracker to send heartbeat when the Map Task is completed.*
> a) Presently TaskTracker reports status of completed Map Tasks as part of
> heartbeat at a regular interval.
> b) Proposal:
> - Map Task requests TaskTracker to send a heartbeat to JobTracker when
> the Map Task is completed, so that Reduce Tasks can quickly learn which Map
> Tasks have finished and copy their outputs to local storage.
> *4. Request JobTracker to trigger committing of Reduce output when Reduce
> Task has finished.*
> a) Presently JobTracker will ask the Reduce Task to commit its output to
> HDFS through heartbeat response.
> b) Proposal:
> - Reduce Task requests TaskTracker to send heartbeat to JobTracker
> whenever Reduce Task is completed.
> These optimizations might work on small clusters, but on big clusters they
> may add overhead.
> Please let us know your views.
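Proposals 1, 3 and 4 above all rely on the same mechanism: a heartbeat loop
that normally sleeps for a fixed interval but can be woken early by an event.
A minimal Java sketch of that idea follows. The class and method names
(HeartbeatTrigger, requestHeartbeat, awaitNextHeartbeat) are hypothetical and
are not taken from the Hadoop code base or from any patch; this only
illustrates the wait-with-timeout pattern under those assumptions.

```java
import java.util.concurrent.TimeUnit;
import java.util.concurrent.locks.Condition;
import java.util.concurrent.locks.ReentrantLock;

/** Hypothetical sketch of a forced heartbeat; not actual TaskTracker code. */
class HeartbeatTrigger {
    private final ReentrantLock lock = new ReentrantLock();
    private final Condition wakeUp = lock.newCondition();
    private boolean forced = false;

    /** Called by an event source (e.g. a finished task) to cut the wait short. */
    void requestHeartbeat() {
        lock.lock();
        try {
            forced = true;
            wakeUp.signalAll();
        } finally {
            lock.unlock();
        }
    }

    /**
     * Blocks until intervalMs elapses or a heartbeat has been requested.
     * Returns true if the heartbeat was forced, false on a regular timeout
     * (or if the waiting thread was interrupted).
     */
    boolean awaitNextHeartbeat(long intervalMs) {
        lock.lock();
        try {
            long remainingNanos = TimeUnit.MILLISECONDS.toNanos(intervalMs);
            // Loop to tolerate spurious wakeups; stop once forced or timed out.
            while (!forced && remainingNanos > 0) {
                remainingNanos = wakeUp.awaitNanos(remainingNanos);
            }
            boolean wasForced = forced;
            forced = false; // consume the request so the next wait starts clean
            return wasForced;
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt(); // preserve interrupt status
            return false;
        } finally {
            lock.unlock();
        }
    }
}
```

In such a design the heartbeat thread would call awaitNextHeartbeat(interval)
before each heartbeat, and task-completion code would call requestHeartbeat().
A request that arrives while no one is waiting is remembered, so it is not
lost between two heartbeats.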
--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira