[
https://issues.apache.org/jira/browse/MAPREDUCE-4381?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Shrinivas Joshi updated MAPREDUCE-4381:
---------------------------------------
Attachment: MAPREDUCE-4381-branch-1.patch
> Make PROGRESS_INTERVAL of org.apache.hadoop.mapred.Task a tunable
> -----------------------------------------------------------------
>
> Key: MAPREDUCE-4381
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-4381
> Project: Hadoop Map/Reduce
> Issue Type: Improvement
> Components: task, tasktracker
> Reporter: Shrinivas Joshi
> Priority: Minor
> Attachments: MAPREDUCE-4381-branch-1.patch, progress_interval.patch
>
>
> Currently PROGRESS_INTERVAL is hard-coded to 3000 msec. We made it a tunable
> and experimented with different values. In some cases, setting it to a
> smaller value such as 1000 msec significantly improves the performance of
> short-running jobs such as piEstimator, because the task threads no longer
> block for as long as 3 seconds waiting for their last progress update event.
> We also noticed close to a 14% improvement on Mahout KMeans iteration jobs,
> which take more than 5 minutes on our test cluster. Please let me know
> whether this seems like a good idea. I have attached an initial patch based
> on the branch-1 tree; it may need some rework for MRv2-based branches. Also
> note that I have not changed the naming style of PROGRESS_INTERVAL even
> though it is no longer a public static final. I can revise the patch if
> there are no objections to this idea.
> Thanks.
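
For readers who have not opened the patch, the idea of turning a hard-coded interval into a tunable can be sketched as below. This is only an illustration under stated assumptions, not the code in the attached patch: the class name, the property key `example.task.progress.interval.ms`, and the use of `java.util.Properties` in place of Hadoop's job configuration are all hypothetical. Only the 3000 msec default and the 1000 msec trial value come from the description above.

```java
import java.util.Properties;

// Illustrative sketch only; names below are hypothetical and not from the patch.
public class ProgressIntervalSketch {
    // Default from the issue description: the previously hard-coded 3000 msec.
    static final long DEFAULT_PROGRESS_INTERVAL_MS = 3000L;

    // Hypothetical property key, chosen for this example only.
    static final String PROGRESS_INTERVAL_KEY = "example.task.progress.interval.ms";

    // Returns the configured interval, falling back to the default when the
    // property is missing, not a number, or not positive.
    static long resolveInterval(Properties conf) {
        String raw = conf.getProperty(PROGRESS_INTERVAL_KEY);
        if (raw == null) {
            return DEFAULT_PROGRESS_INTERVAL_MS;
        }
        try {
            long value = Long.parseLong(raw.trim());
            return value > 0 ? value : DEFAULT_PROGRESS_INTERVAL_MS;
        } catch (NumberFormatException e) {
            return DEFAULT_PROGRESS_INTERVAL_MS;
        }
    }

    public static void main(String[] args) {
        Properties conf = new Properties();
        // No override set: the old hard-coded value is used.
        System.out.println(resolveInterval(conf)); // prints 3000

        // Override with the smaller value tried in the description.
        conf.setProperty(PROGRESS_INTERVAL_KEY, "1000");
        System.out.println(resolveInterval(conf)); // prints 1000
    }
}
```

Falling back to the default on a missing or invalid value keeps existing jobs on the old behavior unless the setting is changed explicitly.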