Shrinivas Joshi created MAPREDUCE-4381:
------------------------------------------

             Summary: Make PROGRESS_INTERVAL of org.apache.hadoop.mapred.Task a tunable
                 Key: MAPREDUCE-4381
                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-4381
             Project: Hadoop Map/Reduce
          Issue Type: Improvement
          Components: task, tasktracker
            Reporter: Shrinivas Joshi
            Priority: Minor


Currently PROGRESS_INTERVAL is hard-coded to 3000 msec. We tried making it a 
tunable and experimented with different values. In some cases, setting it to a 
smaller value such as 1000 msec significantly improves the performance of 
short-running jobs such as piEstimator, because the task threads no longer block 
for up to 3 seconds waiting for their last progress update event. We also saw 
close to a 14% improvement on Mahout KMeans iteration jobs that take more than 
5 minutes on our test cluster. Please let me know if this seems like a good 
idea. I have attached an initial patch, based on the branch-1 tree; it may need 
some rework for MRv2-based branches. Note that I have not changed the naming 
style of PROGRESS_INTERVAL even though it is no longer a public static final. I 
can revise the patch if there are no objections to this idea.
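
To illustrate the idea of reading the interval from configuration with the 
current 3000 msec as the fallback, here is a minimal standalone sketch. It is 
not the attached patch: the property name mapred.task.progress.interval.ms and 
the class ProgressIntervalSketch are hypothetical, and java.util.Properties 
stands in for the job configuration so the example runs without Hadoop.

```java
import java.util.Properties;

public class ProgressIntervalSketch {
    // Hypothetical property name, chosen for illustration only.
    static final String PROGRESS_INTERVAL_KEY = "mapred.task.progress.interval.ms";

    // The value that is hard-coded today, kept as the default.
    static final int DEFAULT_PROGRESS_INTERVAL_MS = 3000;

    // Returns the configured interval in msec, or the default when the
    // property is missing, not a number, or not positive.
    static int progressInterval(Properties conf) {
        String raw = conf.getProperty(PROGRESS_INTERVAL_KEY);
        if (raw == null) {
            return DEFAULT_PROGRESS_INTERVAL_MS;
        }
        try {
            int value = Integer.parseInt(raw.trim());
            return value > 0 ? value : DEFAULT_PROGRESS_INTERVAL_MS;
        } catch (NumberFormatException e) {
            return DEFAULT_PROGRESS_INTERVAL_MS;
        }
    }

    public static void main(String[] args) {
        Properties conf = new Properties();
        // No override set: falls back to the existing behaviour.
        System.out.println(progressInterval(conf));

        // Override with the smaller value tried in the experiments.
        conf.setProperty(PROGRESS_INTERVAL_KEY, "1000");
        System.out.println(progressInterval(conf));
    }
}
```

Falling back to 3000 msec for missing or invalid values keeps the behaviour 
unchanged for jobs that do not set the property.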
Thanks.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

        
