[ https://issues.apache.org/jira/browse/MAPREDUCE-6696?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15288083#comment-15288083 ]

zhihai xu commented on MAPREDUCE-6696:
--------------------------------------

Thanks for the review, [~jianhe]! Good finding. Yes, JobImpl#checkTaskLimits was 
the very initial code for the task limit, but that check happens in the AM, so 
it would still waste some resources (the AM container). And yes, 
MRJobConfig.NUM_MAPS only gives a hint, but my patch is based on 
InputFormat.getSplits, which exactly matches the number of mappers of the 
MapReduce job:
{code}
LOG.debug("Creating splits at " + jtFs.makeQualified(submitJobDir));
int maps = writeSplits(job, submitJobDir);
conf.setInt(MRJobConfig.NUM_MAPS, maps);
LOG.info("number of splits:" + maps);
{code}
writeSplits in turn calls InputFormat.getSplits:
{code}
  /**
   * Logically split the set of input files for the job.  
   * 
   * <p>Each {@link InputSplit} is then assigned to an individual {@link Mapper}
   * for processing.</p>
   *
   * <p><i>Note</i>: The split is a <i>logical</i> split of the inputs and the
   * input files are not physically split into chunks. For e.g. a split could
   * be <i>&lt;input-file-path, start, offset&gt;</i> tuple.
   * 
   * @param job job configuration.
   * @param numSplits the desired number of splits, a hint.
   * @return an array of {@link InputSplit}s for the job.
   */
  InputSplit[] getSplits(JobConf job, int numSplits) throws IOException;
{code}
My patch will reject the job during submission, which saves the AM container 
resources.
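
For illustration, here is a rough sketch of what the submission-time check 
could look like (the variable names and the exception type are illustrative, 
not the exact patch code); it would run right after writeSplits, so the real 
split count is known before any AM container is allocated:
{code}
// Illustrative sketch, not the exact patch: validate the real split count
// against the proposed "mapreduce.job.max.map" limit at submission time.
int maps = writeSplits(job, submitJobDir);
int maxMaps = conf.getInt("mapreduce.job.max.map", -1); // assume -1 = no limit
if (maxMaps >= 0 && maps > maxMaps) {
  // Fail here, before any AM container is allocated for the job.
  throw new IllegalArgumentException("The number of map tasks " + maps
      + " exceeded limit " + maxMaps);
}
conf.setInt(MRJobConfig.NUM_MAPS, maps);
{code}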

> Add a configuration to limit the number of map tasks allowed per job.
> ---------------------------------------------------------------------
>
>                 Key: MAPREDUCE-6696
>                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-6696
>             Project: Hadoop Map/Reduce
>          Issue Type: Improvement
>          Components: job submission
>    Affects Versions: 2.8.0
>            Reporter: zhihai xu
>            Assignee: zhihai xu
>         Attachments: MAPREDUCE-6696.000.patch, MAPREDUCE-6696.001.patch, 
> MAPREDUCE-6696.002.patch
>
>
> Add a configuration "mapreduce.job.max.map" to limit the number of map tasks 
> allowed per job. It will be useful for Hadoop admins to save cluster 
> resources by preventing users from submitting oversized jobs. A MapReduce 
> job with too many mappers may fail with OOM after running for a long time, 
> which is a big waste.
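
As a usage sketch (the key name follows the description above; the -1 "no 
limit" default and the chosen limit value are my assumptions, not something 
the patch confirms), the limit could be exercised like this:
{code}
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.mapreduce.Job;

// Usage sketch with the proposed key; the limit value here is arbitrary.
Configuration conf = new Configuration();
conf.setInt("mapreduce.job.max.map", 10000); // cap jobs at 10000 map tasks
Job job = Job.getInstance(conf, "wordcount");
// With the patch, submission would fail for any input that splits
// into more than 10000 mappers.
{code}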


