[
https://issues.apache.org/jira/browse/MAPREDUCE-6696?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15288083#comment-15288083
]
zhihai xu commented on MAPREDUCE-6696:
--------------------------------------
Thanks for the review [~jianhe]! Good finding. Yes, JobImpl#checkTaskLimits was
the very first code for the task limit, but that check happens in the AM, so it
still wastes some resources (the AM container). And yes, MRJobConfig.NUM_MAPS
only provides a hint, but my patch is based on InputFormat.getSplits, which
exactly matches the number of mappers of the MapReduce job:
{code}
LOG.debug("Creating splits at " + jtFs.makeQualified(submitJobDir));
int maps = writeSplits(job, submitJobDir);
conf.setInt(MRJobConfig.NUM_MAPS, maps);
LOG.info("number of splits:" + maps);
{code}
writeSplits will call InputFormat.getSplits.
{code}
/**
* Logically split the set of input files for the job.
*
* <p>Each {@link InputSplit} is then assigned to an individual {@link Mapper}
* for processing.</p>
*
* <p><i>Note</i>: The split is a <i>logical</i> split of the inputs and the
* input files are not physically split into chunks. For e.g. a split could
* be <i>&lt;input-file-path, start, offset&gt;</i> tuple.
*
* @param job job configuration.
* @param numSplits the desired number of splits, a hint.
* @return an array of {@link InputSplit}s for the job.
*/
InputSplit[] getSplits(JobConf job, int numSplits) throws IOException;
{code}
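As a hedged aside, the same split-count-equals-mapper-count relationship can be
checked client-side. The sketch below uses the new-API
(org.apache.hadoop.mapreduce) FileInputFormat.getSplits; the class name and
input path are hypothetical:
{code}
import java.util.List;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.mapreduce.InputSplit;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.input.TextInputFormat;

public class SplitCountProbe {
  public static void main(String[] args) throws Exception {
    Job job = Job.getInstance(new Configuration(), "split-count-probe");
    // Hypothetical input path; getSplits needs it to exist at runtime.
    FileInputFormat.addInputPath(job, new Path("/user/data/input"));
    List<InputSplit> splits = new TextInputFormat().getSplits(job);
    // One mapper is launched per split, so this is the job's mapper count.
    System.out.println("number of splits: " + splits.size());
  }
}
{code}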
My patch will reject the job at submission time, which saves the AM container
resource.
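For illustration, the guard could look roughly like this in JobSubmitter, right
after the real split count is known (a minimal sketch, not the exact patch; the
config key comes from this issue's description, and the default of -1 meaning
"no limit" is an assumption):
{code}
// Sketch of a submission-time guard (illustrative only).
int maps = writeSplits(job, submitJobDir);
conf.setInt(MRJobConfig.NUM_MAPS, maps);

// "mapreduce.job.max.map" is the key proposed in this issue; the -1
// "no limit" default is an assumption of this sketch.
int maxMaps = conf.getInt("mapreduce.job.max.map", -1);
if (maxMaps >= 0 && maps > maxMaps) {
  // Fail fast at submission time, before any AM container is allocated.
  throw new IllegalArgumentException("The number of map tasks " + maps
      + " exceeded limit " + maxMaps);
}
{code}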
> Add a configuration to limit the number of map tasks allowed per job.
> ---------------------------------------------------------------------
>
> Key: MAPREDUCE-6696
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-6696
> Project: Hadoop Map/Reduce
> Issue Type: Improvement
> Components: job submission
> Affects Versions: 2.8.0
> Reporter: zhihai xu
> Assignee: zhihai xu
> Attachments: MAPREDUCE-6696.000.patch, MAPREDUCE-6696.001.patch,
> MAPREDUCE-6696.002.patch
>
>
> Add a configuration, "mapreduce.job.max.map", to limit the number of map tasks
> allowed per job. It will be useful for Hadoop admins to save cluster resources
> by preventing users from submitting oversized MapReduce jobs. A MapReduce job
> with too many mappers may fail with OOM after running for a long time, which
> would be a big waste.