This means that -every node- can have 10 map tasks and 10 reduce tasks running simultaneously. If you want to limit how many tasks run -per node- in order to change the performance profile, you need to lower these values.
10/10 is generally far too high. The rule of thumb is to set both of these based on the number of cores, so an 8-core box would do 4/4. Start with that number, then profile the system usage and go up or down based on what you see.

On 1/19/10 12:59 PM, "Teryl Taylor" <teryl.tay...@gmail.com> wrote:

> Hi Allen,
>
> I have set them both to 10 in the mapred-site.xml file.
>
> Cheers,
>
> Teryl
>
>
> On Tue, Jan 19, 2010 at 4:32 PM, Allen Wittenauer
> <awittena...@linkedin.com> wrote:
>
>> What is the value of:
>>
>> mapred.tasktracker.map.tasks.maximum
>> mapred.tasktracker.reduce.tasks.maximum
>>
>>
>> On 1/19/10 10:23 AM, "Teryl Taylor" <teryl.tay...@gmail.com> wrote:
>>
>>> Hi guys,
>>>
>>> Thanks for the answers. Michael, yes, you are right: that is what I'm
>>> looking for, how to reduce the number of mappers running simultaneously.
>>> The system is running really slowly, and I think it might be due to
>>> constant thread context switching because so many mappers are running
>>> concurrently. Is there a way to tell how many mappers are running at the
>>> same time? My concern is that even though I set mapred.map.tasks to 10,
>>> the job configuration file (i.e. the job.xml file that is generated into
>>> the logs directory) always says mapred.map.tasks is 1083, which makes me
>>> believe my setting is being completely ignored. This is confirmed by the
>>> snippet of code from the JobClient.java file below, in which the client
>>> asks for the number of input splits and automatically sets
>>> mapred.map.tasks to that number, totally ignoring my setting.
>>>
>>> Cheers,
>>>
>>> Teryl
>>>
>>>
>>> On Tue, Jan 19, 2010 at 12:14 PM, Clements, Michael <
>>> michael.cleme...@disney.com> wrote:
>>>
>>>> Do you want to change the total # of mappers, or the # that run at any
>>>> given time? For example, Hadoop may split your job into 1083 mapper
>>>> tasks, only 10 of which it allows to run at a time.
>>>> The setting in mapred-site.xml caps how many mappers can run
>>>> simultaneously per machine. It does not affect how many mappers make up
>>>> the job in total.
>>>>
>>>> The total # of tasks in a job and the # that can run simultaneously are
>>>> two separate settings, and both are important for tuning performance.
>>>> The InputFormat controls the first.
>>>>
>>>>
>>>> *From:* mapreduce-user-return-292-Michael.Clements=disney.com@
>>>> hadoop.apache.org [mailto:mapreduce-user-return-292-Michael.Clements=
>>>> disney....@hadoop.apache.org] *On Behalf Of *Jeff Zhang
>>>> *Sent:* Monday, January 18, 2010 4:54 PM
>>>> *To:* mapreduce-user@hadoop.apache.org
>>>> *Subject:* Re: Question about setting the number of mappers.
>>>>
>>>> Hi Teryl,
>>>>
>>>> The number of mappers is determined by the InputFormat you use. In your
>>>> case, one option is to merge the files into larger files beforehand;
>>>> another is to use CombineFileInputFormat as your InputFormat.
>>>>
>>>>
>>>> On Mon, Jan 18, 2010 at 1:05 PM, Teryl Taylor <teryl.tay...@gmail.com>
>>>> wrote:
>>>>
>>>> Hi everyone,
>>>>
>>>> I'm playing around with the Hadoop MapReduce library and I'm getting
>>>> some odd behaviour. The system is set up on one machine using the
>>>> pseudo-distributed configuration, and I use KFS as my DFS. I have
>>>> written a MapReduce program to process a bunch of binary files. The
>>>> files are compressed in chunks, so I do not split the files. There are
>>>> 1083 files that I am loading into the map reducer.
>>>>
>>>> Every time I run the map reducer:
>>>>
>>>> ~/hadoop/bin/hadoop jar /home/hadoop/lib/test.jar
>>>> org.apache.hadoop.mapreduce.apps.Test /input/2007/*/*/ /output
>>>>
>>>> it always creates 1083 mapper tasks to do the processing, which is
>>>> extremely slow, so I wanted to try lowering the number to 10 and see how
>>>> the performance compares.
>>>> I set the following in mapred-site.xml:
>>>>
>>>> <configuration>
>>>>   <property>
>>>>     <name>mapred.job.tracker</name>
>>>>     <value>localhost:9001</value>
>>>>   </property>
>>>>   <property>
>>>>     <name>mapred.tasktracker.map.tasks.maximum</name>
>>>>     <value>10</value>
>>>>   </property>
>>>>   <property>
>>>>     <name>mapred.tasktracker.reduce.tasks.maximum</name>
>>>>     <value>10</value>
>>>>   </property>
>>>>   <property>
>>>>     <name>mapred.map.tasks</name>
>>>>     <value>10</value>
>>>>   </property>
>>>> </configuration>
>>>>
>>>> I have restarted the jobtracker and tasktracker, and I still always get
>>>> 1083 mappers. Other than this, the job works as expected. I'm using the
>>>> new API, and the main function in my class looks like:
>>>>
>>>> public static void main(String[] args) throws Exception {
>>>>   Configuration conf = new Configuration();
>>>>   String[] otherArgs = new GenericOptionsParser(conf,
>>>>       args).getRemainingArgs();
>>>>   if (otherArgs.length != 2) {
>>>>     System.err.println("Usage: wordcount <in> <out>");
>>>>     System.exit(2);
>>>>   }
>>>>   // conf.setInt("mapred.map.tasks", 10);
>>>>   Job job = new Job(conf, "word count");
>>>>   // conf.setNumMapTasks(10); // does not compile: JobConf-only method
>>>>   job.setJarByClass(Test.class);
>>>>   job.setMapperClass(TestMapper.class);
>>>>   job.setCombinerClass(IntSumReducer.class);
>>>>   job.setReducerClass(IntSumReducer.class);
>>>>   job.setOutputKeyClass(Text.class);
>>>>   job.setOutputValueClass(IntWritable.class);
>>>>   job.setInputFormatClass(CustomInputFormat.class);
>>>>   job.setOutputFormatClass(TextOutputFormat.class);
>>>>   FileInputFormat.addInputPath(job, new Path(otherArgs[0]));
>>>>   FileOutputFormat.setOutputPath(job, new Path(otherArgs[1]));
>>>>   System.exit(job.waitForCompletion(true) ?
>>>> 0 : 1);
>>>> }
>>>>
>>>> I have been digging around in the Hadoop source code, and it looks like
>>>> the JobClient actually sets the number of mappers to the number of
>>>> splits (hard-coded). A snippet from the JobClient class:
>>>>
>>>> /**
>>>>  * Internal method for submitting jobs to the system.
>>>>  * @param job the configuration to submit
>>>>  * @return a proxy object for the running job
>>>>  * @throws FileNotFoundException
>>>>  * @throws ClassNotFoundException
>>>>  * @throws InterruptedException
>>>>  * @throws IOException
>>>>  */
>>>> public
>>>> RunningJob submitJobInternal(JobConf job
>>>>                              ) throws FileNotFoundException,
>>>>                                       ClassNotFoundException,
>>>>                                       InterruptedException,
>>>>                                       IOException {
>>>>   /*
>>>>    * configure the command line options correctly on the submitting dfs
>>>>    */
>>>>
>>>>   JobID jobId = jobSubmitClient.getNewJobId();
>>>>   Path submitJobDir = new Path(getSystemDir(), jobId.toString());
>>>>   Path submitJarFile = new Path(submitJobDir, "job.jar");
>>>>   Path submitSplitFile = new Path(submitJobDir, "job.split");
>>>>   configureCommandLineOptions(job, submitJobDir, submitJarFile);
>>>>   Path submitJobFile = new Path(submitJobDir, "job.xml");
>>>>   int reduces = job.getNumReduceTasks();
>>>>   JobContext context = new JobContext(job, jobId);
>>>>
>>>>   // Check the output specification
>>>>   if (reduces == 0 ?
>>>> job.getUseNewMapper() : job.getUseNewReducer()) {
>>>>     org.apache.hadoop.mapreduce.OutputFormat<?,?> output =
>>>>       ReflectionUtils.newInstance(context.getOutputFormatClass(), job);
>>>>     output.checkOutputSpecs(context);
>>>>   } else {
>>>>     job.getOutputFormat().checkOutputSpecs(fs, job);
>>>>   }
>>>>
>>>>   // Create the splits for the job
>>>>   LOG.debug("Creating splits at " + fs.makeQualified(submitSplitFile));
>>>>   int maps;
>>>>   if (job.getUseNewMapper()) {
>>>>     maps = writeNewSplits(context, submitSplitFile);
>>>>   } else {
>>>>     maps = writeOldSplits(job, submitSplitFile);
>>>>   }
>>>>   job.set("mapred.job.split.file", submitSplitFile.toString());
>>>>   job.setNumMapTasks(maps);
>>>>
>>>>   // Write job file to JobTracker's fs
>>>>   FSDataOutputStream out =
>>>>     FileSystem.create(fs, submitJobFile,
>>>>                       new FsPermission(JOB_FILE_PERMISSION));
>>>>
>>>>   try {
>>>>     job.writeXml(out);
>>>>   } finally {
>>>>     out.close();
>>>>     .....
>>>> }
>>>>
>>>> Is there anything I can do to get the number of mappers to be more
>>>> flexible?
>>>>
>>>> Cheers,
>>>>
>>>> Teryl
>>>>
>>>>
>>>> --
>>>> Best Regards
>>>>
>>>> Jeff Zhang
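
Putting the rule of thumb from the top of the thread into concrete form: on an 8-core node, a mapred-site.xml along the lines of the sketch below would cap each tasktracker at 4 concurrent map tasks and 4 concurrent reduce tasks. The values are only a starting point to be tuned by profiling, and the jobtracker address is carried over from the configuration quoted above.

```xml
<configuration>
  <property>
    <name>mapred.job.tracker</name>
    <value>localhost:9001</value>
  </property>
  <!-- Rule of thumb: split the cores between map and reduce slots.
       On an 8-core box, start with 4 of each per node, then adjust
       after profiling system usage. -->
  <property>
    <name>mapred.tasktracker.map.tasks.maximum</name>
    <value>4</value>
  </property>
  <property>
    <name>mapred.tasktracker.reduce.tasks.maximum</name>
    <value>4</value>
  </property>
</configuration>
```

Note that these two properties only cap concurrency per node. The total number of map tasks still comes from the InputFormat's splits, so a job over 1083 unsplittable files will still contain 1083 map tasks unless the files are merged beforehand or read through CombineFileInputFormat, as suggested in the thread.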
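
The core-count arithmetic behind that rule of thumb can be sketched in plain Java. The class and method names here are invented for illustration and are not part of Hadoop:

```java
public class SlotRuleOfThumb {

    /** Map (or reduce) slots for a node with the given core count. */
    static int slotsPerNode(int cores) {
        // Split the cores evenly between map and reduce slots,
        // but never drop below one slot.
        return Math.max(1, cores / 2);
    }

    public static void main(String[] args) {
        int cores = Runtime.getRuntime().availableProcessors();
        System.out.println("mapred.tasktracker.map.tasks.maximum = "
            + slotsPerNode(cores));
        System.out.println("mapred.tasktracker.reduce.tasks.maximum = "
            + slotsPerNode(cores));
    }
}
```

On an 8-core machine this yields 4 for both settings, matching the 4/4 suggestion above.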