Hi Allen,

I have set them both to 10 in the mapred-site.xml file.
Cheers,

Teryl

On Tue, Jan 19, 2010 at 4:32 PM, Allen Wittenauer <awittena...@linkedin.com> wrote:

> What is the value of:
>
> mapred.tasktracker.map.tasks.maximum
> mapred.tasktracker.reduce.tasks.maximum
>
> On 1/19/10 10:23 AM, "Teryl Taylor" <teryl.tay...@gmail.com> wrote:
>
> > Hi guys,
> >
> > Thanks for the answers. Michael, yes, you are right; that is, I guess, what
> > I'm looking for: how to reduce the number of mappers running
> > simultaneously. The system is running really slowly, and I think it might
> > be due to constant thread context switching because so many mappers are
> > running concurrently. Is there a way to tell how many mappers are running
> > at the same time? My concern is that even though I set mapred.map.tasks to
> > 10, the job configuration file (i.e. the job.xml file that is generated in
> > the logs directory) always says mapred.map.tasks is 1083, which makes me
> > believe my setting is being completely ignored. This is confirmed by the
> > snippet of code from the JobClient.java file, in which the client asks for
> > the number of input splits and automatically sets mapred.map.tasks to that
> > number, totally ignoring my setting.
> >
> > Cheers,
> >
> > Teryl
> >
> > On Tue, Jan 19, 2010 at 12:14 PM, Clements, Michael <
> > michael.cleme...@disney.com> wrote:
> >
> >> Do you want to change the total # of mappers, or the # that run at any
> >> given time? For example, Hadoop may split your job into 1083 mapper
> >> tasks, only 10 of which it allows to run at a time.
> >>
> >> The setting in mapred-site.xml caps how many mappers can run
> >> simultaneously per machine. It does not affect how many total mappers
> >> will make up the job.
> >>
> >> Total # of tasks in a job, and the # that can run simultaneously, are
> >> two separate settings. Both are important for tuning performance. The
> >> InputFormat controls the first.
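[The distinction between total tasks and concurrent tasks can be made concrete with some slot arithmetic. The sketch below is plain Java with no Hadoop dependency; the class and method names are invented for illustration. On a pseudo-distributed (single-node) setup, 1083 map tasks with a per-node cap of 10 run as roughly 109 sequential "waves".]

```java
public class SlotMath {
    /**
     * Number of sequential "waves" needed to run totalTasks when
     * nodes * slotsPerNode tasks can execute concurrently (slotsPerNode
     * corresponds to mapred.tasktracker.map.tasks.maximum per machine).
     */
    static int wavesOfTasks(int totalTasks, int nodes, int slotsPerNode) {
        int concurrent = nodes * slotsPerNode;
        return (totalTasks + concurrent - 1) / concurrent; // ceiling division
    }

    public static void main(String[] args) {
        // 1083 splits, one node, at most 10 concurrent mappers:
        System.out.println(wavesOfTasks(1083, 1, 10)); // 109 waves
    }
}
```

[The total mapper count (1083) is unchanged by the cap; only the schedule changes, which is why the job.xml still reports mapred.map.tasks=1083.]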
> >>
> >> *From:* mapreduce-user-return-292-Michael.Clements=disney.com@hadoop.apache.org
> >> [mailto:mapreduce-user-return-292-Michael.Clements=disney....@hadoop.apache.org]
> >> *On Behalf Of* Jeff Zhang
> >> *Sent:* Monday, January 18, 2010 4:54 PM
> >> *To:* mapreduce-user@hadoop.apache.org
> >> *Subject:* Re: Question about setting the number of mappers.
> >>
> >> Hi Teryl,
> >>
> >> The number of mappers is determined by the InputFormat you use. In your
> >> case, one way is to merge the files into larger files beforehand;
> >> another is to use CombineFileInputFormat as your InputFormat.
> >>
> >> On Mon, Jan 18, 2010 at 1:05 PM, Teryl Taylor <teryl.tay...@gmail.com>
> >> wrote:
> >>
> >> Hi everyone,
> >>
> >> I'm playing around with the Hadoop MapReduce library and I'm getting
> >> some odd behaviour. The system is set up on one machine using the
> >> pseudo-distributed configuration. I use KFS as my DFS. I have written a
> >> MapReduce program to process a bunch of binary files. The files are
> >> compressed in chunks, so I do not split the files. There are 1083 files
> >> that I am loading into the map reducer.
> >>
> >> Every time I run the map reducer:
> >>
> >> ~/hadoop/bin/hadoop jar /home/hadoop/lib/test.jar
> >> org.apache.hadoop.mapreduce.apps.Test /input/2007/*/*/ /output
> >>
> >> it always creates 1083 mapper tasks to do the processing, which is
> >> extremely slow, so I wanted to try lowering the number to 10 and see how
> >> the performance is.
> >> I set the following in mapred-site.xml:
> >>
> >> <configuration>
> >>   <property>
> >>     <name>mapred.job.tracker</name>
> >>     <value>localhost:9001</value>
> >>   </property>
> >>   <property>
> >>     <name>mapred.tasktracker.map.tasks.maximum</name>
> >>     <value>10</value>
> >>   </property>
> >>   <property>
> >>     <name>mapred.tasktracker.reduce.tasks.maximum</name>
> >>     <value>10</value>
> >>   </property>
> >>   <property>
> >>     <name>mapred.map.tasks</name>
> >>     <value>10</value>
> >>   </property>
> >> </configuration>
> >>
> >> I have restarted the jobtracker and tasktracker, and I still always get
> >> 1083 mappers. Other than this, the map reducer is working as expected.
> >> I'm using the new API, and the main function in my class looks like:
> >>
> >>   public static void main(String[] args) throws Exception {
> >>     Configuration conf = new Configuration();
> >>     String[] otherArgs = new GenericOptionsParser(conf,
> >>         args).getRemainingArgs();
> >>     if (otherArgs.length != 2) {
> >>       System.err.println("Usage: wordcount <in> <out>");
> >>       System.exit(2);
> >>     }
> >>     // conf.setInt("mapred.map.tasks", 10);
> >>     Job job = new Job(conf, "word count");
> >>     conf.setNumMapTasks(10);
> >>     job.setJarByClass(Test.class);
> >>     job.setMapperClass(TestMapper.class);
> >>     job.setCombinerClass(IntSumReducer.class);
> >>     job.setReducerClass(IntSumReducer.class);
> >>     job.setOutputKeyClass(Text.class);
> >>     job.setOutputValueClass(IntWritable.class);
> >>     job.setInputFormatClass(CustomInputFormat.class);
> >>     job.setOutputFormatClass(TextOutputFormat.class);
> >>     FileInputFormat.addInputPath(job, new Path(otherArgs[0]));
> >>     FileOutputFormat.setOutputPath(job, new Path(otherArgs[1]));
> >>     System.exit(job.waitForCompletion(true) ?
> >> 0 : 1);
> >>   }
> >>
> >> I have been digging around in the Hadoop source code, and it looks like
> >> the JobClient actually sets the number of mappers to the number of
> >> splits (hard-coded). A snippet from the JobClient class:
> >>
> >>   /**
> >>    * Internal method for submitting jobs to the system.
> >>    * @param job the configuration to submit
> >>    * @return a proxy object for the running job
> >>    * @throws FileNotFoundException
> >>    * @throws ClassNotFoundException
> >>    * @throws InterruptedException
> >>    * @throws IOException
> >>    */
> >>   public
> >>   RunningJob submitJobInternal(JobConf job
> >>                                ) throws FileNotFoundException,
> >>                                         ClassNotFoundException,
> >>                                         InterruptedException,
> >>                                         IOException {
> >>     /*
> >>      * configure the command line options correctly on the submitting dfs
> >>      */
> >>
> >>     JobID jobId = jobSubmitClient.getNewJobId();
> >>     Path submitJobDir = new Path(getSystemDir(), jobId.toString());
> >>     Path submitJarFile = new Path(submitJobDir, "job.jar");
> >>     Path submitSplitFile = new Path(submitJobDir, "job.split");
> >>     configureCommandLineOptions(job, submitJobDir, submitJarFile);
> >>     Path submitJobFile = new Path(submitJobDir, "job.xml");
> >>     int reduces = job.getNumReduceTasks();
> >>     JobContext context = new JobContext(job, jobId);
> >>
> >>     // Check the output specification
> >>     if (reduces == 0 ?
> >> job.getUseNewMapper() : job.getUseNewReducer()) {
> >>       org.apache.hadoop.mapreduce.OutputFormat<?,?> output =
> >>         ReflectionUtils.newInstance(context.getOutputFormatClass(), job);
> >>       output.checkOutputSpecs(context);
> >>     } else {
> >>       job.getOutputFormat().checkOutputSpecs(fs, job);
> >>     }
> >>
> >>     // Create the splits for the job
> >>     LOG.debug("Creating splits at " + fs.makeQualified(submitSplitFile));
> >>     int maps;
> >>     if (job.getUseNewMapper()) {
> >>       maps = writeNewSplits(context, submitSplitFile);
> >>     } else {
> >>       maps = writeOldSplits(job, submitSplitFile);
> >>     }
> >>     job.set("mapred.job.split.file", submitSplitFile.toString());
> >>     job.setNumMapTasks(maps);
> >>
> >>     // Write job file to JobTracker's fs
> >>     FSDataOutputStream out =
> >>       FileSystem.create(fs, submitJobFile,
> >>                         new FsPermission(JOB_FILE_PERMISSION));
> >>
> >>     try {
> >>       job.writeXml(out);
> >>     } finally {
> >>       out.close();
> >>     .....
> >>   }
> >>
> >> Is there anything I can do to get the number of mappers to be more
> >> flexible?
> >>
> >> Cheers,
> >>
> >> Teryl
> >>
> >>
> >> --
> >> Best Regards
> >>
> >> Jeff Zhang
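[Jeff's CombineFileInputFormat suggestion works because it packs many small, unsplittable files into a bounded number of input splits, and, as the submitJobInternal snippet shows, the mapper count simply follows the split count. The packing effect can be sketched in plain Java with no Hadoop dependency; the class and method names and the 8 MB / 128 MB figures below are illustrative assumptions, with maxSplitBytes playing the role of a configured maximum split size.]

```java
import java.util.Arrays;

public class SplitPacking {
    /**
     * Sequentially pack whole file lengths into splits of at most
     * maxSplitBytes each. Files are never divided, mirroring an
     * InputFormat whose isSplitable() returns false.
     */
    static int countCombinedSplits(long[] fileLengths, long maxSplitBytes) {
        int splits = 0;
        long current = 0;
        for (long len : fileLengths) {
            if (current > 0 && current + len > maxSplitBytes) {
                splits++;      // close the current split and start a new one
                current = 0;
            }
            current += len;
        }
        return current > 0 ? splits + 1 : splits;
    }

    public static void main(String[] args) {
        long[] files = new long[1083];
        Arrays.fill(files, 8L * 1024 * 1024);   // 1083 files of 8 MB each
        // One mapper per unsplittable file, as in the original job:
        System.out.println(files.length);        // 1083
        // Combined into 128 MB splits (16 files per split):
        System.out.println(countCombinedSplits(files, 128L * 1024 * 1024)); // 68
    }
}
```

[Under these assumed sizes the job drops from 1083 mappers to 68, which is why combining inputs, rather than setting mapred.map.tasks, is the lever that actually changes the total.]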