I was trying to control the maximum number of tasks per tasktracker by using the mapred.tasktracker.tasks.maximum parameter.
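For concreteness, this is roughly what the relevant entry in my conf/hadoop-site.xml
looks like right now (quoting from memory, so treat it as a sketch; the value of 8 is
meant to match the 8 cores on this box):

  <!-- sketch of my current (possibly misnamed) setting in conf/hadoop-site.xml -->
  <property>
    <name>mapred.tasktracker.tasks.maximum</name>
    <value>8</value>
  </property>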
I am interpreting your comment to mean that maybe this parameter name is
malformed, and should read:

mapred.tasktracker.map.tasks.maximum = 8
mapred.tasktracker.reduce.tasks.maximum = 8

I did that, and reran on a 428MB input, and got the same results as before.
I also ran it on a 3.3G dataset, and got the same pattern. I am still trying
to run it on a 20 GB input. This should confirm whether the filesystem cache
theory is right. (A sketch of the script I've been using to drive these runs
is at the bottom of this message, below the quoted thread.)

-SM

On Thu, Mar 5, 2009 at 12:22 PM, Sandy <snickerdoodl...@gmail.com> wrote:

> Arun,
>
> How can I check the number of slots per tasktracker? Which parameter
> controls that?
>
> Thanks,
> -SM
>
>
> On Thu, Mar 5, 2009 at 12:14 PM, Arun C Murthy <a...@yahoo-inc.com> wrote:
>
>> I assume you have only 2 map and 2 reduce slots per tasktracker, which
>> totals 2 maps/reduces for your cluster. This means that with more
>> maps/reduces, they are serialized to 2 at a time.
>>
>> Also, the -m is only a hint to the JobTracker; you might see fewer or
>> more maps than the number you specified on the command line.
>> The -r, however, is followed faithfully.
>>
>> Arun
>>
>>
>> On Mar 4, 2009, at 2:46 PM, Sandy wrote:
>>
>>> Hello all,
>>>
>>> For the sake of benchmarking, I ran the standard hadoop wordcount
>>> example on an input file using 2, 4, and 8 mappers and reducers for
>>> my job. In other words, I do:
>>>
>>> time -p bin/hadoop jar hadoop-0.18.3-examples.jar wordcount -m 2 -r 2
>>> sample.txt output
>>> time -p bin/hadoop jar hadoop-0.18.3-examples.jar wordcount -m 4 -r 4
>>> sample.txt output2
>>> time -p bin/hadoop jar hadoop-0.18.3-examples.jar wordcount -m 8 -r 8
>>> sample.txt output3
>>>
>>> Strangely enough, this increase in mappers and reducers results in
>>> slower running times!
>>> - On 2 mappers and reducers it ran for 40 seconds
>>> - On 4 mappers and reducers it ran for 60 seconds
>>> - On 8 mappers and reducers it ran for 90 seconds!
>>>
>>> Please note that the "sample.txt" file is identical in each of these
>>> runs.
>>>
>>> I have the following questions:
>>> - Shouldn't wordcount get -faster- with additional mappers and
>>> reducers, instead of slower?
>>> - If it does get faster for other people, why does it become slower
>>> for me? I am running Hadoop in pseudo-distributed mode on a single
>>> 64-bit Mac Pro with 2 quad-core processors, 16 GB of RAM, and four
>>> 1 TB HDs.
>>>
>>> I would greatly appreciate it if someone could explain this behavior
>>> to me, and tell me if I'm running this wrong. How can I change my
>>> settings (if at all) to get wordcount running faster when I increase
>>> the number of maps and reduces?
>>>
>>> Thanks,
>>> -SM
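P.S. Here is the script I mentioned above, in case anyone wants to reproduce
the timings. It is only a sketch: INPUT is a placeholder for whichever
dataset (sample.txt, the 428MB file, the 3.3G file, or the 20 GB file) is
being timed, and each run gets its own output directory, since wordcount
refuses to run if the output directory already exists.

  #!/bin/bash
  # Benchmark driver sketch: time wordcount at 2, 4, and 8 maps/reduces.
  INPUT=sample.txt        # placeholder; point this at the dataset under test

  for N in 2 4 8; do
      # Clear any previous output for this setting (ignore "not found" errors).
      bin/hadoop fs -rmr output-$N 2>/dev/null

      echo "=== $N maps / $N reduces on $INPUT ==="
      time -p bin/hadoop jar hadoop-0.18.3-examples.jar wordcount \
          -m $N -r $N "$INPUT" output-$N
  done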