Try splitting your sample.txt into multiple files and run it again. For the text input format, the number of map tasks depends on the input size (one task per split).
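Something like this should work (a rough sketch -- the 64m piece size, the sample-part- prefix, and the input/ path are just illustrative, and it assumes the input goes into HDFS; if you run against the local filesystem, just point wordcount at a local directory of pieces instead):

  # Cut sample.txt into ~64MB pieces; each file yields at least one
  # split, and therefore at least one map task.
  split -b 64m sample.txt sample-part-

  # Copy the pieces into an input directory in HDFS.
  bin/hadoop dfs -mkdir input
  for f in sample-part-*; do bin/hadoop dfs -put $f input/$f; done

  # Run wordcount over the whole directory instead of a single file.
  bin/hadoop jar hadoop-0.18.3-examples.jar wordcount input output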
2009/3/6 Sandy <snickerdoodl...@gmail.com>

> I used three different sample.txt files, and was able to replicate the
> error. The first was 1.5MB, the second 66MB, and the last 428MB. I get
> the same problem regardless of the size of the input file: the running
> time of wordcount increases with the number of mappers and reducers
> specified. If the problem is the input file, how big do I have to go
> before it disappears entirely?
>
> If pseudo-distributed mode is the issue, what mode should I be running
> on my machine, given its specs? Once again, it is a SINGLE Mac Pro with
> 16GB of RAM, 4 1TB hard disks, and 2 quad-core processors.
>
> I'm not sure if it's HADOOP-2771, since the sort/merge (shuffle) is what
> seems to be taking the longest:
> 2 M/R ==> map: 18 sec, shuffle: 15 sec, reduce: 9 sec
> 4 M/R ==> map: 19 sec, shuffle: 37 sec, reduce: 2 sec
> 8 M/R ==> map: 21 sec, shuffle: 1 min 10 sec, reduce: 1 sec
>
> To make sure it's not the combiner, I removed it, reran everything, and
> got the same bottom line: with increasing maps and reducers, running
> time goes up, with the majority of the time apparently spent in
> sort/merge.
>
> Also, another thing we noticed is that the CPUs seem to be very active
> during the map phase, but when the map phase reaches 100% and only
> reduce appears to be running, the CPUs all become idle. Furthermore,
> regardless of the number of mappers I specify, all the CPUs become very
> active when a job is running. Why is this so? If I specify 2 mappers and
> 2 reducers, shouldn't just 2 or 4 CPUs be active? Why are all 8 active?
>
> Since I can reproduce this error using Hadoop's standard wordcount
> example, I was hoping that someone else could tell me whether they can
> reproduce it too. Is it true that when you increase the number of
> mappers and reducers on your systems, the running time of wordcount
> goes up?
>
> Thanks for the help! I'm looking forward to your responses.
>
> -SM
>
> On Thu, Mar 5, 2009 at 2:57 AM, Amareshwari Sriramadasu <
> amar...@yahoo-inc.com> wrote:
>
> > Are you hitting HADOOP-2771?
> > -Amareshwari
> >
> > Sandy wrote:
> >
> >> Hello all,
> >>
> >> For the sake of benchmarking, I ran the standard hadoop wordcount
> >> example on an input file using 2, 4, and 8 mappers and reducers for
> >> my job. In other words, I do:
> >>
> >> time -p bin/hadoop jar hadoop-0.18.3-examples.jar wordcount -m 2 -r 2
> >> sample.txt output
> >> time -p bin/hadoop jar hadoop-0.18.3-examples.jar wordcount -m 4 -r 4
> >> sample.txt output2
> >> time -p bin/hadoop jar hadoop-0.18.3-examples.jar wordcount -m 8 -r 8
> >> sample.txt output3
> >>
> >> Strangely enough, this increase in mappers and reducers results in
> >> slower running times!
> >> - On 2 mappers and reducers it ran for 40 seconds
> >> - On 4 mappers and reducers it ran for 60 seconds
> >> - On 8 mappers and reducers it ran for 90 seconds!
> >>
> >> Please note that the "sample.txt" file is identical in each of these
> >> runs.
> >>
> >> I have the following questions:
> >> - Shouldn't wordcount get -faster- with additional mappers and
> >> reducers, instead of slower?
> >> - If it does get faster for other people, why does it become slower
> >> for me? I am running hadoop in pseudo-distributed mode on a single
> >> 64-bit Mac Pro with 2 quad-core processors, 16 GB of RAM, and 4 1TB
> >> HDs.
> >>
> >> I would greatly appreciate it if someone could explain this behavior
> >> to me, and tell me if I'm running this wrong.
> >> How can I change my settings (if at all) to get wordcount running
> >> faster when I increase the number of maps and reduces?
> >>
> >> Thanks,
> >> -SM

--
http://daily.appspot.com/food/
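On the settings question quoted above: before tuning anything, it is worth checking how many splits (and hence map tasks) the framework actually creates for sample.txt. A sketch, assuming the file has been copied into HDFS and the 0.18-era fsck syntax (the /user/sandy path is illustrative):

  # Count the blocks the input occupies. With the default 64MB block
  # size, a 428MB file spans about 7 blocks, i.e. roughly 7 map tasks.
  bin/hadoop fsck /user/sandy/input/sample.txt -files -blocks

The JobTracker web UI (port 50030 by default) reports the same per-job map and reduce task counts.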