Try splitting your sample.txt into multiple files and run it again. For the text input format, the number of map tasks depends on the input size (one task per split).
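Something like this should work (a rough sketch -- the 64m piece size, the sample-part- prefix, and the input/ path are just illustrative, and it assumes the input goes into HDFS; if you run against the local filesystem, just point wordcount at a local directory of pieces instead):

  # Cut sample.txt into ~64MB pieces; each file yields at least one
  # split, and therefore at least one map task.
  split -b 64m sample.txt sample-part-

  # Copy the pieces into an input directory in HDFS.
  bin/hadoop dfs -mkdir input
  for f in sample-part-*; do bin/hadoop dfs -put $f input/$f; done

  # Run wordcount over the whole directory instead of a single file.
  bin/hadoop jar hadoop-0.18.3-examples.jar wordcount input output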
2009/3/6 Sandy <snickerdoodl...@gmail.com>

> I used three different sample.txt files, and was able to replicate the
> error. The first was 1.5MB, the second 66MB, and the last 428MB. I get
> the same problem regardless of the size of the input file: the running
> time of wordcount increases with the number of mappers and reducers
> specified. If the problem is the input file, how big do I have to go
> before it disappears entirely?
>
> If pseudo-distributed mode is the issue, what mode should I be running
> on my machine, given its specs? Once again, it is a SINGLE Mac Pro with
> 16GB of RAM, 4 1TB hard disks, and 2 quad-core processors.
>
> I'm not sure if it's HADOOP-2771, since the sort/merge (shuffle) is what
> seems to be taking the longest:
> 2 M/R ==> map: 18 sec, shuffle: 15 sec, reduce: 9 sec
> 4 M/R ==> map: 19 sec, shuffle: 37 sec, reduce: 2 sec
> 8 M/R ==> map: 21 sec, shuffle: 1 min 10 sec, reduce: 1 sec
>
> To make sure it's not the combiner, I removed it, reran everything, and
> got the same bottom line: with increasing maps and reducers, running
> time goes up, with the majority of the time apparently spent in
> sort/merge.
>
> Also, another thing we noticed is that the CPUs seem to be very active
> during the map phase, but when the map phase reaches 100% and only
> reduce appears to be running, the CPUs all become idle. Furthermore,
> regardless of the number of mappers I specify, all the CPUs become very
> active when a job is running. Why is this so? If I specify 2 mappers and
> 2 reducers, shouldn't just 2 or 4 CPUs be active? Why are all 8 active?
>
> Since I can reproduce this error using Hadoop's standard wordcount
> example, I was hoping that someone else could tell me whether they can
> reproduce it too. Is it true that when you increase the number of
> mappers and reducers on your systems, the running time of wordcount
> goes up?
>
> Thanks for the help! I'm looking forward to your responses.
>
> -SM
>
> On Thu, Mar 5, 2009 at 2:57 AM, Amareshwari Sriramadasu <
> amar...@yahoo-inc.com> wrote:
>
> > Are you hitting HADOOP-2771?
> > -Amareshwari
> >
> > Sandy wrote:
> >
> >> Hello all,
> >>
> >> For the sake of benchmarking, I ran the standard hadoop wordcount
> >> example on an input file using 2, 4, and 8 mappers and reducers for
> >> my job. In other words, I do:
> >>
> >> time -p bin/hadoop jar hadoop-0.18.3-examples.jar wordcount -m 2 -r 2
> >> sample.txt output
> >> time -p bin/hadoop jar hadoop-0.18.3-examples.jar wordcount -m 4 -r 4
> >> sample.txt output2
> >> time -p bin/hadoop jar hadoop-0.18.3-examples.jar wordcount -m 8 -r 8
> >> sample.txt output3
> >>
> >> Strangely enough, this increase in mappers and reducers results in
> >> slower running times!
> >> - On 2 mappers and reducers it ran for 40 seconds
> >> - On 4 mappers and reducers it ran for 60 seconds
> >> - On 8 mappers and reducers it ran for 90 seconds!
> >>
> >> Please note that the "sample.txt" file is identical in each of these
> >> runs.
> >>
> >> I have the following questions:
> >> - Shouldn't wordcount get -faster- with additional mappers and
> >> reducers, instead of slower?
> >> - If it does get faster for other people, why does it become slower
> >> for me? I am running hadoop in pseudo-distributed mode on a single
> >> 64-bit Mac Pro with 2 quad-core processors, 16 GB of RAM, and 4 1TB
> >> HDs.
> >>
> >> I would greatly appreciate it if someone could explain this behavior
> >> to me, and tell me if I'm running this wrong.
> >> How can I change my settings (if at all) to get wordcount running
> >> faster when I increase the number of maps and reduces?
> >>
> >> Thanks,
> >> -SM

--
http://daily.appspot.com/food/
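On the settings question quoted above: before tuning anything, it is worth checking how many splits (and hence map tasks) the framework actually creates for sample.txt. A sketch, assuming the file has been copied into HDFS and the 0.18-era fsck syntax (the /user/sandy path is illustrative):

  # Count the blocks the input occupies. With the default 64MB block
  # size, a 428MB file spans about 7 blocks, i.e. roughly 7 map tasks.
  bin/hadoop fsck /user/sandy/input/sample.txt -files -blocks

The JobTracker web UI (port 50030 by default) reports the same per-job map and reduce task counts.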