Thanks Friso,

We've been trying to diagnose this all day and still have not found a solution. We're running Cacti and IO wait is down at 0.5%, mappers and reducers are tuned right down to 1 map and 1 reduce slot on each machine, and the machine CPUs are almost idle with no swap. Using curl to pull a file from a DN comes down at 110m/s.
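(A minimal sketch of that single-stream read check, in case anyone wants to reproduce it; the DataNode host and file path are placeholders, and the streamFile URL assumes the stock DataNode HTTP interface on its default port:)

    # Time a single-stream HTTP read straight off one DataNode; prints average bytes/s.
    # dn01 and /benchmarks/testfile are placeholders - use a real node and a file it holds.
    curl -s -o /dev/null -w 'avg: %{speed_download} bytes/s\n' \
        'http://dn01:50075/streamFile?filename=/benchmarks/testfile'

    # The same read through the normal HDFS client path, for comparison.
    time hadoop fs -cat /benchmarks/testfile > /dev/null

If both come in near wire speed, raw disk and network throughput are probably not the limit.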
We are now upping things like the epoll limits.

Any ideas really greatly appreciated at this stage!
Tim

On Wed, Nov 17, 2010 at 10:20 AM, Friso van Vollenhoven
<fvanvollenho...@xebia.com> wrote:
> Hi Tim,
> Getting 28K of map outputs to reducers should not take minutes. Reducers on
> a properly set up (1Gb) network should be copying at multiple MB/s. I think
> you need to get some more info.
> Apart from top, you'll probably also want to look at iostat and vmstat. The
> first will tell you something about disk utilization and the latter can tell
> you whether the machines are using swap or not. This is very important. If
> you are over-utilizing physical memory on the machines, things will be slow.
> It's even better if you put something in place that allows you to get an
> overall view of the resource usage across the cluster. Look at Ganglia
> (http://ganglia.sourceforge.net/) or Cacti (http://www.cacti.net/) or
> something similar.
> Basically a job is either CPU bound, IO bound or network bound. You need to
> be able to look at all three to see what the bottleneck is. Also, you can
> run into churn when you saturate resources and processes are competing for
> them (e.g. when you have two disks and 50 processes / threads reading from
> them, things will be slow because the OS needs to switch between them a lot
> and overall throughput will be less than what the disks can do; you can see
> this when there is a lot of time in iowait but overall throughput is low, so
> there's a lot of seeking going on).
>
>
> On 17 nov 2010, at 09:43, Tim Robertson wrote:
>
> Hi all,
>
> We have set up a small cluster (13 nodes) using CDH3.
>
> We have been tuning it using TeraSort and Hive queries on our data,
> and the copy phase is very slow, so I'd like to ask if anyone can look
> over our config.
>
> We have an unbalanced set of machines (all on a single switch):
> - 10 of Intel @ 2.83GHz Quad, 8GB, 2x500G 7.2K SATA (3 mappers, 2 reducers)
> - 3 of Intel @ 2.53GHz Dual Quad, 24GB, 6x250GB 5.4K SATA (12 mappers, 12 reducers)
>
> We monitored the load using $top on the machines to settle on the number
> of mappers and reducers and stop overloading them, and the map() and
> reduce() phases are working very nicely - all our time is in the copy.
>
> The config:
>
> io.sort.mb=400
> io.sort.factor=100
> mapred.reduce.parallel.copies=20
> tasktracker.http.threads=80
> mapred.compress.map.output=true/false (no noticeable difference)
> mapred.map.output.compression.codec=com.hadoop.compression.lzo.LzoCodec
> mapred.output.compression.type=BLOCK
> mapred.inmem.merge.threshold=0
> mapred.job.reduce.input.buffer.percent=0.7
> mapred.job.reuse.jvm.num.tasks=50
>
> An example job:
> (select basis_of_record, count(1) from occurrence_record group by
> basis_of_record)
> Map input records: 262,573,931, finished in 2mins30 using 833 mappers
> Reduce was at 24% at 2mins30 when the maps finished, with all 55 reducers running
> Map output records: 1,855
> Map output bytes: 28,724
> REDUCE COPY PHASE finished after 7mins01secs
> Reduce finished after 7mins17secs
>
> I am correct that 28,724 bytes emitted from the maps should not take
> 4mins30 to copy, right?
>
> We're running Puppet so we can test changes quickly.
>
> Any pointers on how we can debug / improve this are greatly appreciated!
> Tim
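(A minimal sketch of the iostat/vmstat checks Friso suggests above, to run on a couple of nodes while the copy phase is in progress; assumes the sysstat package is installed:)

    # Extended per-disk stats every 5 seconds: %util and await show disk churn.
    iostat -x 5

    # Memory/CPU every 5 seconds: si/so show swap activity, wa shows iowait.
    vmstat 5

    # One-off look at how much swap is actually in use.
    free -m

Watching these on one of the quad machines and one of the dual-quad machines during the copy phase should show quickly whether the shuffle is disk bound, swap bound, or neither.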