Tim, Are there issues with DNS caching (or lack thereof), misconfigured /etc/hosts, or other network-config gotchas that might be preventing network connections between hosts from opening efficiently?
- Aaron On Wed, Nov 17, 2010 at 12:50 PM, Tim Robertson <timrobertson...@gmail.com>wrote: > Thanks Friso, > > We've been trying to diagnose all day and still did not find a solution. > We're running cacti and IO wait is down at 0.5%, M&R are tuned right > down to 1M 1R on each machine, and the machine CPUs are almost idle > with no swap. > Using curl to pull a file from a DN comes down at 110m/s. > > We are now upping things like epoll > > Any ideas really greatly appreciated at this stage! > Tim > > > On Wed, Nov 17, 2010 at 10:20 AM, Friso van Vollenhoven > <fvanvollenho...@xebia.com> wrote: > > Hi Tim, > > Getting 28K of map outputs to reducers should not take minutes. Reducers > on > > a properly setup (1Gb) network should be copying at multiple MB/s. I > think > > you need to get some more info. > > Apart from top, you'll probably also want to look at iostat and vmstat. > The > > first will tell you something about disk utilization and the latter can > tell > > you whether the machines are using swap or not. This is very important. > If > > you are over utilizing physical memory on the machines, thing will be > slow. > > It's even better if you put something in place that allows you to get an > > overall view of the resource usage across the cluster. Look at Ganglia > > (http://ganglia.sourceforge.net/) or Cacti (http://www.cacti.net/) or > > something similar. > > Basically a job is either CPU bound, IO bound or network bound. You need > to > > be able to look at all three to see what the bottleneck is. Also, you can > > run into churn when you saturate resources and processes are competing > for > > them (e.g. when you have two disks and 50 processes / threads reading > from > > them, things will be slow because the OS needs to switch between them a > lot > > and overall throughput will be less than what the disks can do; you can > see > > this when there is a lot of time in iowait, but overall throughput is low > so > > there's a lot of seeks going on). > > > > > > > > On 17 nov 2010, at 09:43, Tim Robertson wrote: > > > > Hi all, > > > > We have setup a small cluster (13 nodes) using CDH3 > > > > We have been tuning it using TeraSort and Hive queries on our data, > > and the copy phase is very slow, so I'd like to ask if anyone can look > > over our config. > > > > We have an unbalanced set of machines (all on a single switch): > > - 10 of Intel @ 2.83GHz Quad, 8GB, 2x500G 7.2K SATA (3 mappers, 2 > reducers) > > - 3 of Intel @ 2.53GHz Dual Quad, 24GB, 6x250GB 5.4K SATA (12 > > mappers, 12 reducers) > > > > We monitored the load using $top on machines, to settle on the number > > of mappers and reducers to stop overloading them, and the map() and > > reduce() is working very nicely - all our time > > > > The config: > > > > io.sort.mb=400 > > io.sort.factor=100 > > mapred.reduce.parallel.copies=20 > > tasktracker.http.threads=80 > > mapred.compress.map.output=true/false (no notible difference) > > mapred.map.output.compression.codec=com.hadoop.compression.lzo.LzoCodec > > mapred.output.compression.type=BLOCK > > mapred.inmem.merge.threshold=0 > > mapred.job.reduce.input.buffer.percent=0.7 > > mapred.job.reuse.jvm.num.tasks=50 > > > > An example job: > > (select basis_of_record,count(1) from occurrence_record group by > > basis_of_record) > > Map input records 262,573,931 finished in 2mins30 using 833 mappers > > Reduce was at 24% at 2mins30 finished map with all 55 running > > Map output records: 1,855 > > Map output bytes: 28,724 > > REDUCE COPY PHASE finished after 7mins01 secs > > Reduce finished after 7mins17secs > > > > I am correct that 28,724 bytes emitted from a map should not take 4mins30 > > right? > > > > We're running puppet so can test changes quickly. > > > > Any pointers on how we can debug / improve this are greatly appreciated! > > Tim > > > > >