Are your network interfaces, or the namenode/jobtracker/datanodes, saturated?
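One quick way to check the network side on each slave is to sample the receive counter in /proc/net/dev over a one-second window. This is only a sketch: the interface name is an assumption (on the cluster nodes you would pass the real NIC, e.g. eth0; the loopback default is just so the snippet runs anywhere), and a counter near the NIC's line rate would point at a network bottleneck rather than a tuning problem.

```shell
#!/bin/sh
# Hedged sketch: sample receive throughput for one interface.
# Default is lo so it runs on any Linux box; pass eth0 (or the
# actual NIC) on the cluster slaves.
IFACE=${1:-lo}

# Print the cumulative received-bytes counter for the named interface.
# tr handles the case where the counter is glued to the name ("eth0:1234").
rx_bytes() {
    grep "$1:" /proc/net/dev | tr ':' ' ' | awk '{ print $2 }'
}

before=$(rx_bytes "$IFACE")
sleep 1
after=$(rx_bytes "$IFACE")
echo "$IFACE rx: $(( (after - before) / 1024 )) KB/s"
```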
On Tue, Oct 13, 2009 at 9:05 AM, Chris Seline <[email protected]> wrote:

> I am using the 0.3 Cloudera scripts to start a Hadoop cluster on EC2 of 11
> c1.xlarge instances (1 master, 10 slaves), that is the biggest instance
> available with 20 compute units and 4x 400gb disks.
>
> I wrote some scripts to test many (100's) of configurations running a
> particular Hive query to try to make it as fast as possible, but no matter
> what I don't seem to be able to get above roughly 45% cpu utilization on the
> slaves, and not more than about 1.5% wait state. I have also measured
> network traffic and there don't seem to be bottlenecks there at all.
>
> Here are some typical CPU utilization lines from top on a slave when
> running a query:
>
> Cpu(s): 33.9%us, 7.4%sy, 0.0%ni, 56.8%id, 0.6%wa, 0.0%hi, 0.5%si, 0.7%st
> Cpu(s): 33.6%us, 5.9%sy, 0.0%ni, 58.7%id, 0.9%wa, 0.0%hi, 0.4%si, 0.5%st
> Cpu(s): 33.9%us, 7.2%sy, 0.0%ni, 56.8%id, 0.5%wa, 0.0%hi, 0.6%si, 1.0%st
> Cpu(s): 38.6%us, 8.7%sy, 0.0%ni, 50.8%id, 0.5%wa, 0.0%hi, 0.7%si, 0.7%st
> Cpu(s): 36.8%us, 7.4%sy, 0.0%ni, 53.6%id, 0.4%wa, 0.0%hi, 0.5%si, 1.3%st
>
> It seems like if tuned properly, I should be able to max out my cpu (or my
> disk) and get roughly twice the performance I am seeing now. None of the
> parameters I am tuning seems to achieve this. Adjusting mapred.map.tasks
> and mapred.reduce.tasks does help somewhat, and setting
> io.file.buffer.size to 4096 does better than the default, but the rest of
> the values I am testing seem to have little positive effect.
>
> These are the parameters I am testing, and the values tried:
>
> io.sort.factor=2,3,4,5,10,15,20,25,30,50,100
> mapred.job.shuffle.merge.percent=0.10,0.20,0.30,0.40,0.50,0.60,0.70,0.80,0.90,0.93,0.95,0.97,0.98,0.99
> io.bytes.per.checksum=256,512,1024,2048,4192
> mapred.output.compress=true,false
> hive.exec.compress.intermediate=true,false
> hive.map.aggr.hash.min.reduction=0.10,0.20,0.30,0.40,0.50,0.60,0.70,0.80,0.90,0.93,0.95,0.97,0.98,0.99
> mapred.map.tasks=1,2,3,4,5,6,8,10,12,15,20,25,30,40,50,60,75,100,150,200
> mapred.child.java.opts=-Xmx400m,-Xmx500m,-Xmx600m,-Xmx700m,-Xmx800m,-Xmx900m,-Xmx1000m,-Xmx1200m,-Xmx1400m,-Xmx1600m,-Xmx2000m
> mapred.reduce.tasks=5,10,15,20,25,30,35,40,50,60,70,80,100,125,150,200
> mapred.merge.recordsBeforeProgress=5000,10000,20000,30000
> mapred.job.shuffle.input.buffer.percent=0.10,0.20,0.30,0.40,0.50,0.60,0.70,0.80,0.90,0.93,0.95,0.99
> io.sort.spill.percent=0.10,0.20,0.30,0.40,0.50,0.60,0.70,0.80,0.90,0.93,0.95,0.99
> mapred.job.tracker.handler.count=3,4,5,7,10,15,25
> hive.merge.size.per.task=64000000,128000000,168000000,256000000,300000000,400000000
> hive.optimize.ppd=true,false
> hive.merge.mapredfiles=false,true
> io.sort.record.percent=0.10,0.20,0.30,0.40,0.50,0.60,0.70,0.80,0.90,0.93,0.95,0.97,0.98,0.99
> hive.map.aggr.hash.percentmemory=0.10,0.20,0.30,0.40,0.50,0.60,0.70,0.80,0.90,0.93,0.95,0.97,0.98,0.99
> mapred.tasktracker.reduce.tasks.maximum=1,2,3,4,5,6,8,10,12,15,20,30
> mapred.reduce.parallel.copies=1,2,4,6,8,10,13,16,20,25,30,50
> io.seqfile.lazydecompress=true,false
> io.sort.mb=20,50,75,100,150,200,250,350,500
> mapred.compress.map.output=true,false
> io.file.buffer.size=1024,2048,4096,8192,16384,32768,65536,131072,262144
> hive.exec.reducers.bytes.per.reducer=1000000000
> dfs.datanode.handler.count=1,2,3,4,5,6,8,10,15
> mapred.tasktracker.map.tasks.maximum=5,8,12,20
>
> Anyone have any thoughts for other parameters I might try? Am I going
> about this the wrong way?
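A sweep like the one described above can be driven by a small wrapper that passes one candidate value at a time via `hive -hiveconf` and times each run. This is only a sketch of that kind of script, not the poster's actual one: the query file name is a placeholder, and DRY_RUN is set so the commands are printed rather than executed.

```shell
#!/bin/sh
# Hedged sketch of a one-parameter-at-a-time Hive tuning sweep.
# QUERY_FILE is a placeholder for the benchmark query being timed.
QUERY_FILE=bench.q
DRY_RUN=1   # unset to actually invoke Hive

# Run (or print) the benchmark once per candidate value of one parameter.
sweep() {
    param=$1; shift
    for value in "$@"; do
        cmd="hive -hiveconf $param=$value -f $QUERY_FILE"
        if [ -n "$DRY_RUN" ]; then
            echo "$cmd"
        else
            start=$(date +%s)
            $cmd >/dev/null 2>&1
            echo "$param=$value took $(( $(date +%s) - start ))s"
        fi
    done
}

sweep io.sort.mb 20 50 75 100 150 200 250 350 500
sweep mapred.reduce.parallel.copies 1 2 4 6 8 10 13 16 20 25 30 50
```

One caveat with this approach: sweeping each parameter independently misses interactions (e.g. io.sort.mb against mapred.child.java.opts), so a promising value is worth re-checking while varying a second parameter.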
> Am I missing some other bottleneck?
>
> thanks
>
> Chris Seline

--
Pro Hadoop, a book to guide you from beginner to hadoop mastery,
http://www.amazon.com/dp/1430219424?tag=jewlerymall
www.prohadoopbook.com a community for Hadoop Professionals
