Thanks for your help, Jason. I actually did reduce the heap size to 400M and it sped things up a few percent. From my experience with JVMs, if you can get by with a smaller heap your app will run faster, because GC is more efficient for smaller garbage collections (which is also why incremental garbage collection is often desirable in web apps). On the other hand, if you give it too little heap it can choke your app by running garbage collection too often, sort of like it is drowning and gasping for breath.
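
A minimal sketch of how a heap change like that can be applied per-query from a Hive session, assuming it goes through the mapred.child.java.opts property that appears in the parameter list further down this thread (the query itself is a placeholder, not from the thread):

    set mapred.child.java.opts=-Xmx400m;
    -- run the query under test in the same session so it picks up the override
    SELECT COUNT(*) FROM some_table;       -- some_table is hypothetical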

Jason Venner wrote:
The value really varies by job and by cluster. The larger the split, the greater the chance that a small number of splits will take much longer to complete than the rest, resulting in a long job tail where very little of your cluster is utilized while they finish.

The flip side is that with very small tasks, the overhead, startup time and co-ordination latency (which has been improved) can cause inefficient utilization of your cluster resources.
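
To make the trade-off concrete, here is a sketch of nudging the split size upward from a Hive session. It assumes the mapred.min.split.size property of the Hadoop releases in use at the time; the thread itself does not name the property, so treat that as an assumption:

    -- larger splits => fewer, longer map tasks (less startup overhead per unit of work)
    set mapred.min.split.size=268435456;
    -- the opposite direction (more, smaller tasks) is roughly what raising
    -- mapred.map.tasks, already in the parameter list further down, does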

If you really want to drive up your CPU utilization, reduce your per-task memory size to the bare minimum, and your JVMs will consume massive amounts of CPU doing garbage collection :) That happened at a place I worked, where ~60% of the job's CPU was garbage collection.
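
One way to check whether a job is in that regime (a suggestion, not something from the thread) is to turn on GC logging in the child JVMs and compare time spent in collections against task wall-clock time, using standard HotSpot flags:

    set mapred.child.java.opts=-Xmx400m -verbose:gc -XX:+PrintGCDetails -XX:+PrintGCTimeStamps;
    -- the GC lines show up in each task's stdout log; summing the pause times and
    -- dividing by the task run time gives a rough GC-overhead percentage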


On Wed, Oct 14, 2009 at 11:26 AM, Chris Seline <[email protected]> wrote:

That definitely helps a lot! I saw a few people talking about it on the web, and they say to set the value to Long.MAX_VALUE, but that is not what I have found to be best. I see about a 25% improvement at 300MB (300000000), with CPU utilization up to about 50-70%+, but I am still fine-tuning.


thanks!

Chris

Jason Venner wrote:

I remember having a problem like this at one point; it was related to the mean run time of my tasks and the rate at which the jobtracker could start new tasks.

By increasing the split size until the mean run time of my tasks was in the minutes, I was able to drive up the utilization.
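
A rough way to pick that target, assuming task time scales roughly linearly with split size (a back-of-envelope with made-up numbers, not figures from the thread):

    -- if ~64 MB splits finish in ~20 s and the goal is ~3 min per task:
    --   64 MB * (180 s / 20 s) = ~576 MB per split
    set mapred.min.split.size=536870912;   -- ~512 MB, a round value near that estimate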


On Wed, Oct 14, 2009 at 7:31 AM, Chris Seline <[email protected]> wrote:

No, there doesn't seem to be all that much network traffic. Most of the time, traffic (measured with nethogs) is about 15-30 KB/s on the master and slaves during the map phase; sometimes it bursts up to 5-10 MB/s on a slave for maybe 5-10 seconds on a query that takes 10 minutes, but that is still less than what I see in scp transfers on EC2, which are typically about 30 MB/s.

thanks

Chris


Jason Venner wrote:

Are your network interfaces or the namenode/jobtracker/datanodes saturated?

On Tue, Oct 13, 2009 at 9:05 AM, Chris Seline <[email protected]> wrote:

I am using the 0.3 Cloudera scripts to start a Hadoop cluster on EC2 of 11 c1.xlarge instances (1 master, 10 slaves); that is the biggest instance available, with 20 compute units and 4x 400 GB disks.

I wrote some scripts to test many (hundreds) of configurations running a particular Hive query to try to make it as fast as possible, but no matter what I don't seem to be able to get above roughly 45% CPU utilization on the slaves, and not more than about a 1.5% wait state. I have also measured network traffic and there don't seem to be any bottlenecks there at all.

Here are some typical CPU utilization lines from top on a slave when running a query:

Cpu(s): 33.9%us,  7.4%sy,  0.0%ni, 56.8%id,  0.6%wa,  0.0%hi,  0.5%si,  0.7%st
Cpu(s): 33.6%us,  5.9%sy,  0.0%ni, 58.7%id,  0.9%wa,  0.0%hi,  0.4%si,  0.5%st
Cpu(s): 33.9%us,  7.2%sy,  0.0%ni, 56.8%id,  0.5%wa,  0.0%hi,  0.6%si,  1.0%st
Cpu(s): 38.6%us,  8.7%sy,  0.0%ni, 50.8%id,  0.5%wa,  0.0%hi,  0.7%si,  0.7%st
Cpu(s): 36.8%us,  7.4%sy,  0.0%ni, 53.6%id,  0.4%wa,  0.0%hi,  0.5%si,  1.3%st

It seems like, if tuned properly, I should be able to max out my CPU (or my disk) and get roughly twice the performance I am seeing now. None of the parameters I am tuning seem to be able to achieve this. Adjusting mapred.map.tasks and mapred.reduce.tasks does help somewhat, and setting the io.file.buffer.size to 4096 does better than the default, but the rest of the values I am testing seem to have little positive effect.

These are the parameters I am testing, and the values tried (a sample Hive session applying a few of the per-job ones follows the list):

io.sort.factor=2,3,4,5,10,15,20,25,30,50,100
mapred.job.shuffle.merge.percent=0.10,0.20,0.30,0.40,0.50,0.60,0.70,0.80,0.90,0.93,0.95,0.97,0.98,0.99
io.bytes.per.checksum=256,512,1024,2048,4192
mapred.output.compress=true,false
hive.exec.compress.intermediate=true,false
hive.map.aggr.hash.min.reduction=0.10,0.20,0.30,0.40,0.50,0.60,0.70,0.80,0.90,0.93,0.95,0.97,0.98,0.99
mapred.map.tasks=1,2,3,4,5,6,8,10,12,15,20,25,30,40,50,60,75,100,150,200
mapred.child.java.opts=-Xmx400m,-Xmx500m,-Xmx600m,-Xmx700m,-Xmx800m,-Xmx900m,-Xmx1000m,-Xmx1200m,-Xmx1400m,-Xmx1600m,-Xmx2000m
mapred.reduce.tasks=5,10,15,20,25,30,35,40,50,60,70,80,100,125,150,200
mapred.merge.recordsBeforeProgress=5000,10000,20000,30000
mapred.job.shuffle.input.buffer.percent=0.10,0.20,0.30,0.40,0.50,0.60,0.70,0.80,0.90,0.93,0.95,0.99
io.sort.spill.percent=0.10,0.20,0.30,0.40,0.50,0.60,0.70,0.80,0.90,0.93,0.95,0.99
mapred.job.tracker.handler.count=3,4,5,7,10,15,25
hive.merge.size.per.task=64000000,128000000,168000000,256000000,300000000,400000000
hive.optimize.ppd=true,false
hive.merge.mapredfiles=false,true
io.sort.record.percent=0.10,0.20,0.30,0.40,0.50,0.60,0.70,0.80,0.90,0.93,0.95,0.97,0.98,0.99
hive.map.aggr.hash.percentmemory=0.10,0.20,0.30,0.40,0.50,0.60,0.70,0.80,0.90,0.93,0.95,0.97,0.98,0.99
mapred.tasktracker.reduce.tasks.maximum=1,2,3,4,5,6,8,10,12,15,20,30
mapred.reduce.parallel.copies=1,2,4,6,8,10,13,16,20,25,30,50
io.seqfile.lazydecompress=true,false
io.sort.mb=20,50,75,100,150,200,250,350,500
mapred.compress.map.output=true,false
io.file.buffer.size=1024,2048,4096,8192,16384,32768,65536,131072,262144
hive.exec.reducers.bytes.per.reducer=1000000000
dfs.datanode.handler.count=1,2,3,4,5,6,8,10,15
mapred.tasktracker.map.tasks.maximum=5,8,12,20
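
As an illustration of how one of these test runs might look, here is a sketch of a Hive session using a few of the per-job properties above; the values are picked arbitrarily from the lists, not recommendations:

    set io.sort.mb=200;
    set io.file.buffer.size=65536;
    set mapred.compress.map.output=true;
    set mapred.reduce.tasks=40;
    -- run and time the query under test here, then repeat with the next combination;
    -- daemon-side settings (e.g. dfs.datanode.handler.count, the tasktracker
    -- slot maximums) have to go into the cluster config and need a restart instead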

Anyone have any thoughts for other parameters I might try? Am I going about this the wrong way? Am I missing some other bottleneck?

thanks

Chris Seline