Re: Hardware performance from HADOOP cluster

Bryan Talbot Mon, 19 Oct 2009 17:14:38 -0700

Here's another data point from a small cluster running Cloudera's Hadoop 0.20.1 distribution:


4 slaves, each with 2 quad-core Xeon E5405 (2.00 GHz) CPUs, 8 GB RAM, and 4 x 1 TB SATA drives
1 master running the NameNode, SecondaryNameNode and JobTracker


dfs.replication=2
io.sort.factor: 25
io.sort.mb: 250
io.file.buffer.size: 65536
mapred.child.java.opts: -Xmx400M
mapred.tasktracker.map.tasks.maximum=7
mapred.tasktracker.reduce.tasks.maximum=7
mapred.job.reuse.jvm.num.tasks=10
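
As a rough sanity check on these numbers, here is a minimal sketch of the worst-case child-JVM heap they allow on one slave. It assumes every map and reduce slot is busy at the same time, and it ignores the DataNode and TaskTracker daemons' own heaps and any JVM overhead beyond -Xmx. The variable names are only for illustration.

```python
# Worst-case task heap on one slave, using the settings listed above.
map_slots = 7            # mapred.tasktracker.map.tasks.maximum
reduce_slots = 7         # mapred.tasktracker.reduce.tasks.maximum
child_heap_mb = 400      # mapred.child.java.opts: -Xmx400M
sort_buffer_mb = 250     # io.sort.mb, allocated inside each child heap
node_ram_mb = 8 * 1024   # 8 GB RAM per slave

# Total heap if all 14 slots are occupied at once.
task_heap_mb = (map_slots + reduce_slots) * child_heap_mb
# RAM left for daemons, the OS and the page cache (assumption: nothing else runs).
headroom_mb = node_ram_mb - task_heap_mb
# Heap left in each child JVM once the sort buffer is taken out.
heap_after_sort_mb = child_heap_mb - sort_buffer_mb

print(f"task heap: {task_heap_mb} MB, headroom: {headroom_mb} MB")
print(f"heap left per task after io.sort.mb: {heap_after_sort_mb} MB")
```

So the slots fit in RAM with some room to spare, but io.sort.mb takes up more than half of each child heap.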

$>hadoop jar /usr/lib/hadoop/hadoop-0.20.1+133-examples.jar randomwriter -D dfs.block.size=134217728 input


Takes about 4 mins
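
The dfs.block.size value is given in bytes. The sketch below converts it and compares the number of blocks per file against the stock 64 MB default in 0.20. The 1 GiB file size is a hypothetical figure chosen for illustration, not something measured on this cluster, and the sketch assumes one map split per block.

```python
# Block size arithmetic for the randomwriter command above.
block_size = 134217728                  # -D dfs.block.size, in bytes
default_block_size = 64 * 1024 * 1024   # stock 0.20 default, in bytes
file_size = 1024 ** 3                   # hypothetical 1 GiB file (assumption)

def blocks_needed(size_bytes, block_bytes):
    """Number of blocks for a file, rounding the last partial block up."""
    return -(-size_bytes // block_bytes)

block_size_mib = block_size // (1024 * 1024)
blocks_large = blocks_needed(file_size, block_size)
blocks_default = blocks_needed(file_size, default_block_size)

print(f"block size: {block_size_mib} MiB")
print(f"blocks per 1 GiB file: {blocks_large} (vs {blocks_default} at the default)")
```

Doubling the block size halves the number of map splits for the same input, so each map task has twice as much data to work on.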

$>hadoop jar /usr/lib/hadoop/hadoop-0.20.1+133-examples.jar sort input output


Takes about 11 mins (map takes about 4.5 mins)

With the default configuration, the map tasks run for just a couple of seconds each, and the average number of tasks running at any one time is just 20% of the map task capacity. Increasing the block size and reusing JVMs across tasks had the most noticeable impact on performance.
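
To put the 20% figure in context, here is a small sketch of the map slot capacity implied by the settings listed earlier. It assumes all 4 slaves run a TaskTracker with the same slot limit and that the master runs no tasks.

```python
# Cluster-wide map slot capacity and the observed default-config utilization.
slaves = 4
map_slots_per_slave = 7      # mapred.tasktracker.map.tasks.maximum
utilization_default = 0.20   # average share of slots busy with default settings

map_capacity = slaves * map_slots_per_slave
avg_running_maps = utilization_default * map_capacity

print(f"map slot capacity: {map_capacity}")
print(f"average maps running at 20%: {avg_running_maps:.1f}")
```

With tasks that only last a couple of seconds, most of the slots sit idle waiting for the next assignment, which fits with larger blocks and JVM reuse helping the most.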



-Bryan




On Oct 19, 2009, at 7:14 AM, Usman Waheed wrote:

io.sort.factor: 10
io.sort.mb: 100
io.file.buffer.size: 65536
mapred.child.java.opts: -Xmx200M
dfs.datanode.handler.count=3
2 Mappers
2 Reducers
