Tim,

I have 4 nodes (Quad Core 2.00GHz, 8GB RAM, 4x1TB disks), where one is the master+datanode and the rest are datanodes.

Job: Sort 40GB of random data

With the following configuration settings:

io.sort.factor: 10
io.sort.mb: 100
io.file.buffer.size: 65536
mapred.child.java.opts: -Xmx200M
dfs.datanode.handler.count=3
2 Mappers
2 Reducers
Time taken: 28 minutes
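
For anyone who wants to reproduce this: the job is just the stock
randomwriter + sort examples from the examples jar, roughly like the
following (paths are illustrative; the randomwriter defaults of about
10 maps per node at 1GB each, if I remember them right, come to roughly
40GB on 4 datanodes):

  # generate random input (~40GB across the cluster with default settings)
  hadoop jar hadoop-*-examples.jar randomwriter /bench/rand-in
  # sort it
  hadoop jar hadoop-*-examples.jar sort /bench/rand-in /bench/rand-sorted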

Still testing with more config changes, will send the results out.

-Usman



Hi all,

I thought I would post the findings of my tuning tests running the
sort benchmark.

This is all based on 10 machines (1 master and 9 DN/TT), each a Dell R300:
2.83GHz quad-core (2x6MB cache, 1 proc), 8GB RAM and 2x500GB SATA drives.

--- Vanilla installation ---
2M 2R: 36 mins
4M 4R: 36 mins (yes the same)


--- Tuned according to Cloudera http://tinyurl.com/ykupczu ---
io.sort.factor: 20  (mapred-site.xml)
io.sort.mb: 200  (mapred-site.xml)
io.file.buffer.size: 65536   (core-site.xml)
mapred.child.java.opts: -Xmx512M  (mapred-site.xml)

2M 2R: 33.5 mins
4M 4R: 29 mins
8M 8R: 41 mins
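
(Side note: for quick A/B runs, the mapred-site.xml properties above can
also be overridden per job invocation rather than editing the file each
time, something along these lines - a sketch only, with io.file.buffer.size
left in core-site.xml as above:)

  hadoop jar hadoop-*-examples.jar sort \
    -D io.sort.factor=20 \
    -D io.sort.mb=200 \
    -D mapred.child.java.opts=-Xmx512M \
    /bench/rand-in /bench/rand-sorted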


--- Increasing the task memory a little ---
io.sort.factor: 20
io.sort.mb: 200
io.file.buffer.size: 65536
mapred.child.java.opts: -Xmx1G

2M 2R: 29 mins  (adding dfs.datanode.handler.count=8 resulted in 30 mins)
4M 4R: 29 mins (yes the same)


--- Increasing sort memory ---
io.sort.factor: 32
io.sort.mb: 320
io.file.buffer.size: 65536
mapred.child.java.opts: -Xmx1G

2M 2R: 31 mins (yes longer than lower sort sizes)

I am going to stick with the following for now and get back to work...
  io.sort.factor: 20
  io.sort.mb: 200
  io.file.buffer.size: 65536
  mapred.child.java.opts: -Xmx1G
  dfs.datanode.handler.count=8
  4 Mappers
  4 Reducers
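
(One gotcha with dfs.datanode.handler.count: it is read by the datanode
daemons, not by the job, so it only takes effect after the datanodes are
restarted, e.g. roughly, assuming the standard scripts in $HADOOP_HOME/bin:)

  bin/stop-dfs.sh && bin/start-dfs.sh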

Hope that helps someone. How did your tuning go, Usman?

Tim


On Fri, Oct 16, 2009 at 10:41 PM, tim robertson
<[email protected]> wrote:
No worries, Usman. I will try and do the same on Monday.

Thanks Todd for the clarification.

Tim


On Fri, Oct 16, 2009 at 5:30 PM, Usman Waheed <[email protected]> wrote:
Hi Tim,

I have been swamped with some other stuff so did not get a chance to run
further tests on my setup.
Will send them out early next week so we can compare.

Cheers,
Usman

On Fri, Oct 16, 2009 at 4:01 AM, tim robertson
<[email protected]>wrote:


Hi all,

Adding the following to core-site.xml, mapred-site.xml and
hdfs-site.xml (based on Cloudera guidelines:
http://tinyurl.com/ykupczu)
 io.sort.factor: 15  (mapred-site.xml)
 io.sort.mb: 150  (mapred-site.xml)
 io.file.buffer.size: 65536   (core-site.xml)
 dfs.datanode.handler.count: 3  (hdfs-site.xml - actually this is the default)

and using the default of HADOOP_HEAPSIZE=1000 (hadoop-env.sh)

Using 2 mappers and 2 reducers, can someone please help me with the
maths as to why my jobs are failing with "Error: Java heap space" in
the maps?
(the same runs fine with io.sort.factor of 10 and io.sort.mb of 100)

io.sort.mb of 200 x 4 (2 mappers, 2 reducers) = 0.8G
Plus the 2 daemons on the node at 1G each = 1.8G
Plus Xmx of 1G for each hadoop daemon task = 5.8G

The machines have 8G in them.  Obviously my maths is screwy somewhere...



Hi Tim,

Did you also change mapred.child.java.opts? The HADOOP_HEAPSIZE parameter
is for the daemons, not the tasks. If you bump up io.sort.mb you also have
to bump up the -Xmx argument in mapred.child.java.opts to give the actual
tasks more RAM.
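
(To put rough numbers on it: the sort buffer lives inside each child JVM,
so an io.sort.mb of 150-200 simply does not fit in the default child heap
of -Xmx200m, which is why the maps die with heap errors even though the box
has plenty of free RAM. The per-node total is not the problem: even with
-Xmx1G children, 4 task JVMs plus 2 daemons at ~1G each is around 6G on an
8G machine.)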

-Todd



On Fri, Oct 16, 2009 at 9:59 AM, Erik Forsberg <[email protected]>
wrote:

On Thu, 15 Oct 2009 11:32:35 +0200
Usman Waheed <[email protected]> wrote:


Hi Todd,

Some changes have been applied to the cluster based on the
documentation (URL) you noted below,

I would also like to know what settings people are tuning at the
operating system level. The blog post mentioned here does not say
much about that, except for the fileno changes.

We got about 3x the read performance when running DFSIOTest by mounting
our ext3 filesystems with the noatime parameter. I saw that mentioned
in the slides from some Cloudera presentation.
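
(That is presumably the TestDFSIO tool in the Hadoop test jar; the exact
jar name depends on your version, and the file count/size below are
arbitrary:)

  hadoop jar hadoop-*-test.jar TestDFSIO -write -nrFiles 10 -fileSize 1000
  hadoop jar hadoop-*-test.jar TestDFSIO -read -nrFiles 10 -fileSize 1000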

(For those who don't know, the noatime option turns off the recording of
access times on files. Recording atime is a horrible performance killer,
since it means every read of a file also requires the kernel to do a
write. These writes are probably queued up, but still: if you don't need
atime (very few applications do), turn it off!)
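
(Concretely, that just means adding noatime to the mount options for the
data disks, e.g. in /etc/fstab, then remounting; the device and mount point
here are placeholders:)

  # /etc/fstab
  /dev/sdb1  /data1  ext3  defaults,noatime  0  2

  # apply without a reboot
  mount -o remount,noatime /data1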

Have people been experimenting with different filesystems, or are most
of us running on top of ext3?

How about mounting ext3 with "data=writeback"? That's rumoured to give
the best throughput and could help with write performance. From
mount(8):

   writeback
          Data ordering is not preserved - data may be written into the
          main file system after its metadata has been committed to the
          journal.  This is rumoured to be the highest throughput option.
          It guarantees internal file system integrity, however it can
          allow old data to appear in files after a crash and journal
          recovery.
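
(If anyone wants to try it, it would be the same kind of fstab change,
e.g. the line below with placeholder device and mount point. Note that
ext3 will not switch data= mode on a plain "mount -o remount", so the
filesystem needs a real unmount/mount or a reboot:)

  # /etc/fstab
  /dev/sdb1  /data1  ext3  defaults,noatime,data=writeback  0  2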

How would the HDFS consistency checks cope with old data appearing in
the underlying files after a system crash?

Cheers,
\EF
--
Erik Forsberg <[email protected]>
Developer, Opera Software - http://www.opera.com/



