Tim,
I have 4 nodes (quad-core 2.00GHz, 8GB RAM, 4x1TB disks), where one is
the master+datanode and the rest are datanodes.
Job: Sort 40GB of random data
With the following configuration settings:
io.sort.factor: 10
io.sort.mb: 100
io.file.buffer.size: 65536
mapred.child.java.opts: -Xmx200M
dfs.datanode.handler.count: 3
2 Mappers
2 Reducers
Time taken: 28 minutes
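For anyone reproducing this, here is a rough sketch of where those
settings live (property names are the 0.20-era ones, adjust for your
version):

In mapred-site.xml:

  <property>
    <name>io.sort.factor</name>
    <value>10</value>
  </property>
  <property>
    <name>io.sort.mb</name>
    <value>100</value>
  </property>
  <property>
    <name>mapred.child.java.opts</name>
    <value>-Xmx200M</value>
  </property>

In core-site.xml:

  <property>
    <name>io.file.buffer.size</name>
    <value>65536</value>
  </property>

In hdfs-site.xml:

  <property>
    <name>dfs.datanode.handler.count</name>
    <value>3</value>
  </property>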
Still testing with more config changes, will send the results out.
-Usman
Hi all,
I thought I would post the findings of my tuning tests running the
sort benchmark.
This is all based on 10 machines (1 as master and 9 as DN/TT), each a
Dell R300: 2.83GHz quad-core (one CPU, 2x6MB cache), 8GB RAM and 2x500GB SATA drives
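For the record, this is the stock example sort from the Hadoop
distribution, run roughly like this (the /bench paths are just
placeholders, and randomwriter's output size is configurable):

  hadoop jar $HADOOP_HOME/hadoop-*-examples.jar randomwriter /bench/random
  hadoop jar $HADOOP_HOME/hadoop-*-examples.jar sort /bench/random /bench/sorted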
--- Vanilla installation ---
2M 2R: 36 mins
4M 4R: 36 mins (yes the same)
--- Tuned according to Cloudera http://tinyurl.com/ykupczu ---
io.sort.factor: 20 (mapred-site.xml)
io.sort.mb: 200 (mapred-site.xml)
io.file.buffer.size: 65536 (core-site.xml)
mapred.child.java.opts: -Xmx512M (mapred-site.xml)
2M 2R: 33.5 mins
4M 4R: 29 mins
8M 8R: 41 mins
--- Increasing the task memory a little ---
io.sort.factor: 20
io.sort.mb: 200
io.file.buffer.size: 65536
mapred.child.java.opts: -Xmx1G
2M 2R: 29 mins (adding dfs.datanode.handler.count=8 resulted in 30 mins)
4M 4R: 29 mins (yes the same)
--- Increasing sort memory ---
io.sort.factor: 32
io.sort.mb: 320
io.file.buffer.size: 65536
mapred.child.java.opts: -Xmx1G
2M 2R: 31 mins (yes, longer than with the smaller sort settings)
I am going to stick with the following for now and get back to work
(the mapper/reducer slot settings are sketched below the list)...
io.sort.factor: 20
io.sort.mb: 200
io.file.buffer.size: 65536
mapred.child.java.opts: -Xmx1G
dfs.datanode.handler.count=8
4 Mappers
4 Reducers
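To be explicit, the 4 mappers / 4 reducers above are the per-node task
slots, which I believe are set like this in mapred-site.xml (sketch):

  <property>
    <name>mapred.tasktracker.map.tasks.maximum</name>
    <value>4</value>
  </property>
  <property>
    <name>mapred.tasktracker.reduce.tasks.maximum</name>
    <value>4</value>
  </property>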
Hope that helps someone. How did your tuning go, Usman?
Tim
On Fri, Oct 16, 2009 at 10:41 PM, tim robertson
<[email protected]> wrote:
No worries, Usman, I will try and do the same on Monday.
Thanks Todd for the clarification.
Tim
On Fri, Oct 16, 2009 at 5:30 PM, Usman Waheed <[email protected]> wrote:
Hi Tim,
I have been swamped with some other stuff so did not get a chance to run
further tests on my setup.
Will send them out early next week so we can compare.
Cheers,
Usman
On Fri, Oct 16, 2009 at 4:01 AM, tim robertson
<[email protected]> wrote:
Hi all,
Adding the following to core-site.xml, mapred-site.xml and
hdfs-site.xml (based on Cloudera guidelines:
http://tinyurl.com/ykupczu)
io.sort.factor: 15 (mapred-site.xml)
io.sort.mb: 150 (mapred-site.xml)
io.file.buffer.size: 65536 (core-site.xml)
dfs.datanode.handler.count: 3 (hdfs-site.xml; actually this is the default)
and using the default of HADOOP_HEAPSIZE=1000 (hadoop-env.sh)
Using 2 mappers and 2 reducers, can someone please help me with the
maths as to why my jobs are failing with "Error: Java heap space" in
the maps?
(the same runs fine with io.sort.factor of 10 and io.sort.mb of 100)
io.sort.mb of 200 x 4 tasks (2 mappers, 2 reducers) = 0.8G
Plus the 2 daemons (DN and TT) on the node at 1G each = 2.8G
Plus an assumed Xmx of 1G for each of the 4 task JVMs = 6.8G
The machines have 8G in them, so it should all fit. Obviously my maths is screwy somewhere...
Hi Tim,
Did you also change mapred.child.java.opts? The HADOOP_HEAPSIZE parameter
is for the daemons, not the tasks. If you bump up io.sort.mb you also have
to bump up the -Xmx argument in mapred.child.java.opts to give the actual
tasks more RAM.
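As a sketch (my numbers, adjust to taste): with io.sort.mb at 150 each
task has to fit the 150MB sort buffer inside its own heap, so something
like this in mapred-site.xml gives it room:

  <property>
    <name>io.sort.mb</name>
    <value>150</value>
  </property>
  <property>
    <name>mapred.child.java.opts</name>
    <!-- the task heap must comfortably exceed io.sort.mb -->
    <value>-Xmx512M</value>
  </property>

The default mapred.child.java.opts is -Xmx200m, which is why io.sort.mb
of 100 squeaks by but 150 blows the heap.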
-Todd
On Fri, Oct 16, 2009 at 9:59 AM, Erik Forsberg <[email protected]> wrote:
On Thu, 15 Oct 2009 11:32:35 +0200, Usman Waheed <[email protected]> wrote:
Hi Todd,
Some changes have been applied to the cluster based on the
documentation (URL) you noted below.
I would also like to know what settings people are tuning on the
operating system level. The blog post mentioned here does not say
much about that, except for the fileno changes.
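As far as I understand, the fileno change is just raising the open-file
limit for the users the daemons run as, e.g. in
/etc/security/limits.conf ("hadoop" is only an example user name):

  hadoop  soft  nofile  32768
  hadoop  hard  nofile  32768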
We got about 3x the read performance when running DFSIOTest by mounting
our ext3 filesystems with the noatime parameter. I saw that mentioned
in the slides from some Cloudera presentation.
(For those who don't know, the noatime parameter turns off the
recording of access times on files. That's a horrible performance
killer, since it means every read of a file also forces the kernel to
do a write. These writes are probably queued up, but still, if you
don't need the atime (very few applications do), turn it off!)
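If you want to try it, the fstab entry looks something like this
(/dev/sdb1 and /data1 are placeholders for your own device and Hadoop
data partition):

  /dev/sdb1  /data1  ext3  defaults,noatime  0  2

or, on a running system: mount -o remount,noatime /data1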
Have people been experimenting with different filesystems, or are most
of us running on top of ext3?
How about mounting ext3 with "data=writeback"? That's rumoured to give
the best throughput and could help with write performance. From
mount(8):
writeback
    Data ordering is not preserved - data may be written into the main
    file system after its metadata has been committed to the journal.
    This is rumoured to be the highest throughput option. It guarantees
    internal file system integrity, however it can allow old data to
    appear in files after a crash and journal recovery.
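If anyone tries it, extending the fstab sketch from above, it would be
something like:

  /dev/sdb1  /data1  ext3  defaults,noatime,data=writeback  0  2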
How would the HDFS consistency checks cope with old data appearing in
the underlying files after a system crash?
Cheers,
\EF
--
Erik Forsberg <[email protected]>
Developer, Opera Software - http://www.opera.com/