Hi all, I thought I would post the findings of my tuning tests with the sort benchmark.
This is all based on 10 machines (1 master and 9 DN/TT), each a Dell R300:
2.83GHz quad-core (2x6MB cache, 1 processor), 8G RAM and 2x500G SATA drives.
"2M 2R" below means 2 mappers and 2 reducers per node.

--- Vanilla installation ---

2M 2R: 36 mins
4M 4R: 36 mins (yes, the same)

--- Tuned according to Cloudera (http://tinyurl.com/ykupczu) ---

io.sort.factor: 20 (mapred-site.xml)
io.sort.mb: 200 (mapred-site.xml)
io.file.buffer.size: 65536 (core-site.xml)
mapred.child.java.opts: -Xmx512M (mapred-site.xml)

2M 2R: 33.5 mins
4M 4R: 29 mins
8M 8R: 41 mins

--- Increasing the task memory a little ---

io.sort.factor: 20
io.sort.mb: 200
io.file.buffer.size: 65536
mapred.child.java.opts: -Xmx1G

2M 2R: 29 mins (adding dfs.datanode.handler.count=8 resulted in 30 mins)
4M 4R: 29 mins (yes, the same)

--- Increasing sort memory ---

io.sort.factor: 32
io.sort.mb: 320
io.file.buffer.size: 65536
mapred.child.java.opts: -Xmx1G

2M 2R: 31 mins (yes, longer than with the smaller sort settings)

I am going to stick with the following for now and get back to work:

io.sort.factor: 20
io.sort.mb: 200
io.file.buffer.size: 65536
mapred.child.java.opts: -Xmx1G
dfs.datanode.handler.count: 8
4 mappers, 4 reducers per node
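In case it saves anyone some typing, here is a sketch of how I believe those
final settings look as config entries (0.20-style property names, each block
going inside the <configuration> element of the file named; note Todd's point
below that io.sort.mb is allocated inside the task JVM, so it has to fit
within the child -Xmx). The two tasktracker slot properties at the end are my
assumption about how the 4M/4R counts per node get set:

  mapred-site.xml:

    <property>
      <name>io.sort.factor</name>
      <value>20</value>
    </property>
    <property>
      <name>io.sort.mb</name>
      <value>200</value>
    </property>
    <property>
      <name>mapred.child.java.opts</name>
      <value>-Xmx1G</value>
    </property>
    <!-- assumed: how the 4 mapper / 4 reducer slots per node are set -->
    <property>
      <name>mapred.tasktracker.map.tasks.maximum</name>
      <value>4</value>
    </property>
    <property>
      <name>mapred.tasktracker.reduce.tasks.maximum</name>
      <value>4</value>
    </property>

  core-site.xml:

    <property>
      <name>io.file.buffer.size</name>
      <value>65536</value>
    </property>

  hdfs-site.xml:

    <property>
      <name>dfs.datanode.handler.count</name>
      <value>8</value>
    </property>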
Hope that helps someone. How did your tuning go, Usman?

Tim

On Fri, Oct 16, 2009 at 10:41 PM, tim robertson <[email protected]> wrote:
> No worries Usman, I will try and do the same on Monday.
>
> Thanks Todd for the clarification.
>
> Tim
>
> On Fri, Oct 16, 2009 at 5:30 PM, Usman Waheed <[email protected]> wrote:
>> Hi Tim,
>>
>> I have been swamped with some other stuff so did not get a chance to run
>> further tests on my setup. Will send them out early next week so we can
>> compare.
>>
>> Cheers,
>> Usman
>>
>>> On Fri, Oct 16, 2009 at 4:01 AM, tim robertson
>>> <[email protected]> wrote:
>>>
>>>> Hi all,
>>>>
>>>> Adding the following to core-site.xml, mapred-site.xml and
>>>> hdfs-site.xml (based on Cloudera guidelines:
>>>> http://tinyurl.com/ykupczu):
>>>>
>>>> io.sort.factor: 15 (mapred-site.xml)
>>>> io.sort.mb: 150 (mapred-site.xml)
>>>> io.file.buffer.size: 65536 (core-site.xml)
>>>> dfs.datanode.handler.count: 3 (hdfs-site.xml; actually this is the
>>>> default)
>>>>
>>>> and using the default of HADOOP_HEAPSIZE=1000 (hadoop-env.sh).
>>>>
>>>> Using 2 mappers and 2 reducers, can someone please help me with the
>>>> maths as to why my jobs are failing with "Error: Java heap space" in
>>>> the maps? (The same runs fine with io.sort.factor of 10 and
>>>> io.sort.mb of 100.)
>>>>
>>>> io.sort.mb of 200 x 4 (2 mappers, 2 reducers) = 0.8G
>>>> Plus the 2 daemons on the node at 1G each = 1.8G
>>>> Plus Xmx of 1G for each hadoop daemon task = 5.8G
>>>>
>>>> The machines have 8G in them. Obviously my maths is screwy somewhere...
>>>
>>> Hi Tim,
>>>
>>> Did you also change mapred.child.java.opts? The HADOOP_HEAPSIZE
>>> parameter is for the daemons, not the tasks. If you bump up io.sort.mb
>>> you also have to bump up the -Xmx argument in mapred.child.java.opts to
>>> give the actual tasks more RAM.
>>>
>>> -Todd
>>>
>>>> On Fri, Oct 16, 2009 at 9:59 AM, Erik Forsberg <[email protected]>
>>>> wrote:
>>>>
>>>>> On Thu, 15 Oct 2009 11:32:35 +0200
>>>>> Usman Waheed <[email protected]> wrote:
>>>>>
>>>>>> Hi Todd,
>>>>>>
>>>>>> Some changes have been applied to the cluster based on the
>>>>>> documentation (URL) you noted below,
>>>>>
>>>>> I would also like to know what settings people are tuning on the
>>>>> operating system level. The blog post mentioned here does not mention
>>>>> much about that, except for the fileno changes.
>>>>>
>>>>> We got about 3x the read performance when running DFSIOTest by
>>>>> mounting our ext3 filesystems with the noatime parameter. I saw that
>>>>> mentioned in the slides from some Cloudera presentation.
>>>>>
>>>>> (For those who don't know, the noatime parameter turns off the
>>>>> recording of access times on files. Recording them is a horrible
>>>>> performance killer, since it means every read of a file also means
>>>>> that the kernel must do a write. These writes are probably queued up,
>>>>> but still, if you don't need the atime (very few applications do),
>>>>> turn it off!)
>>>>>
>>>>> Have people been experimenting with different filesystems, or are
>>>>> most of us running on top of ext3?
>>>>>
>>>>> How about mounting ext3 with "data=writeback"? That's rumoured to
>>>>> give the best throughput and could help with write performance. From
>>>>> mount(8):
>>>>>
>>>>>   writeback
>>>>>     Data ordering is not preserved - data may be written into the
>>>>>     main file system after its metadata has been committed to the
>>>>>     journal. This is rumoured to be the highest throughput option.
>>>>>     It guarantees internal file system integrity, however it can
>>>>>     allow old data to appear in files after a crash and journal
>>>>>     recovery.
>>>>>
>>>>> How would the HDFS consistency checks cope with old data appearing
>>>>> in the underlying files after a system crash?
>>>>>
>>>>> Cheers,
>>>>> \EF
>>>>> --
>>>>> Erik Forsberg <[email protected]>
>>>>> Developer, Opera Software - http://www.opera.com/
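P.S. For anyone wanting to try the noatime change Erik describes above, I
believe it is just a remount plus an fstab edit, along these lines (a sketch
only; /dev/sdb1 and /data are made-up placeholders, so substitute your own
data disks and mount points):

  # apply immediately, without a reboot:
  mount -o remount,noatime /data

  # and to persist it across reboots, the /etc/fstab entry would look
  # something like:
  /dev/sdb1   /data   ext3   defaults,noatime   0   2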
