Hi all,

I thought I would post the findings of my tuning tests running the
sort benchmark.

This is all based on 10 machines (1 as master and 9 as DN/TT), each:
Dell R300: 2.83GHz quad-core (2x6MB cache, 1 proc), 8GB RAM and 2x500GB SATA drives

--- Vanilla installation ---
2M 2R: 36 mins   (xM yR = x map slots and y reduce slots per node)
4M 4R: 36 mins (yes, the same)


--- Tuned according to Cloudera http://tinyurl.com/ykupczu ---
io.sort.factor: 20  (mapred-site.xml)
io.sort.mb: 200  (mapred-site.xml)
io.file.buffer.size: 65536   (core-site.xml)
mapred.child.java.opts: -Xmx512M  (mapred-site.xml)
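
In case it saves someone some typing, this is roughly how those
properties look in the config files (XML sketch; values are the ones
from this run):

```xml
<!-- mapred-site.xml -->
<property>
  <name>io.sort.factor</name>
  <value>20</value>
</property>
<property>
  <name>io.sort.mb</name>
  <value>200</value>
</property>
<property>
  <name>mapred.child.java.opts</name>
  <value>-Xmx512M</value>
</property>

<!-- core-site.xml -->
<property>
  <name>io.file.buffer.size</name>
  <value>65536</value>
</property>
```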

2M 2R: 33.5 mins
4M 4R: 29 mins
8M 8R: 41 mins


--- Increasing the task memory a little ---
io.sort.factor: 20
io.sort.mb: 200
io.file.buffer.size: 65536
mapred.child.java.opts: -Xmx1G

2M 2R: 29 mins  (adding dfs.datanode.handler.count=8 resulted in 30 mins)
4M 4R: 29 mins (yes, the same)


--- Increasing sort memory ---
io.sort.factor: 32
io.sort.mb: 320
io.file.buffer.size: 65536
mapred.child.java.opts: -Xmx1G

2M 2R: 31 mins (yes, longer than with the smaller sort settings)

I am going to stick with the following for now and get back to work...
  io.sort.factor: 20
  io.sort.mb: 200
  io.file.buffer.size: 65536
  mapred.child.java.opts: -Xmx1G
  dfs.datanode.handler.count=8
  4 Mappers
  4 Reducers
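
For completeness: the mapper/reducer counts are per-TaskTracker slot
settings. Assuming 0.20-era property names, the final setup above would
look something like this on each DN/TT:

```xml
<!-- mapred-site.xml on each TT -->
<property>
  <name>mapred.tasktracker.map.tasks.maximum</name>
  <value>4</value>
</property>
<property>
  <name>mapred.tasktracker.reduce.tasks.maximum</name>
  <value>4</value>
</property>

<!-- hdfs-site.xml on each DN -->
<property>
  <name>dfs.datanode.handler.count</name>
  <value>8</value>
</property>
```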

Hope that helps someone.  How did your tuning go, Usman?

Tim


On Fri, Oct 16, 2009 at 10:41 PM, tim robertson
<[email protected]> wrote:
> No worries Usman,  I will try and do the same on Monday.
>
> Thanks Todd for the clarification.
>
> Tim
>
>
> On Fri, Oct 16, 2009 at 5:30 PM, Usman Waheed <[email protected]> wrote:
>> Hi Tim,
>>
>> I have been swamped with some other stuff so did not get a chance to run
>> further tests on my setup.
>> Will send them out early next week so we can compare.
>>
>> Cheers,
>> Usman
>>
>>> On Fri, Oct 16, 2009 at 4:01 AM, tim robertson
>>> <[email protected]>wrote:
>>>
>>>
>>>>
>>>> Hi all,
>>>>
>>>> Adding the following to core-site.xml, mapred-site.xml and
>>>> hdfs-site.xml (based on Cloudera guidelines:
>>>> http://tinyurl.com/ykupczu)
>>>>  io.sort.factor: 15  (mapred-site.xml)
>>>>  io.sort.mb: 150  (mapred-site.xml)
>>>>  io.file.buffer.size: 65536   (core-site.xml)
>>>>  dfs.datanode.handler.count: 3 (hdfs-site.xml  actually this is the
>>>> default)
>>>>
>>>> and using the default of HADOOP_HEAPSIZE=1000 (hadoop-env.sh)
>>>>
>>>> Using 2 mappers and 2 reducers, can someone please help me with the
>>>> maths as to why my jobs are failing with "Error: Java heap space" in
>>>> the maps?
>>>> (the same runs fine with io.sort.factor of 10 and io.sort.mb of 100)
>>>>
>>>> io.sort.mb of 200 x 4 (2 mappers, 2 reducers) = 0.8G
>>>> Plus the 2 daemons on the node at 1G each = 1.8G
>>>> Plus Xmx of 1G for each hadoop daemon task = 5.8G
>>>>
>>>> The machines have 8G in them.  Obviously my maths is screwy somewhere...
>>>>
>>>>
>>>>
>>>
>>> Hi Tim,
>>>
>>> Did you also change mapred.child.java.opts? The HADOOP_HEAPSIZE
>>> parameter is for the daemons, not the tasks. If you bump up io.sort.mb
>>> you also have to bump up the -Xmx argument in mapred.child.java.opts to
>>> give the actual tasks more RAM.
>>>
>>> -Todd
>>>
>>>
>>>
>>>>
>>>> On Fri, Oct 16, 2009 at 9:59 AM, Erik Forsberg <[email protected]>
>>>> wrote:
>>>>
>>>>>
>>>>> On Thu, 15 Oct 2009 11:32:35 +0200
>>>>> Usman Waheed <[email protected]> wrote:
>>>>>
>>>>>
>>>>>>
>>>>>> Hi Todd,
>>>>>>
>>>>>> Some changes have been applied to the cluster based on the
>>>>>> documentation (URL) you noted below,
>>>>>>
>>>>>
>>>>> I would also like to know what settings people are tuning at the
>>>>> operating system level. The blog post mentioned here doesn't say much
>>>>> about that, except for the fileno changes.
>>>>>
>>>>> We got about 3x the read performance when running DFSIOTest by mounting
>>>>> our ext3 filesystems with the noatime parameter. I saw that mentioned
>>>>> in the slides from some Cloudera presentation.
>>>>>
>>>>> (For those who don't know, the noatime parameter turns off the
>>>>> recording of access times on files. That is a horrible performance
>>>>> killer, since every read of a file then also requires the kernel to
>>>>> do a write. These writes are probably queued up, but still: if you
>>>>> don't need the atime (very few applications do), turn it off!)
>>>>>
>>>>> Have people been experimenting with different filesystems, or are most
>>>>> of us running on top of ext3?
>>>>>
>>>>> How about mounting ext3 with "data=writeback"? That's rumoured to give
>>>>> the best throughput and could help with write performance. From
>>>>> mount(8):
>>>>>
>>>>>    writeback
>>>>>           Data ordering is not preserved - data may be written into
>>>>>           the main file system after its metadata has been committed
>>>>>           to the journal.  This is rumoured to be the highest
>>>>>           throughput option.  It guarantees internal file system
>>>>>           integrity, however it can allow old data to appear in files
>>>>>           after a crash and journal recovery.
>>>>>
>>>>> How would the HDFS consistency checks cope with old data appearing in
>>>>> the underlying files after a system crash?
>>>>>
>>>>> Cheers,
>>>>> \EF
>>>>> --
>>>>> Erik Forsberg <[email protected]>
>>>>> Developer, Opera Software - http://www.opera.com/
>>>>>
>>>>>
>>>
>>>
>>
>>
>
