Just while I am on this thread of performance, I thought I would continue...
I am moving a 200-million-record table (mostly varchar) from a MyISAM MySQL
database to Hadoop, and the first MapReduce tests look pretty positive. A full
table scan in MySQL takes 17 minutes on a $20,000 fairly heavyweight server
with 64G of memory (sorry, I don't know the exact CPU/disk specs offhand),
while a count that can use an indexed INTEGER column (index pegged in memory)
takes 3 minutes. After exporting the table and importing it into HDFS as a
tab-delimited file (100GB) on the cluster described below in this thread, the
same scans take 4 minutes with a custom MR job. Actually I have two
200-million-record tables, and a full scan requiring a join of both tables in
MySQL takes around 30 minutes, but I joined them in the export before the
import into Hadoop. So some of my ad hoc reports have just come down from 30
minutes to 4 minutes (and I will use Hive for this).

Tim
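
P.S. In case it is useful to anyone, the export/join/import boils down to
something like the following. This is only a sketch -- the table, column and
path names are made up for illustration, not my real schema:

  # Join the two tables on the MySQL side and dump tab-delimited text.
  # --batch prints tab-separated rows; --skip-column-names drops the header.
  mysql --batch --skip-column-names -u someuser -p mydb \
    -e "SELECT a.*, b.* FROM table_a a JOIN table_b b ON a.id = b.a_id" \
    > /data/export/joined.tsv

  # Copy the flat file into HDFS for the MR jobs (and later Hive) to scan:
  hadoop fs -put /data/export/joined.tsv /user/tim/joined.tsv
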
On Mon, Oct 19, 2009 at 4:14 PM, Usman Waheed <[email protected]> wrote:
> Tim,
>
> I have 4 nodes (Quad Core 2.00GHz, 8GB RAM, 4x1TB disks), where one is the
> master+datanode and the rest are datanodes.
>
> Job: sort 40GB of random data
>
> With the following current configuration settings:
>
> io.sort.factor: 10
> io.sort.mb: 100
> io.file.buffer.size: 65536
> mapred.child.java.opts: -Xmx200M
> dfs.datanode.handler.count: 3
> 2 mappers
> 2 reducers
>
> Time taken: 28 minutes
>
> Still testing with more config changes; I will send the results out.
>
> -Usman
>
>> Hi all,
>>
>> I thought I would post the findings of my tuning tests running the
>> sort benchmark.
>>
>> This is all based on 10 machines (1 as master and 9 as DN/TT), each a
>> Dell R300: 2.83GHz quad-core (2x6MB cache, 1 processor), 8G RAM and
>> 2x500G SATA drives.
>>
>> --- Vanilla installation ---
>> 2M 2R: 36 mins
>> 4M 4R: 36 mins (yes, the same)
>>
>> --- Tuned according to Cloudera (http://tinyurl.com/ykupczu) ---
>> io.sort.factor: 20 (mapred-site.xml)
>> io.sort.mb: 200 (mapred-site.xml)
>> io.file.buffer.size: 65536 (core-site.xml)
>> mapred.child.java.opts: -Xmx512M (mapred-site.xml)
>>
>> 2M 2R: 33.5 mins
>> 4M 4R: 29 mins
>> 8M 8R: 41 mins
>>
>> --- Increasing the task memory a little ---
>> io.sort.factor: 20
>> io.sort.mb: 200
>> io.file.buffer.size: 65536
>> mapred.child.java.opts: -Xmx1G
>>
>> 2M 2R: 29 mins (adding dfs.datanode.handler.count=8 resulted in 30 mins)
>> 4M 4R: 29 mins (yes, the same)
>>
>> --- Increasing sort memory ---
>> io.sort.factor: 32
>> io.sort.mb: 320
>> io.file.buffer.size: 65536
>> mapred.child.java.opts: -Xmx1G
>>
>> 2M 2R: 31 mins (yes, longer than with the smaller sort settings)
>>
>> I am going to stick with the following for now and get back to work:
>> io.sort.factor: 20
>> io.sort.mb: 200
>> io.file.buffer.size: 65536
>> mapred.child.java.opts: -Xmx1G
>> dfs.datanode.handler.count: 8
>> 4 mappers
>> 4 reducers
>>
>> Hope that helps someone. How did your tuning go, Usman?
>>
>> Tim
>>
>> On Fri, Oct 16, 2009 at 10:41 PM, tim robertson
>> <[email protected]> wrote:
>>>
>>> No worries Usman, I will try and do the same on Monday.
>>>
>>> Thanks, Todd, for the clarification.
>>>
>>> Tim
>>>
>>> On Fri, Oct 16, 2009 at 5:30 PM, Usman Waheed <[email protected]> wrote:
>>>>
>>>> Hi Tim,
>>>>
>>>> I have been swamped with some other stuff, so I did not get a chance
>>>> to run further tests on my setup. I will send them out early next
>>>> week so we can compare.
>>>>
>>>> Cheers,
>>>> Usman
>>>>
>>>>> On Fri, Oct 16, 2009 at 4:01 AM, tim robertson
>>>>> <[email protected]> wrote:
>>>>>
>>>>>> Hi all,
>>>>>>
>>>>>> Adding the following to core-site.xml, mapred-site.xml and
>>>>>> hdfs-site.xml (based on the Cloudera guidelines:
>>>>>> http://tinyurl.com/ykupczu):
>>>>>>
>>>>>> io.sort.factor: 15 (mapred-site.xml)
>>>>>> io.sort.mb: 150 (mapred-site.xml)
>>>>>> io.file.buffer.size: 65536 (core-site.xml)
>>>>>> dfs.datanode.handler.count: 3 (hdfs-site.xml -- actually the default)
>>>>>>
>>>>>> and using the default of HADOOP_HEAPSIZE=1000 (hadoop-env.sh).
>>>>>>
>>>>>> Using 2 mappers and 2 reducers, can someone please help me with the
>>>>>> maths as to why my jobs are failing with "Error: Java heap space" in
>>>>>> the maps? (The same runs fine with io.sort.factor of 10 and
>>>>>> io.sort.mb of 100.)
>>>>>>
>>>>>> io.sort.mb of 200 x 4 (2 mappers, 2 reducers) = 0.8G
>>>>>> Plus the 2 daemons on the node at 1G each = 1.8G
>>>>>> Plus Xmx of 1G for each Hadoop daemon task = 5.8G
>>>>>>
>>>>>> The machines have 8G in them. Obviously my maths is screwy
>>>>>> somewhere...
>>>>>
>>>>> Hi Tim,
>>>>>
>>>>> Did you also change mapred.child.java.opts? The HADOOP_HEAPSIZE
>>>>> parameter is for the daemons, not the tasks. If you bump up io.sort.mb
>>>>> you also have to bump up the -Xmx argument in mapred.child.java.opts
>>>>> to give the actual tasks more RAM.
>>>>>
>>>>> -Todd
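>>>>>
>>>>> P.S. To make that concrete: something like the following per job,
>>>>> assuming the job runs through ToolRunner so the generic -D options are
>>>>> picked up (the jar, class and path names here are placeholders):
>>>>>
>>>>>   hadoop jar your-job.jar YourJob \
>>>>>     -D io.sort.mb=200 \
>>>>>     -D mapred.child.java.opts=-Xmx512m \
>>>>>     input output
>>>>>
>>>>> or set both permanently as properties in mapred-site.xml.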
>>>>>> On Fri, Oct 16, 2009 at 9:59 AM, Erik Forsberg <[email protected]>
>>>>>> wrote:
>>>>>>
>>>>>>> On Thu, 15 Oct 2009 11:32:35 +0200
>>>>>>> Usman Waheed <[email protected]> wrote:
>>>>>>>
>>>>>>>> Hi Todd,
>>>>>>>>
>>>>>>>> Some changes have been applied to the cluster based on the
>>>>>>>> documentation (URL) you noted below.
>>>>>>>
>>>>>>> I would also like to know what settings people are tuning at the
>>>>>>> operating system level. The blog post mentioned here does not say
>>>>>>> much about that, except for the fileno changes.
>>>>>>>
>>>>>>> We got about 3x the read performance when running TestDFSIO by
>>>>>>> mounting our ext3 filesystems with the noatime option. I saw that
>>>>>>> mentioned in the slides from some Cloudera presentation.
>>>>>>>
>>>>>>> (For those who don't know, the noatime option turns off the
>>>>>>> recording of access times on files. That's a horrible performance
>>>>>>> killer, since it means every read of a file also requires the kernel
>>>>>>> to do a write. These writes are probably queued up, but still, if
>>>>>>> you don't need the atime (very few applications do), turn it off!)
>>>>>>>
>>>>>>> Have people been experimenting with different filesystems, or are
>>>>>>> most of us running on top of ext3?
>>>>>>>
>>>>>>> How about mounting ext3 with "data=writeback"? That's rumoured to
>>>>>>> give the best throughput and could help with write performance.
>>>>>>> From mount(8):
>>>>>>>
>>>>>>>   writeback
>>>>>>>     Data ordering is not preserved - data may be written into the
>>>>>>>     main file system after its metadata has been committed to the
>>>>>>>     journal. This is rumoured to be the highest throughput option.
>>>>>>>     It guarantees internal file system integrity, however it can
>>>>>>>     allow old data to appear in files after a crash and journal
>>>>>>>     recovery.
>>>>>>>
>>>>>>> How would the HDFS consistency checks cope with old data appearing
>>>>>>> in the underlying files after a system crash?
>>>>>>>
>>>>>>> Cheers,
>>>>>>> \EF
>>>>>>> --
>>>>>>> Erik Forsberg <[email protected]>
>>>>>>> Developer, Opera Software - http://www.opera.com/
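>>>>>>>
>>>>>>> P.S. For anyone who wants to try this, the mount options would go
>>>>>>> in /etc/fstab roughly like the line below -- the device and mount
>>>>>>> point are only examples, adjust them for your own data disks:
>>>>>>>
>>>>>>>   # ext3 data disk: no atime updates, writeback journaling for
>>>>>>>   # throughput (at the cost of possibly stale data after a crash)
>>>>>>>   /dev/sdb1  /data1  ext3  defaults,noatime,data=writeback  0  2
>>>>>>>
>>>>>>>   # noatime alone can also be applied to a mounted filesystem:
>>>>>>>   mount -o remount,noatime /data1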
