Yeah, they are single proc machines and, other than setting 4 maps and 4 reduces, it is a completely vanilla 0.20.1 installation.
I will tune it up in the morning based on what I can find on the web
(e.g. Cloudera guidelines) and post the results.

I am going to be running HBase on top of this, but want to make sure
HDFS/MR is running soundly before continuing.

Seems there are a few people at the moment setting up clusters - might
it be worth adding our config and results to
http://wiki.apache.org/hadoop/HardwareBenchmarks ?

For people like me (first cluster set up from scratch - previously used
the EC2 scripts) it is nice to sanity check that things look about right.
The mailing lists suggest there are a few small clusters of medium spec
machines springing up.

Cheers,
Tim

On Thu, Oct 15, 2009 at 5:52 PM, Patrick Angeles <[email protected]> wrote:
> Hi Tim,
> I assume those are single proc machines?
>
> I got 649 secs on 70GB of data for our 7-node cluster (~11 mins), but we
> have dual quad Nehalems (2.26GHz).
>
> On Thu, Oct 15, 2009 at 11:34 AM, tim robertson
> <[email protected]> wrote:
>
>> Hi Usman,
>>
>> So on my 10 node cluster (9 DN) with 4 maps and 4 reduces (I plan on
>> high memory jobs so picked 4 only)
>> [9 DN of Dell R300: 2.83G Quadcore (2x6MB cache), 8G RAM and 2x500G SATA
>> drives]
>>
>> Using your template for stats, I get the following with no tuning:
>>
>> GENERATE RANDOM DATA
>> Wrote out 90GB of random binary data:
>> Map output records=9198009
>> The job took 350 seconds. (approximately: 6 minutes)
>>
>> SORT RANDOM GENERATED DATA
>> Map output records=9197821
>> Reduce input records=9197821
>> The job took 2176 seconds. (approximately: 36 mins).
>>
>> So pretty similar to your initial benchmark. I will tune it a bit
>> tomorrow and rerun.
>>
>> If you spent time tuning your cluster and it was successful, please
>> can you share your config?
>>
>> Cheers,
>> Tim
>>
>> On Thu, Oct 15, 2009 at 11:32 AM, Usman Waheed <[email protected]> wrote:
>> > Hi Todd,
>> >
>> > Some changes have been applied to the cluster based on the documentation
>> > (URL) you noted below,
>> > like file descriptor settings and io.file.buffer.size. I will check out the
>> > other settings which I haven't applied yet.
>> >
>> > My map/reduce slot settings from my hadoop-site.xml and hadoop-default.xml
>> > on all nodes in the cluster.
>> >
>> > hadoop-site.xml
>> > mapred.tasktracker.task.maximum = 2
>> > mapred.tasktracker.map.tasks.maximum = 8
>> > mapred.tasktracker.reduce.tasks.maximum = 8
>> >
>> > hadoop-default.xml
>> > mapred.map.tasks = 2
>> > mapred.reduce.tasks = 1
>> >
>> > Thanks,
>> > Usman
>> >
>> >> This seems a bit slow for that setup (4-5 MB/sec/node sorting). Have
>> >> you changed the configurations at all? There are some notes on this
>> >> blog post that might help your performance a bit:
>> >>
>> >> http://www.cloudera.com/blog/2009/03/30/configuration-parameters-what-can-you-just-ignore/
>> >>
>> >> How many map and reduce slots did you configure for the daemons? If
>> >> you have Ganglia installed you can usually get a good idea of whether
>> >> you're using your resources well by looking at the graphs while
>> >> running a job like this sort.
>> >>
>> >> -Todd
>> >>
>> >> On Wed, Oct 14, 2009 at 4:04 AM, Usman Waheed <[email protected]> wrote:
>> >>
>> >>> Here are the results I got from my 4 node cluster (correction: I noted 5
>> >>> earlier). One of my 4 nodes is both a namenode and a datanode.
>> >>>
>> >>> GENERATE RANDOM DATA
>> >>> Wrote out 40GB of random binary data:
>> >>> Map output records=4088301
>> >>> The job took 358 seconds. (approximately: 6 minutes).
>> >>>
>> >>> SORT RANDOM GENERATED DATA
>> >>> Map output records=4088301
>> >>> Reduce input records=4088301
>> >>> The job took 2136 seconds. (approximately: 35 minutes).
>> >>>
>> >>> VALIDATION OF SORTED DATA
>> >>> The job took 183 seconds.
>> >>> SUCCESS! Validated the MapReduce framework's 'sort' successfully.
>> >>>
>> >>> It would be interesting to see what performance numbers others with a
>> >>> similar setup have obtained.
>> >>>
>> >>> Thanks,
>> >>> Usman
>> >>>
>> >>>> I am setting up a new cluster of 10 nodes of 2.83G Quadcore (2x6MB
>> >>>> cache), 8G RAM and 2x500G drives, and will do the same soon. Got some
>> >>>> issues though so it won't start up...
>> >>>>
>> >>>> Tim
>> >>>>
>> >>>> On Wed, Oct 14, 2009 at 11:36 AM, Usman Waheed <[email protected]> wrote:
>> >>>>
>> >>>>> Thanks Tim, I will check it out and post my results for comments.
>> >>>>> -Usman
>> >>>>>
>> >>>>>> Might it be worth running the http://wiki.apache.org/hadoop/Sort and
>> >>>>>> posting your results for comment?
>> >>>>>>
>> >>>>>> Tim
>> >>>>>>
>> >>>>>> On Wed, Oct 14, 2009 at 10:48 AM, Usman Waheed <[email protected]>
>> >>>>>> wrote:
>> >>>>>>
>> >>>>>>> Hi,
>> >>>>>>>
>> >>>>>>> Is there a way to tell what kind of performance numbers one can expect
>> >>>>>>> out of their cluster given a certain set of specs?
>> >>>>>>>
>> >>>>>>> For example I have 5 nodes in my cluster that all have the following
>> >>>>>>> hardware configuration(s):
>> >>>>>>> Quad Core 2.0GHz, 8GB RAM, 4x1TB disks and are all on the same rack.
>> >>>>>>>
>> >>>>>>> Thanks,
>> >>>>>>> Usman
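
For anyone comparing the numbers in this thread: the "4-5 MB/sec/node" figure Todd mentions can be reproduced from the posted sort times with a little arithmetic. The sketch below is only a rough illustration. The function name is made up, 1 GB is assumed to mean 1024 MB, every listed node is assumed to act as a datanode, and Patrick's 649 seconds is assumed to be the sort step, none of which the thread confirms.

```python
def per_node_mb_per_sec(data_gb, seconds, nodes):
    """Average sort throughput per node in MB/s (assumes 1 GB = 1024 MB)."""
    return data_gb * 1024 / seconds / nodes


# (data in GB, sort time in seconds, node count) as posted in the thread.
# Node counts are assumptions: 4 for Usman, 9 DN for Tim, 7 for Patrick.
runs = {
    "Usman (4 nodes)": (40, 2136, 4),
    "Tim (9 DN)": (90, 2176, 9),
    "Patrick (7 nodes)": (70, 649, 7),
}

for name, (gb, secs, nodes) in runs.items():
    rate = per_node_mb_per_sec(gb, secs, nodes)
    print(f"{name}: {rate:.1f} MB/s/node")
```

Under those assumptions the two untuned clusters come out at roughly 4.8 and 4.7 MB/s/node, and the dual quad Nehalem cluster at roughly 15.8 MB/s/node.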
