Have you tested the performance impact of adjusting the mapred.reduce.slowstart.completed.maps property? I'm curious what effect you have seen from dropping it from the default to 0.01, because my original assumption would have been to try something much higher, so that threads don't spawn so early for sort and shuffle. Also, what kind of network interfaces does each of these machines have, and how is the "rack" set up?

Matt

-----Original Message-----
From: baran cakici [mailto:[email protected]]
Sent: Monday, May 02, 2011 10:30 AM
To: [email protected]
Subject: Re: Configuration for small Cluster

I got it. I want to run one ReduceTask on each TaskTracker, 4 Reduce tasks overall on the whole cluster.

2011/5/2 baran cakici <[email protected]>

> Actually it was one, I changed that, and got better Reduce performance,
> because my Reduce algorithm is a little bit complex.
>
> thanks anyway
>
> Regards,
>
> Baran
>
> 2011/5/2 Richard Nadeau <[email protected]>
>
>> I would change "mapred.tasktracker.reduce.tasks.maximum" to one. With your
>> setting
>>
>> On May 2, 2011 8:48 AM, "baran cakici" <[email protected]> wrote:
>> > without job;
>> >
>> > CPU Usage = 0%
>> > Memory = 585 MB (2GB Ram)
>> >
>> > Baran
>> > 2011/5/2 baran cakici <[email protected]>
>> >
>> >> CPU Usage = 95-100%
>> >> Memory = 650-850 MB (2GB Ram)
>> >>
>> >> Baran
>> >>
>> >>
>> >> 2011/5/2 James Seigel <[email protected]>
>> >>
>> >>> If you have Windows and cygwin you probably don't have a lot of memory
>> >>> left at 2 gig.
>> >>>
>> >>> Pull up system monitor on the data nodes and check for free memory
>> >>> when you have your jobs running. I bet it is quite low.
>> >>>
>> >>> I am not a Windows guy so I can't take you much farther.
>> >>>
>> >>> James
>> >>>
>> >>> Sent from my mobile. Please excuse the typos.
>> >>>
>> >>> On 2011-05-02, at 8:32 AM, baran cakici <[email protected]> wrote:
>> >>>
>> >>> > Yes, I am running under cygwin on my datanodes too. The OS of the datanodes
>> >>> > is Windows as well.
>> >>> >
>> >>> > What exactly can I do for better performance? I changed
>> >>> > mapred.child.java.opts to the default value. How can I solve this "swapping"
>> >>> > problem?
>> >>> >
>> >>> > PS: I don't have a chance to get slaves (Celeron 2GHz) with a Linux OS.
>> >>> >
>> >>> > thanks, both of you
>> >>> >
>> >>> > Regards,
>> >>> >
>> >>> > Baran
>> >>> > 2011/5/2 Richard Nadeau <[email protected]>
>> >>> >
>> >>> >> Are you running under cygwin on your data nodes as well? That is certain to
>> >>> >> cause performance problems. As James suggested, swapping to disk is going to
>> >>> >> be a killer; running on Windows with Celeron processors only compounds the
>> >>> >> problem. The Celeron processor is also sub-optimal for CPU-intensive tasks.
>> >>> >>
>> >>> >> Rick
>> >>> >>
>> >>> >> On Apr 28, 2011 9:22 AM, "baran cakici" <[email protected]> wrote:
>> >>> >>> Hi Everyone,
>> >>> >>>
>> >>> >>> I have a cluster with one master (JobTracker and NameNode - Intel Core2Duo,
>> >>> >>> 2 GB RAM) and four slaves (DataNode and TaskTracker - Celeron, 2 GB RAM). My
>> >>> >>> input data are between 2 GB and 10 GB, and I read the input data in MapReduce
>> >>> >>> line by line. Now I am trying to speed up my system (benchmark), but I'm not
>> >>> >>> sure if my configuration is correct. Can you please take a look and tell me
>> >>> >>> if it is OK?
>> >>> >>>
>> >>> >>> -mapred-site.xml
>> >>> >>>
>> >>> >>> <property>
>> >>> >>> <name>mapred.job.tracker</name>
>> >>> >>> <value>apple:9001</value>
>> >>> >>> </property>
>> >>> >>>
>> >>> >>> <property>
>> >>> >>> <name>mapred.child.java.opts</name>
>> >>> >>> <value>-Xmx512m -server</value>
>> >>> >>> </property>
>> >>> >>>
>> >>> >>> <property>
>> >>> >>> <name>mapred.job.tracker.handler.count</name>
>> >>> >>> <value>2</value>
>> >>> >>> </property>
>> >>> >>>
>> >>> >>> <property>
>> >>> >>> <name>mapred.local.dir</name>
>> >>> >>> <value>/cygwin/usr/local/hadoop-datastore/hadoop-Baran/mapred/local</value>
>> >>> >>> </property>
>> >>> >>>
>> >>> >>> <property>
>> >>> >>> <name>mapred.map.tasks</name>
>> >>> >>> <value>1</value>
>> >>> >>> </property>
>> >>> >>>
>> >>> >>> <property>
>> >>> >>> <name>mapred.reduce.tasks</name>
>> >>> >>> <value>4</value>
>> >>> >>> </property>
>> >>> >>>
>> >>> >>> <property>
>> >>> >>> <name>mapred.submit.replication</name>
>> >>> >>> <value>2</value>
>> >>> >>> </property>
>> >>> >>>
>> >>> >>> <property>
>> >>> >>> <name>mapred.system.dir</name>
>> >>> >>> <value>/cygwin/usr/local/hadoop-datastore/hadoop-Baran/mapred/system</value>
>> >>> >>> </property>
>> >>> >>>
>> >>> >>> <property>
>> >>> >>> <name>mapred.tasktracker.indexcache.mb</name>
>> >>> >>> <value>10</value>
>> >>> >>> </property>
>> >>> >>>
>> >>> >>> <property>
>> >>> >>> <name>mapred.tasktracker.map.tasks.maximum</name>
>> >>> >>> <value>1</value>
>> >>> >>> </property>
>> >>> >>>
>> >>> >>> <property>
>> >>> >>> <name>mapred.tasktracker.reduce.tasks.maximum</name>
>> >>> >>> <value>4</value>
>> >>> >>> </property>
>> >>> >>>
>> >>> >>> <property>
>> >>> >>> <name>mapred.temp.dir</name>
>> >>> >>> <value>/cygwin/usr/local/hadoop-datastore/hadoop-Baran/mapred/temp</value>
>> >>> >>> </property>
>> >>> >>>
>> >>> >>> <property>
>> >>> >>> <name>webinterface.private.actions</name>
>> >>> >>> <value>true</value>
>> >>> >>> </property>
>> >>> >>>
>> >>> >>> <property>
>> >>> >>> <name>mapred.reduce.slowstart.completed.maps</name>
>> >>> >>> <value>0.01</value>
>> >>> >>> </property>
>> >>> >>>
>> >>> >>> -hdfs-site.xml
>> >>> >>>
>> >>> >>> <property>
>> >>> >>> <name>dfs.block.size</name>
>> >>> >>> <value>268435456</value>
>> >>> >>> </property>
>> >>> >>>
>> >>> >>> PS: I increased dfs.block.size, because I gained 50% better performance
>> >>> >>> with this change.
>> >>> >>>
>> >>> >>> I am waiting for your comments...
>> >>> >>>
>> >>> >>> Regards,
>> >>> >>>
>> >>> >>> Baran
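
For reference, here is a rough sketch of how the suggestions in this thread might look in mapred-site.xml. The single reduce slot follows Richard's suggestion and Baran's stated goal of one reduce task per TaskTracker. The 0.80 slowstart value is only an assumed starting point for "much higher"; nobody in the thread tested it. With the original settings, 1 map slot plus 4 reduce slots at -Xmx512m allows up to 5 x 512 MB = 2560 MB of child heap on a 2 GB node, which could contribute to the swapping described above.

```xml
<!-- mapred-site.xml fragment: illustrative values only, not measured on this cluster -->

<!-- One reduce slot per TaskTracker (Richard's suggestion) -->
<property>
  <name>mapred.tasktracker.reduce.tasks.maximum</name>
  <value>1</value>
</property>

<!-- Four reduce tasks per job, i.e. one per slave -->
<property>
  <name>mapred.reduce.tasks</name>
  <value>4</value>
</property>

<!-- Assumed value: schedule reducers only after 80% of the maps have completed -->
<property>
  <name>mapred.reduce.slowstart.completed.maps</name>
  <value>0.80</value>
</property>
```

With one map slot and one reduce slot per node, the worst case at -Xmx512m would be 2 x 512 MB = 1024 MB of child heap, before counting the DataNode and TaskTracker daemons themselves.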
