Alberto, I can assure you that fiddling with the default replication factor can't be the solution here. Most of us running clusters of 3+ nodes still use a replication factor of 3, and it hardly introduces any performance lag. As long as your Hadoop cluster network is not shared with other network applications, you shouldn't be seeing any network slowdowns.

Anyhow, dfs.replication.max is not the property you were looking to change. You want dfs.replication instead (it sets the replication value for all new files). AFAIK, there is no replication factor hardcoded anywhere in the code; it's all configurable, so it's just a matter of setting the right configuration :)

Regarding the "10" thing: The MR components try to load their jars and other submitted code/files with a replication factor of 10 by default, so that they propagate to all racks/etc., which leads to a fast startup of tasks. I do not think that's a problem in your case either (if it gets 4, it will use 4, if it gets 7, it will use 7 -- but it won't take too long).

On Thu, Jun 23, 2011 at 6:14 AM, Alberto Andreotti <[email protected]> wrote:
> Hi guys,
>
> I suspected that the problem was due to overhead introduced by the
> filesystem, so I tried to set the "dfs.replication.max" property to
> different values.
> First, I tried with 2, and I got a message saying that I was requesting a
> value of 3, which was bigger than the limit. So I couldn't do the run (it
> seems this 3 is hardcoded somewhere, I read that in Jira).
> Then I tried with 3, I could generate the input files for the map reduce
> app, but when trying to run I got this one,
>
> Exception in thread "main" java.io.IOException: file
> /tmp/hadoop-aandre/mapred/staging/aandre/.staging/job_201106230004_0003/job.jar.
> Requested replication 10 exceeds maximum 3
>         at
> org.apache.hadoop.hdfs.server.namenode.BlockManager.verifyReplication(BlockManager.java:468)
>
>
> which seems like the framework were trying to replicate the output in as
> many nodes as possible. Could this be the degradation source?
> Also I attached the log for the run with 7 nodes.
>
> Alberto.
>
>
> On 21 June 2011 14:40, Harsh J <[email protected]> wrote:
>>
>> Matt,
>>
>> You're right that it (slowstart) does not / would not affect much.
>> I was merely explaining the reason behind his observance of reducers
>> getting scheduled early, not really recommending a tweak for
>> performance changes there.
>>
>> On Tue, Jun 21, 2011 at 10:46 PM, GOEKE, MATTHEW (AG/1000)
>> <[email protected]> wrote:
>> > Harsh,
>> >
>> > Is it possible for mapred.reduce.slowstart.completed.maps to even play a
>> > significant role in this? The only benefit he would find in tweaking that
>> > for his problem would be to spread network traffic from the shuffle over a
>> > longer period of time at a cost of having the reducer using resources
>> > earlier. Either way he would see this effect across both sets of runs if he
>> > is using the default parameters. I guess it would all depend on what kind
>> > of network layout the cluster is on.
>> >
>> > Matt
>> >
>> > -----Original Message-----
>> > From: Harsh J [mailto:[email protected]]
>> > Sent: Tuesday, June 21, 2011 12:09 PM
>> > To: [email protected]
>> > Subject: Re: Poor scalability with map reduce application
>> >
>> > Alberto,
>> >
>> > On Tue, Jun 21, 2011 at 10:27 PM, Alberto Andreotti
>> > <[email protected]> wrote:
>> >> I don't know if speculatives maps are on, I'll check it. One thing I
>> >> observed is that reduces begin before all maps have finished. Let me
>> >> check also if the difference is on the map side or in the reduce. I
>> >> believe it's balanced, both are slower when adding more nodes, but
>> >> i'll confirm that.
>> >
>> > Maps and reduces are speculative by default, so must've been ON. Could
>> > you also post a general input vs. output record counts and statistics
>> > like that between your job runs, to correlate?
>> >
>> > The reducers get scheduled early but do not exactly "reduce()" until
>> > all maps are done. They just keep fetching outputs. Their scheduling
>> > can be controlled with some configurations (say, to start only after
>> > X% of maps are done -- by default it starts up when 5% of maps are
>> > done).
>> >
>> > --
>> > Harsh J
>>
>> --
>> Harsh J
>
> --
> José Pablo Alberto Andreotti.
> Tel: 54 351 4730292
> Móvil: 54351156526363.
> MSN: [email protected]
> Skype: andreottialberto

--
Harsh J
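
To make the dfs.replication vs. dfs.replication.max distinction concrete, here is a rough Python sketch of the kind of bounds check that produced the exception quoted in this thread. It is a toy model, not the namenode code: the names verify_replication, effective_replicas and ReplicationError are hypothetical, and the default bounds used (minimum 1, maximum 512) are assumptions you should check against your own hdfs-default.xml.

```python
class ReplicationError(IOError):
    """Hypothetical stand-in for the java.io.IOException in the quoted trace."""


def verify_replication(path, requested, min_repl=1, max_repl=512):
    """Toy model of a replication bounds check.

    requested -- replication asked for by the client (dfs.replication for
                 ordinary files, 10 for job submission files per this thread)
    max_repl  -- models dfs.replication.max (default here is an assumption)
    min_repl  -- models a minimum replication setting (also an assumption)
    """
    if requested > max_repl:
        raise ReplicationError(
            f"file {path}. Requested replication {requested} "
            f"exceeds maximum {max_repl}"
        )
    if requested < min_repl:
        raise ReplicationError(
            f"file {path}. Requested replication {requested} "
            f"is less than the required minimum {min_repl}"
        )
    return requested


def effective_replicas(requested, live_datanodes):
    """Models 'if it gets 4, it will use 4': a request that passes the check
    can still only be placed on as many datanodes as are available."""
    return min(requested, live_datanodes)


if __name__ == "__main__":
    # dfs.replication.max = 2 with the usual dfs.replication = 3: rejected.
    for requested, limit in [(3, 2), (10, 3)]:
        try:
            verify_replication("job.jar", requested, max_repl=limit)
        except ReplicationError as err:
            print(err)
    # With the limit left alone, a request of 10 on 7 nodes just yields 7.
    print(effective_replicas(verify_replication("job.jar", 10), 7))
```

The point of the sketch: lowering the maximum turns every larger request into a hard failure, while lowering dfs.replication only changes what new files ask for.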
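
On the slowstart point from earlier in the thread: mapred.reduce.slowstart.completed.maps is a fraction of completed maps, so the number of maps that must finish before reducers get scheduled depends on the job size. A minimal sketch of that arithmetic follows; the function name is hypothetical, and rounding up with ceil is my assumption about how the fraction is turned into a count, so treat the exact boundary as approximate.

```python
import math


def maps_needed_before_reduces(total_maps, slowstart=0.05):
    """Number of completed maps after which reducers may be scheduled.

    slowstart models mapred.reduce.slowstart.completed.maps (0.05, i.e. 5%,
    is the default mentioned in this thread). Rounding up is an assumption.
    """
    if not 0.0 <= slowstart <= 1.0:
        raise ValueError("slowstart must be a fraction between 0 and 1")
    if total_maps < 0:
        raise ValueError("total_maps must be non-negative")
    return math.ceil(slowstart * total_maps)


if __name__ == "__main__":
    # Default setting on a 100-map job, then a setting of 1.0, which holds
    # reducers back until every map has finished.
    print(maps_needed_before_reduces(100))
    print(maps_needed_before_reduces(100, slowstart=1.0))
```

Either way the reducers only fetch map outputs while they wait; reduce() itself does not run until all maps are done.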
