Thank you guys, I really appreciate your answers. I don't have access to the cluster right now; I'll check the info you are asking for and come back in a couple of hours. BTW, I tried the app on two clusters with similar results. I'm using Hadoop 0.21.0.
thanks again,
Alberto.

On 21 June 2011 14:16, GOEKE, MATTHEW (AG/1000) <[email protected]> wrote:

> Harsh,
>
> Is it possible for mapred.reduce.slowstart.completed.maps to even play a
> significant role in this? The only benefit he would find in tweaking that
> for his problem would be to spread network traffic from the shuffle over a
> longer period of time at a cost of having the reducer using resources
> earlier. Either way he would see this effect across both sets of runs if he
> is using the default parameters. I guess it would all depend on what kind of
> network layout the cluster is on.
>
> Matt
>
> -----Original Message-----
> From: Harsh J [mailto:[email protected]]
> Sent: Tuesday, June 21, 2011 12:09 PM
> To: [email protected]
> Subject: Re: Poor scalability with map reduce application
>
> Alberto,
>
> On Tue, Jun 21, 2011 at 10:27 PM, Alberto Andreotti
> <[email protected]> wrote:
> > I don't know if speculatives maps are on, I'll check it. One thing I
> > observed is that reduces begin before all maps have finished. Let me check
> > also if the difference is on the map side or in the reduce. I believe it's
> > balanced, both are slower when adding more nodes, but i'll confirm that.
>
> Maps and reduces are speculative by default, so must've been ON. Could
> you also post a general input vs. output record counts and statistics
> like that between your job runs, to correlate?
>
> The reducers get scheduled early but do not exactly "reduce()" until
> all maps are done. They just keep fetching outputs. Their scheduling
> can be controlled with some configurations (say, to start only after
> X% of maps are done -- by default it starts up when 5% of maps are
> done).
>
> --
> Harsh J

--
José Pablo Alberto Andreotti.
Tel: 54 351 4730292
Móvil: 54351156526363.
MSN: [email protected]
Skype: andreottialberto
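
As a rough illustration of the slowstart setting discussed in the quoted messages, here is a minimal Python sketch (not Hadoop code). It assumes reduces become eligible for scheduling once the number of finished maps reaches the ceiling of the configured fraction times the total number of maps; that rounding rule and the function names are assumptions made for illustration. The 0.05 default is the 5% figure from Harsh's message.

```python
import math


def maps_needed_before_reduces(total_maps, slowstart=0.05):
    """Number of maps that must finish before reduces may be scheduled.

    `slowstart` stands in for mapred.reduce.slowstart.completed.maps.
    Assumption: the threshold is ceil(slowstart * total_maps).
    """
    if total_maps < 0:
        raise ValueError("total_maps must be non-negative")
    if not 0.0 <= slowstart <= 1.0:
        raise ValueError("slowstart must be a fraction between 0 and 1")
    return math.ceil(slowstart * total_maps)


def reduces_can_start(finished_maps, total_maps, slowstart=0.05):
    """True once enough maps have finished for reduces to be scheduled."""
    return finished_maps >= maps_needed_before_reduces(total_maps, slowstart)


if __name__ == "__main__":
    # With the default fraction, a 30-map job lets reduces start fetching
    # after only a couple of maps; with 1.0 they wait for every map.
    for fraction in (0.05, 0.5, 1.0):
        print(fraction, maps_needed_before_reduces(30, fraction))
```

Under these assumptions, raising the fraction toward 1.0 only delays when reducers begin fetching map outputs; as noted in the thread, reduce() itself does not run until all maps are done either way.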
