Matt,

You're right that it (slowstart) would not affect much. I was merely explaining why he observed reducers getting scheduled early, not recommending it as a performance tweak.
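To make the early-scheduling point concrete, here is a rough sketch of how the mapred.reduce.slowstart.completed.maps fraction can be read as a threshold on finished maps. It is a simplified model, not Hadoop's scheduler code: the class and method names are made up for illustration, and rounding the threshold up is an assumption.

```java
// Simplified model of the slowstart threshold. Not Hadoop code:
// SlowstartSketch and its methods are hypothetical names.
public class SlowstartSketch {

    // The 5% default mentioned in this thread.
    static final double DEFAULT_SLOWSTART = 0.05;

    // Number of maps that must finish before reducers may be scheduled.
    // Assumption: the fraction is applied to the total map count and rounded up.
    static int mapsNeededBeforeReduces(double slowstart, int totalMaps) {
        if (slowstart < 0.0 || slowstart > 1.0 || totalMaps < 0) {
            throw new IllegalArgumentException("slowstart must be in [0, 1] and totalMaps >= 0");
        }
        return (int) Math.ceil(slowstart * totalMaps);
    }

    // True once enough maps have finished for reducers to be scheduled.
    // Scheduled reducers only fetch map outputs; reduce() itself still
    // waits until every map is done.
    static boolean reducesMayBeScheduled(double slowstart, int totalMaps, int finishedMaps) {
        return finishedMaps >= mapsNeededBeforeReduces(slowstart, totalMaps);
    }

    public static void main(String[] args) {
        int totalMaps = 200;
        System.out.println("default threshold: "
                + mapsNeededBeforeReduces(DEFAULT_SLOWSTART, totalMaps) + " maps");
        System.out.println("threshold at 1.0:  "
                + mapsNeededBeforeReduces(1.0, totalMaps) + " maps");
    }
}
```

In this model a fraction of 1.0 holds reducers back until every map has finished, which only moves the start of the shuffle and leaves the point at which reduce() begins unchanged.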
On Tue, Jun 21, 2011 at 10:46 PM, GOEKE, MATTHEW (AG/1000) <[email protected]> wrote:
> Harsh,
>
> Is it possible for mapred.reduce.slowstart.completed.maps to even play a
> significant role in this? The only benefit he would find in tweaking that for
> his problem would be to spread network traffic from the shuffle over a longer
> period of time at a cost of having the reducer using resources earlier.
> Either way he would see this effect across both sets of runs if he is using
> the default parameters. I guess it would all depend on what kind of network
> layout the cluster is on.
>
> Matt
>
> -----Original Message-----
> From: Harsh J [mailto:[email protected]]
> Sent: Tuesday, June 21, 2011 12:09 PM
> To: [email protected]
> Subject: Re: Poor scalability with map reduce application
>
> Alberto,
>
> On Tue, Jun 21, 2011 at 10:27 PM, Alberto Andreotti
> <[email protected]> wrote:
>> I don't know if speculatives maps are on, I'll check it. One thing I
>> observed is that reduces begin before all maps have finished. Let me check
>> also if the difference is on the map side or in the reduce. I believe it's
>> balanced, both are slower when adding more nodes, but i'll confirm that.
>
> Maps and reduces are speculative by default, so must've been ON. Could
> you also post a general input vs. output record counts and statistics
> like that between your job runs, to correlate?
>
> The reducers get scheduled early but do not exactly "reduce()" until
> all maps are done. They just keep fetching outputs. Their scheduling
> can be controlled with some configurations (say, to start only after
> X% of maps are done -- by default it starts up when 5% of maps are
> done).
>
> --
> Harsh J

--
Harsh J
