For terasort you want to fill up your entire cluster with maps/reduces as fast as you can to get the best performance.
Just play with #slots.

Arun

On May 9, 2012, at 12:36 PM, Jeffrey Buell wrote:

> Not to speak for Radim, but what I’m trying to achieve is performance at
> least as good as 0.20 for all cases. That is, no regressions. For something
> as simple as terasort, I don’t think that is possible without being able to
> specify the max number of mappers/reducers per node. As it is, I see
> slowdowns as much as 2X. Hopefully I’m wrong and somebody will straighten me
> out. But if I’m not, adding such a feature won’t lead to bad behavior of any
> kind since the default could be set to unlimited and thus have no effect
> whatsoever.
>
> I should emphasize that I support the goal of greater automation since Hadoop
> has way too many parameters and is so hard to tune. Just not at the expense
> of performance regressions.
>
> Jeff
>
>
> We've been against these 'features' since it leads to very bad behaviour
> across the cluster with multiple apps/users etc.
>
> What is your use-case i.e. what are you trying to achieve with this?
>
> thanks,
> Arun
>
> On May 3, 2012, at 5:59 AM, Radim Kolar wrote:
>
> > if plugin system for AM is overkill, something simpler can be made like:
> >
> > maximum number of mappers per node
> > maximum number of reducers per node
> >
> > maximum percentage of non data local tasks
> > maximum percentage of rack local tasks
> >
> > and set this in job properties.

--
Arun C. Murthy
Hortonworks Inc.
http://hortonworks.com/
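Under YARN there are no fixed map/reduce slots, so the closest analogue to the "#slots" knob is how many task containers fit on a node. The sketch below estimates that number, assuming memory-only scheduling (as with the CapacityScheduler's default resource calculator) and that each request is rounded up to a multiple of `yarn.scheduler.minimum-allocation-mb`. It ignores the ApplicationMaster's own container, vcores, and other running apps; the node sizes used are hypothetical, and the helper names are illustrative only.

```python
def normalize(request_mb: int, min_alloc_mb: int) -> int:
    """Round a container request up to the next multiple of the minimum allocation
    (assumed scheduler behaviour, e.g. yarn.scheduler.minimum-allocation-mb)."""
    return -(-request_mb // min_alloc_mb) * min_alloc_mb


def max_concurrent_tasks(nm_memory_mb: int, task_memory_mb: int,
                         min_alloc_mb: int = 1024) -> int:
    """Rough per-node task limit: NodeManager memory (yarn.nodemanager.resource.memory-mb)
    divided by the normalized task container size (e.g. mapreduce.map.memory.mb).
    Hypothetical helper; ignores the AM container, vcores and other applications."""
    return nm_memory_mb // normalize(task_memory_mb, min_alloc_mb)


if __name__ == "__main__":
    nm_mb = 8192  # hypothetical memory offered to containers on one node
    for task_mb in (1024, 1500, 2048):
        print(task_mb, "MB per task ->", max_concurrent_tasks(nm_mb, task_mb), "tasks/node")
```

Raising the per-task memory request is therefore one indirect way to cap how many mappers or reducers land on a node, at the cost of reserving memory the task may not use.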