For terasort you want to fill up your entire cluster with maps/reduces as fast 
as you can to get the best performance.

Just play with #slots.
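In a YARN (MRv2) cluster there is no fixed slot count. The number of concurrent map or reduce containers per node is roughly the NodeManager's advertised memory (`yarn.nodemanager.resource.memory-mb`) divided by the per-task request (`mapreduce.map.memory.mb` or `mapreduce.reduce.memory.mb`). The sketch below makes two assumptions: memory is the only scheduled resource, and each request is rounded up to a multiple of `yarn.scheduler.minimum-allocation-mb`. The helper name and node sizes are hypothetical, and the ApplicationMaster's own container is ignored.

```python
import math


def containers_per_node(node_memory_mb: int, task_memory_mb: int, min_allocation_mb: int) -> int:
    """Estimate how many same-sized task containers fit on one NodeManager.

    Hypothetical helper. Assumes memory is the only scheduled resource and
    that the scheduler rounds each request up to a multiple of the minimum
    allocation.
    """
    # Round the per-task request up to the next multiple of the minimum allocation.
    rounded_mb = math.ceil(task_memory_mb / min_allocation_mb) * min_allocation_mb
    # Count how many rounded containers fit in the memory the node advertises to YARN.
    return node_memory_mb // rounded_mb


# Hypothetical node: 8 GB advertised to YARN, 1 GB minimum allocation.
print(containers_per_node(8192, 1536, 1024))  # 1536 MB rounds up to 2048 MB, so 4 containers
print(containers_per_node(8192, 1024, 1024))  # 8 containers
```

Under these assumptions, lowering `mapreduce.map.memory.mb` is the knob that effectively raises the per-node "slot" count.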

Arun

On May 9, 2012, at 12:36 PM, Jeffrey Buell wrote:

> Not to speak for Radim, but what I’m trying to achieve is performance at 
> least as good as 0.20 for all cases.  That is, no regressions.  For something 
> as simple as terasort, I don’t think that is possible without being able to 
> specify the max number of mappers/reducers per node.  As it is, I see 
> slowdowns of as much as 2X.  Hopefully I’m wrong and somebody will straighten me 
> out.  But if I’m not, adding such a feature won’t lead to bad behavior of any 
> kind since the default could be set to unlimited and thus have no effect 
> whatsoever.
>  
> I should emphasize that I support the goal of greater automation since Hadoop 
> has way too many parameters and is so hard to tune.  Just not at the expense 
> of performance regressions. 
>  
> Jeff
>  
>  
> We've been against these 'features' since they lead to very bad behaviour 
> across the cluster with multiple apps/users, etc.
>  
> What is your use-case i.e. what are you trying to achieve with this?
>  
> thanks,
> Arun
>  
> On May 3, 2012, at 5:59 AM, Radim Kolar wrote:
> 
> 
> If a plugin system for the AM is overkill, something simpler could be done, like:
> 
> maximum number of mappers per node
> maximum number of reducers per node
> 
> maximum percentage of non-data-local tasks
> maximum percentage of rack-local tasks
> 
> and set these in the job properties.
>  
>  

--
Arun C. Murthy
Hortonworks Inc.
http://hortonworks.com/

