Yes, adding more resources to the scheduling request would be the ideal
solution to the problem. Sadly, that is not a trivial change. The initial
solution I suggested is an ugly hack and will not work for the cases you have
described. If you feel that this work is important, please feel free to file a
JIRA for it. We can continue the discussion on that JIRA about the details of
how to add this type of functionality. I am very interested in the
scheduler and would be happy to help out, but sadly my time right now is very
limited.

--Bobby Evans

On 5/10/12 6:56 AM, "Radim Kolar" <h...@filez.com> wrote:



> We've been against these 'features' since it leads to very bad
> behaviour across the cluster with multiple apps/users etc.
It's not a new feature; it's an extension of the existing resource scheduling,
which works well enough only for RAM. There are two other resources, CPU cores
and network IO, which need to be considered.

We have a job that does a lot of network IO in the mapper, so it is desirable
to run its mappers on different nodes even if the block reads from HDFS will
not be local.

Our second job burns all CPU cores on a machine while doing
computations, so it is important for its mappers not to land on the same node.
