[ 
https://issues.apache.org/jira/browse/TEZ-3168?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15201554#comment-15201554
 ] 

Jason Lowe commented on TEZ-3168:
---------------------------------

Forgot to mention that node labels could also mess with this in a significant 
way -- reported capacity of a queue may be completely misrepresented when node 
labels are partitioning the cluster and restricting what the app can access in 
a particular queue.

I can see the case where we want to over-split the data if we can run far more 
tasks in parallel than we have splits.  With container reuse to mitigate a large 
portion of the per-task overhead, it should be better to overestimate rather 
than underestimate what we're capable of running.  However, if we excessively 
overestimate what we can run simultaneously, it will amplify the per-task 
overhead.  We already see many detrimental effects of over-partitioning due to 
the pressure it puts on the AM to manage that many tasks and events and the 
extra overhead in the shuffle and merge for each task.  Underestimating the 
number of splits is definitely a concern, but excessive overestimating could be 
really bad as well, ultimately destroying the AM if its heap can't accommodate 
all those tasks.

I suppose it's no worse than today, given that it keeps the same behavior by 
default.  It's a user- or admin-driven decision to choose a different scheduling 
heuristic, and they would need to be aware of the cluster setup assumptions 
those heuristics are making.
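For concreteness, here is a minimal sketch of how a split count falls out of whichever total-resource figure the heuristic picks. This is an illustration only, not Tez's actual code; `SplitEstimator`, `estimateSplits`, and all parameter names are hypothetical. It shows why a near-zero point-in-time headroom snapshot collapses the plan while a stable queue-capacity figure gives a predictable count:

```java
// Illustration only: "SplitEstimator" is a hypothetical name, not a Tez API.
public class SplitEstimator {

    /**
     * Estimate how many splits to create given a total-resource figure
     * (headroom snapshot, queue capacity, or cluster capacity), the
     * per-task resource request, and a waves factor controlling how many
     * "waves" of tasks should run over the available parallel slots.
     */
    static int estimateSplits(long totalMb, long perTaskMb, double waves) {
        if (perTaskMb <= 0) {
            throw new IllegalArgumentException("per-task resource must be positive");
        }
        long parallelTasks = totalMb / perTaskMb;        // tasks runnable at once
        long splits = (long) Math.ceil(parallelTasks * waves);
        return (int) Math.max(1, splits);                // never fewer than one split
    }

    public static void main(String[] args) {
        // A headroom snapshot taken on a busy cluster may be near zero,
        // collapsing the plan to a single split...
        System.out.println(estimateSplits(512, 1024, 1.7));
        // ...while a stable queue-capacity figure yields a predictable count.
        System.out.println(estimateSplits(102_400, 1024, 1.7));
    }
}
```

The same formula also shows the overestimation risk discussed above: feeding in the full cluster capacity on a partitioned (node-labeled) cluster inflates `totalMb` and thus the split count, multiplying per-task overhead on the AM.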

> Provide a more predictable approach for total resource guidance for 
> wave/split calculation 
> -------------------------------------------------------------------------------------------
>
>                 Key: TEZ-3168
>                 URL: https://issues.apache.org/jira/browse/TEZ-3168
>             Project: Apache Tez
>          Issue Type: Bug
>            Reporter: Hitesh Shah
>            Assignee: Hitesh Shah
>         Attachments: TEZ-3168.wip.2.patch, TEZ-3168.wip.patch
>
>
> Currently, Tez uses headroom for checking total available resources. This is 
> flaky, as it ends up causing the split count to be determined by a 
> point-in-time lookup of what is available in the cluster. A better approach 
> would be to use either the queue size or even the cluster size to get a more 
> predictable count. 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)