[ https://issues.apache.org/jira/browse/TEZ-3168?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15200446#comment-15200446 ]

Siddharth Seth commented on TEZ-3168:
-------------------------------------

bq. Looking at the queue capacity could be very wrong in cases where the user 
limits only allow the user a tiny fraction of the queue. The Tez AM will think 
it has access to a lot more than it really does.
Do you know if headroom factors in the user limits?
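
For reference, a quick probe of that - comparing the scheduler-reported 
headroom against the queue's configured capacity. This is only a sketch, not 
part of the patch; it assumes it runs inside a registered AM with a started 
YarnClient, and the queue name is a placeholder.

{code}
// Sketch only: log scheduler headroom next to the queue's configured
// capacity, to check empirically whether user limits show up in headroom.
import org.apache.hadoop.yarn.api.records.QueueInfo;
import org.apache.hadoop.yarn.api.records.Resource;
import org.apache.hadoop.yarn.client.api.AMRMClient;
import org.apache.hadoop.yarn.client.api.YarnClient;

public final class HeadroomProbe {
  // Assumes a registered AM that has called allocate() at least once,
  // since headroom is populated from the AllocateResponse.
  static void logHeadroomVsQueue(AMRMClient<?> amRmClient,
      YarnClient yarnClient, String queueName) throws Exception {
    Resource headroom = amRmClient.getAvailableResources();
    // Queue capacity is a configured fraction of the parent queue and
    // knows nothing about per-user limits.
    QueueInfo queue = yarnClient.getQueueInfo(queueName);
    System.out.println("headroom=" + headroom
        + ", queueCapacity=" + queue.getCapacity()
        + ", currentCapacity=" + queue.getCurrentCapacity());
  }
}
{code}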

The additional options are definitely better. One of the main problems right 
now is that on a busy cluster, an app may end up thinking it has very little 
capacity available, and thus generates large splits. Even if other jobs 
complete and free up capacity, the app will not make use of it. We've seen 
scenarios where it's better to kill and restart such jobs so that they take up 
the additional capacity. Queue capacity, in that respect, would be consistent 
and allow that capacity to be utilized. However, it has the downside of a 
large number of waves.
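
As a rough sketch, the queue-based mechanism could amount to scaling the 
summed node capabilities by the queue's configured fraction - stable on a busy 
cluster, at the cost of the extra waves noted above. Names here are mine, not 
from the patch:

{code}
// Sketch only: estimate a stable per-queue resource total as
// (cluster resource) x (queue capacity fraction).
import java.util.List;
import org.apache.hadoop.yarn.api.records.NodeReport;
import org.apache.hadoop.yarn.api.records.NodeState;
import org.apache.hadoop.yarn.api.records.Resource;
import org.apache.hadoop.yarn.client.api.YarnClient;
import org.apache.hadoop.yarn.util.resource.Resources;

public final class QueueCapacityEstimator {
  static Resource estimateQueueResource(YarnClient yarnClient,
      String queueName) throws Exception {
    // Sum the capability of all running nodes for the cluster total.
    Resource cluster = Resources.createResource(0, 0);
    List<NodeReport> nodes = yarnClient.getNodeReports(NodeState.RUNNING);
    for (NodeReport node : nodes) {
      Resources.addTo(cluster, node.getCapability());
    }
    // getCapacity() is relative to the parent queue; for nested queues
    // the absolute fraction is the product along the path.
    float fraction = yarnClient.getQueueInfo(queueName).getCapacity();
    return Resources.multiply(cluster, fraction);
  }
}
{code}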


Comments on the patch
- getAdditionalTokens - not used yet. Assuming this will be used in YarnClient 
at some point?
- Getting the RM delegation token, renewer, etc. - I don't think YARN has a 
public library to figure this out; one would be useful. In the case of the 
YARN delegation token, I'm not sure why the API even exposes a renewer. This 
may need some changes to account for HA, which the MapReduce 
getDelegationToken call handles differently (see the token-fetch sketch after 
this list).
- tez.am.total.resource.calculator - rename to something like 
tez.am.total.resource.reporting.mechanism? ("calculator" sounds like a 
plugin/class)
- There's a mismatch in the default between the documentation (headroom) and 
the constant (cluster).
- getMaxAvailableResources - Is this being deleted? That will be an 
incompatible change. If so, could you please defer it to a separate jira which 
can be committed just before the 0.8.3 release.
- RMDelegationTokenIdentifier.KIND_NAME - would this token end up being part 
of the DAG credentials after it is fetched by the AM? (See the kind-check 
sketch after this list.)
- TaskSchedulerService
   - headroom = Resources.add(allocatedResources, getAvailableResources()) - 
this changes behaviour to some extent. However, I don't think it matters, 
since the old code would set this value only once, before any containers had 
been allocated (allocatedResources = 0).
   - Resources are updated only once at startup, and on the first invocation 
of getResources. Given that InputInitializers within a DAG can run at 
different times, and multiple DAGs can run in the AM, I think it's better to 
update these values more often: e.g. on a node report change for cluster and 
queue resources, on dagComplete in general, and on a timed interval for queue 
and headroom (see the refresh sketch after this list).
- We could move YarnClient creation into the shim itself - managing its 
lifetime becomes problematic, though.
- Timeout missing on the new test. Also not sure what it's doing by checking 
the DEFAULT constant against all possible values. Is that to future-proof the 
test?
- Nit: Unused import in TezYarnClient
- Nit: Avoid config lookup - TotalResourceCalculatorType.lookup(conf.get ... 
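
On the delegation-token point, something along these lines - roughly what 
MapReduce does - is what I'd expect. The helper is hypothetical; 
ClientRMProxy.getRMDelegationTokenService is the closest public piece YARN 
exposes for the HA service name:

{code}
// Sketch only: fetch the RM delegation token with an HA-safe service
// name. Mirrors the MapReduce flow; not part of the patch.
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.security.Credentials;
import org.apache.hadoop.security.token.Token;
import org.apache.hadoop.security.token.TokenIdentifier;
import org.apache.hadoop.yarn.client.ClientRMProxy;
import org.apache.hadoop.yarn.client.api.YarnClient;
import org.apache.hadoop.yarn.conf.YarnConfiguration;
import org.apache.hadoop.yarn.util.ConverterUtils;

public final class RmTokenFetcher {
  static void addRmDelegationToken(YarnClient yarnClient,
      Configuration conf, Credentials credentials) throws Exception {
    // The RM principal is the conventional renewer - though, as noted
    // above, it's unclear why the API needs the caller to supply one.
    String renewer = conf.get(YarnConfiguration.RM_PRINCIPAL, "");
    org.apache.hadoop.yarn.api.records.Token rmToken =
        yarnClient.getRMDelegationToken(new Text(renewer));
    // ClientRMProxy computes the HA-aware service name (a comma-joined
    // list of RM addresses when HA is enabled) rather than a single
    // host:port, which is where plain lookups go wrong.
    Text service = ClientRMProxy.getRMDelegationTokenService(conf);
    Token<? extends TokenIdentifier> token =
        ConverterUtils.convertFromYarn(rmToken, service);
    credentials.addToken(service, token);
  }
}
{code}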

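And for the KIND_NAME question, a trivial check like this (illustrative only) 
would tell whether the fetched RM token rides along in the DAG credentials:

{code}
// Sketch only: check whether an RM delegation token is present in the
// DAG credentials by its kind.
import org.apache.hadoop.security.Credentials;
import org.apache.hadoop.security.token.Token;
import org.apache.hadoop.yarn.security.client.RMDelegationTokenIdentifier;

public final class TokenKindCheck {
  static boolean containsRmDelegationToken(Credentials dagCredentials) {
    for (Token<?> token : dagCredentials.getAllTokens()) {
      // The kind is stable across renewals, so it's a reliable filter.
      if (RMDelegationTokenIdentifier.KIND_NAME.equals(token.getKind())) {
        return true;
      }
    }
    return false;
  }
}
{code}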

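Finally, the refresh behaviour I have in mind for TaskSchedulerService would 
look something like this - recompute allocated + available periodically 
instead of once at startup. All names are illustrative, not Tez APIs:

{code}
// Sketch only: keep the total-resource figure fresh by recomputing
// allocated + available on a timer, instead of a one-shot read before
// any containers have been allocated.
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicReference;
import org.apache.hadoop.yarn.api.records.Resource;
import org.apache.hadoop.yarn.util.resource.Resources;

public abstract class ResourceTracker {
  private final AtomicReference<Resource> totalResources =
      new AtomicReference<Resource>();
  private final ScheduledExecutorService refresher =
      Executors.newSingleThreadScheduledExecutor();

  // Supplied by the concrete scheduler.
  protected abstract Resource getAllocatedResources();
  protected abstract Resource getAvailableResources();

  void start(long intervalMillis) {
    refresher.scheduleAtFixedRate(new Runnable() {
      @Override
      public void run() {
        refresh();
      }
    }, 0L, intervalMillis, TimeUnit.MILLISECONDS);
  }

  void refresh() {
    // allocated + available approximates what the queue would let this
    // app reach, and stays meaningful after containers are allocated.
    totalResources.set(Resources.add(getAllocatedResources(),
        getAvailableResources()));
  }

  Resource getTotalResources() {
    return totalResources.get();
  }
}
{code}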
> Provide a more predictable approach for total resource guidance for 
> wave/split calculation 
> -------------------------------------------------------------------------------------------
>
>                 Key: TEZ-3168
>                 URL: https://issues.apache.org/jira/browse/TEZ-3168
>             Project: Apache Tez
>          Issue Type: Bug
>            Reporter: Hitesh Shah
>            Assignee: Hitesh Shah
>         Attachments: TEZ-3168.wip.patch
>
>
> Currently, Tez uses headroom for checking total available resources. This is 
> flaky, as it causes the split count to be determined by a point-in-time 
> lookup of what is available in the cluster. A better approach would be to use 
> either the queue size or even the cluster size to get a more predictable 
> count. 


