[ https://issues.apache.org/jira/browse/TEZ-3168?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15200446#comment-15200446 ]
Siddharth Seth commented on TEZ-3168:
-------------------------------------

bq. Looking at the queue capacity could be very wrong in cases where the user limits only allow the user a tiny fraction of the queue. The Tez AM will think it has access to a lot more than it really does.

Do you know if headroom factors in the user limits? The additional options are definitely better. One of the main problems right now is that on a busy cluster, an app may end up thinking it has very little capacity available, thus generating large splits. Even if such a job were to complete, the additional capacity would not be used. We've seen scenarios where it's better to kill and restart such jobs so that they take up the additional capacity. Queue capacity, in that respect, would be consistent and allow for capacity utilization. However, it has the downside of a large number of waves.

Comments on the patch:
- getAdditionalTokens - not used yet. Assuming this will be used in YarnClient at some point?
- Getting the RM delegation token, renewer, etc.: I don't think YARN has a public library to figure this out - one would be useful. In the case of the YARN delegation token, I'm not sure why the API even exposes a renewer. This may need some changes to account for HA - that differs from the MapReduce getDelegationToken call.
- tez.am.total.resource.calculator - rename to something like tez.am.total.resource.reporting.mechanism? ("calculator" sounds like a plugin/class)
- There's a mismatch in the default between the documentation (headroom) and the constant (cluster).
- getMaxAvailableResources - is this being deleted? That would be an incompatible change. If so, could you please defer it to a separate jira which can be committed just before the next 0.8.3 release.
- RMDelegationTokenIdentifier.KIND_NAME - would this token end up being part of the DAG credentials after it is fetched by the AM?
- TaskSchedulerService - headroom = Resources.add(allocatedResources, getAvailableResources()) - this changes behaviour to some extent. However, I don't think it matters, since the old code would set this value only once, before any containers had been allocated (allocatedResources = 0).
- Resources are updated only once at startup, and on the first invocation of getResources. Given that InputInitializers within a DAG can run at different times, and multiple DAGs can run in an AM, I think it's better to update these values more often: e.g. on a node report change for cluster and queue resources, on dagComplete in general, and on a timed interval for queue and headroom.
- We could move YarnClient creation into the shim itself - managing its lifetime becomes problematic though.
- Timeout missing on the new test. Also not sure what it's doing by checking the DEFAULT constant against all possible values. Is that to future-proof the test?
- Nit: unused import in TezYarnClient.
- Nit: avoid the repeated config lookup - TotalResourceCalculatorType.lookup(conf.get ...

> Provide a more predictable approach for total resource guidance for wave/split calculation
> -------------------------------------------------------------------------------------------
>
>                 Key: TEZ-3168
>                 URL: https://issues.apache.org/jira/browse/TEZ-3168
>             Project: Apache Tez
>          Issue Type: Bug
>            Reporter: Hitesh Shah
>            Assignee: Hitesh Shah
>         Attachments: TEZ-3168.wip.patch
>
> Currently, Tez uses headroom for checking total available resources. This is flaky, as it ends up causing the split count to be determined by a point-in-time lookup of what is available in the cluster. A better approach would be either the queue size or even the cluster size, to get a more predictable count.

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
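[Editorial sketch] The headroom-vs-queue-capacity tradeoff discussed above can be illustrated with a minimal, self-contained example. All names here (SplitGuidance, desiredSplits) are hypothetical and are not Tez APIs; the point is only that a point-in-time headroom reading on a busy cluster yields far fewer splits than a stable queue-capacity figure would, even though the job may later have much more capacity available.

```java
// Hypothetical illustration, not Tez code: how the resource figure the AM
// trusts drives the split/wave count it requests.
public class SplitGuidance {

    // Derive a split count from the memory the AM believes it can use.
    // slots = how many tasks fit at once; multiply by desired waves.
    static int desiredSplits(long availableMemMb, long taskMemMb, int waves) {
        long slots = Math.max(1, availableMemMb / taskMemMb);
        return (int) (slots * waves);
    }

    public static void main(String[] args) {
        long taskMemMb = 1024; // 1 GB per task (assumed)

        // On a busy cluster, instantaneous headroom may be tiny (2 GB)...
        int fromHeadroom = desiredSplits(2048, taskMemMb, 2);

        // ...while queue capacity is stable and much larger (100 GB).
        int fromQueueCapacity = desiredSplits(102400, taskMemMb, 2);

        System.out.println(fromHeadroom);      // few, large splits
        System.out.println(fromQueueCapacity); // many more, smaller splits
    }
}
```

With headroom the job above commits to very large splits (4) and cannot use capacity that frees up later; with queue capacity it plans many more splits (200), which is consistent but, as noted in the comment, can mean a large number of waves when the user limit keeps the job small.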