[
https://issues.apache.org/jira/browse/TEZ-3168?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15200446#comment-15200446
]
Siddharth Seth commented on TEZ-3168:
-------------------------------------
bq. Looking at the queue capacity could be very wrong in cases where the user
limits only allow the user a tiny fraction of the queue. The Tez AM will think
it has access to a lot more than it really does.
Do you know if headroom factors in the user limits?
The additional options are definitely better. One of the main problems right
now is that on a busy cluster, an app may end up thinking it has very little
capacity available, and thus generates very large splits. Even if other jobs
complete and free up capacity, that additional capacity will not be used.
We've seen scenarios where it's better to kill and restart such jobs so that
they can take up the additional capacity. Queue capacity, in that respect,
would be consistent and allow for better capacity utilization. However, it has
the downside of a large number of waves (the trade-off is sketched just
below).
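To make the trade-off concrete, here's a minimal sketch of the wave math - the
names and the memory-only simplification are mine, not Tez APIs:

{code:java}
// Illustrative only: how a total-resource figure feeds the split count.
public class WaveMath {
  // totalResourceMb: the cluster/queue/headroom figure the AM believes in.
  // taskResourceMb:  the per-task resource ask.
  // waves:           desired number of task waves over the available slots.
  static int desiredNumSplits(long totalResourceMb, long taskResourceMb,
      double waves) {
    long slots = Math.max(1, totalResourceMb / taskResourceMb);
    return (int) Math.max(1, Math.ceil(slots * waves));
  }

  public static void main(String[] args) {
    // Busy cluster: 8GB of headroom vs a 400GB queue, 1GB tasks, 1.7 waves.
    System.out.println(desiredNumSplits(8 * 1024, 1024, 1.7));   // few, large splits
    System.out.println(desiredNumSplits(400 * 1024, 1024, 1.7)); // many splits, many waves
  }
}
{code}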
Comments on the patch
- getAdditionalTokens - not used yet. I'm assuming this will be used in
YarnClient at some point?
- Getting the RM delegation token, renewer, etc.: I don't think YARN has a
public library to figure this out - one would be useful. In the case of the
YARN delegation token, I'm not sure why the API even exposes a renewer. This
may need some changes to account for HA - the MapReduce getDelegationToken
call handles that differently (see the token-fetching sketch after this list).
- tez.am.total.resource.calculator - rename to something like
tez.am.total.resource.reporting.mechanism? (calculator sounds like a
plugin/class)
- There's a mismatch in the default between the documentation (headroom) and
the constant (cluster)
- getMaxAvailableResources - is this being deleted? That will be an
incompatible change. If so, could you please defer it to a separate jira which
can be committed just before the next 0.8.3 release.
- RMDelegationTokenIdentifier.KIND_NAME - would this token end up being part of
the dag credentials, after it is fetched by the AM?
- TaskSchedulerService
  - headroom = Resources.add(allocatedResources, getAvailableResources()) -
this changes behaviour to some extent. However, I don't think it matters,
since the old code would set this value only once, before any containers had
been allocated (allocatedResources = 0)
- Resources are updated only once at startup, and on the first invocation of
getResources. Given that InputInitializers within a DAG can run at different
times, and multiple DAGs can run in the AM, I think it's better to update these
values more often: e.g. on a node report change for cluster and queue
resources, on dagComplete in general, and on a timed interval for queue and
headroom (a rough refresher sketch follows this list).
- We could move YarnClient creation into the shim itself - managing its
lifetime becomes problematic though.
- Timeout missing on the new test. Also, I'm not sure what it's doing by
checking the DEFAULT constant against all possible values. Is that to
future-proof the test?
- Nit: Unused import in TezYarnClient
- Nit: Avoid config lookup - TotalResourceCalculatorType.lookup(conf.get ...
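On the RM delegation token point, this is roughly the dance a client has to go
through today - a sketch using the YarnClient / ClientRMProxy / ConverterUtils
calls as I understand them; the wrapper class itself is hypothetical:

{code:java}
import java.io.IOException;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.security.token.Token;
import org.apache.hadoop.security.token.TokenIdentifier;
import org.apache.hadoop.yarn.client.ClientRMProxy;
import org.apache.hadoop.yarn.client.api.YarnClient;
import org.apache.hadoop.yarn.exceptions.YarnException;
import org.apache.hadoop.yarn.util.ConverterUtils;

public class RmDelegationTokenFetcher {
  // Fetch the RM delegation token and stamp it with the HA-aware service
  // name, so it can be located and renewed regardless of which RM is active.
  static Token<? extends TokenIdentifier> fetch(YarnClient client,
      Configuration conf, String renewer) throws IOException, YarnException {
    org.apache.hadoop.yarn.api.records.Token yarnToken =
        client.getRMDelegationToken(new Text(renewer));
    // ClientRMProxy.getRMDelegationTokenService encodes all RM addresses in
    // the HA case - this is the bit a naive single-RM fetch would get wrong.
    return ConverterUtils.convertFromYarn(yarnToken,
        ClientRMProxy.getRMDelegationTokenService(conf));
  }
}
{code}

This is essentially what the MapReduce client does internally; a public YARN
library for it would save every framework from re-implementing the same thing.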
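And on updating the resource values more often - the rough shape I have in
mind, with hypothetical names (none of this is the patch's API):

{code:java}
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicReference;
import java.util.function.Supplier;
import org.apache.hadoop.yarn.api.records.Resource;

// Hypothetical: keep the total-resource figure fresh so InputInitializers
// that run late in a DAG, or in a later DAG, don't see a stale startup value.
public class TotalResourceRefresher {
  private final AtomicReference<Resource> lastKnown = new AtomicReference<>();
  private final ScheduledExecutorService exec =
      Executors.newSingleThreadScheduledExecutor();

  // Timed refresh - suitable for the queue and headroom figures.
  public void start(Supplier<Resource> source, long intervalSecs) {
    exec.scheduleAtFixedRate(() -> lastKnown.set(source.get()),
        0, intervalSecs, TimeUnit.SECONDS);
  }

  // Event-driven refresh - call from a node-report update or dagComplete.
  public void refreshNow(Supplier<Resource> source) {
    lastKnown.set(source.get());
  }

  public Resource current() {
    return lastKnown.get();
  }

  public void stop() {
    exec.shutdownNow();
  }
}
{code}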
> Provide a more predictable approach for total resource guidance for
> wave/split calculation
> -------------------------------------------------------------------------------------------
>
> Key: TEZ-3168
> URL: https://issues.apache.org/jira/browse/TEZ-3168
> Project: Apache Tez
> Issue Type: Bug
> Reporter: Hitesh Shah
> Assignee: Hitesh Shah
> Attachments: TEZ-3168.wip.patch
>
>
> Currently, Tez uses headroom for checking total available resources. This is
> flaky, as it ends up causing the split count to be determined by a
> point-in-time lookup of what is available in the cluster. A better approach
> would be to use either the queue size or even the cluster size to get a more
> predictable count.