Ufuk Celebi created FLINK-3003:
----------------------------------
Summary: Add container allocation timeout to YARN CLI
Key: FLINK-3003
URL: https://issues.apache.org/jira/browse/FLINK-3003
Project: Flink
Issue Type: Improvement
Components: YARN Client
Affects Versions: 0.10
Reporter: Ufuk Celebi
Fix For: 1.0, 0.10.1
Programs submitted via {{bin/flink run -m yarn-cluster}} start a short-lived
YARN session before submitting the job. The job is only submitted once all
requested resources have been allocated. Until then, the containers that have
already been allocated are "blocked" by the to-be-submitted job, even though
the session is only partially allocated.
If you have multiple submissions like this with partial allocations, you can
block the whole YARN cluster (e.g. 10 containers in total, two sessions that
want 6 containers each, and both holding 5).
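To make the arithmetic of this example concrete, here is a small sketch. It assumes containers are granted one at a time in round-robin order, which is only an illustrative simplification and not a description of how the YARN scheduler actually behaves. The class and method names are hypothetical.

```java
/** Hypothetical illustration; not part of Flink or YARN. */
final class PartialAllocationSketch {

    /**
     * Grants free containers one at a time, round-robin, to sessions that
     * still need more (an assumed grant order, chosen only for illustration).
     * Returns the number of sessions that received everything they requested.
     */
    static int fullyAllocatedSessions(int totalContainers, int[] requested) {
        int[] held = new int[requested.length];
        int free = totalContainers;
        boolean progress = true;
        while (free > 0 && progress) {
            progress = false;
            for (int i = 0; i < requested.length && free > 0; i++) {
                if (held[i] < requested[i]) {
                    held[i]++;
                    free--;
                    progress = true;
                }
            }
        }
        int complete = 0;
        for (int i = 0; i < requested.length; i++) {
            if (held[i] == requested[i]) {
                complete++;
            }
        }
        return complete;
    }
}
```

With 10 containers and two sessions requesting 6 each, both sessions end up holding 5 and neither is complete, so neither job is ever submitted.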
A simple workaround for these situations is to add an allocation timeout after
which the YARN session fails and releases all its resources.
[Other strategies, like waiting X amount of time for Y containers and then going
with what you have if you don't get all of them, are also possible.]
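A minimal sketch of how both strategies could look, assuming the client can poll the number of containers allocated so far. Everything here is hypothetical: the class, the enum, and the {{IntSupplier}} standing in for the allocation count are illustration only and do not correspond to existing Flink or YARN APIs.

```java
import java.util.function.IntSupplier;

/** Hypothetical sketch; none of these names exist in Flink or YARN. */
final class AllocationTimeoutSketch {

    enum Outcome { ALL_ALLOCATED, PROCEED_PARTIAL, RELEASE_AND_FAIL }

    /**
     * Polls the allocated-container count until all requested containers are
     * available or the timeout elapses. On timeout, proceeds with a partial
     * allocation if at least `minimum` containers are held; otherwise the
     * caller is expected to fail the session and release its containers.
     * Passing minimum == requested gives the plain timeout strategy.
     */
    static Outcome awaitContainers(IntSupplier allocatedContainers,
                                   int requested,
                                   int minimum,
                                   long timeoutMillis,
                                   long pollMillis) {
        long deadline = System.nanoTime() + timeoutMillis * 1_000_000L;
        while (true) {
            int allocated = allocatedContainers.getAsInt();
            if (allocated >= requested) {
                return Outcome.ALL_ALLOCATED;
            }
            if (System.nanoTime() - deadline >= 0) {
                return allocated >= minimum
                        ? Outcome.PROCEED_PARTIAL
                        : Outcome.RELEASE_AND_FAIL;
            }
            try {
                Thread.sleep(pollMillis);
            } catch (InterruptedException e) {
                // Treat interruption like a failed allocation (a design assumption).
                Thread.currentThread().interrupt();
                return Outcome.RELEASE_AND_FAIL;
            }
        }
    }
}
```

In the scenario above, a session stuck at 5 of 6 containers would get {{RELEASE_AND_FAIL}} once the timeout expires, which frees its containers for the other session. With a lower {{minimum}} (e.g. 4) it would get {{PROCEED_PARTIAL}} instead.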
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)