Shannon Carey created FLINK-5542:
------------------------------------

             Summary: YARN client incorrectly uses local YARN config to check vcore capacity
                 Key: FLINK-5542
                 URL: https://issues.apache.org/jira/browse/FLINK-5542
             Project: Flink
          Issue Type: Bug
          Components: YARN
    Affects Versions: 1.1.4
            Reporter: Shannon Carey


See 
http://apache-flink-user-mailing-list-archive.2336050.n4.nabble.com/1-1-4-on-YARN-vcores-change-td11016.html

When using bin/yarn-session.sh, AbstractYarnClusterDescriptor (line 271 in 
1.1.4) compares the number of vcores the user requested against the vcore count 
configured in the local node's YARN config (read through YarnConfiguration, 
e.g. from yarn-site.xml and yarn-default.xml). As a result, the client can 
refuse to launch Flink even when the cluster has sufficient vcore capacity.

This check is not correct, because the application will not necessarily run on 
the local node. For example, when the yarn-session.sh client is run from the 
AWS EMR master node, the vcore count there may differ from the vcore count on 
the core nodes where Flink will actually run.

A reasonable fix would probably be to reuse the logic from 
"yarn-session.sh -q" (FlinkYarnSessionCli line 550), which knows how to get 
vcore information from the real worker nodes. Alternatively, perhaps we could 
remove the check entirely and rely on YARN's scheduler to determine whether 
sufficient resources exist.
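
To make the difference concrete, here is a minimal, self-contained sketch of 
the two checks. It is not Flink's code: the class and method names are 
hypothetical, and the per-node vcore list is hard-coded. In a real client those 
values would presumably come from the cluster itself (for instance from the 
node reports that Hadoop's YarnClient can return), which is an assumption about 
how a fix might be wired up.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical illustration only; none of these names exist in Flink.
final class VcoreCheckSketch {

    // Mirrors the behaviour described in this issue: the request is compared
    // with the vcore count configured on the machine running the client.
    static boolean fitsLocalConfig(int requestedVcores, int localConfiguredVcores) {
        return requestedVcores <= localConfiguredVcores;
    }

    // Alternative: accept the request if at least one worker node reported by
    // the cluster has enough vcores for a single container.
    static boolean fitsOnSomeNode(int requestedVcores, List<Integer> nodeVcores) {
        for (int vcores : nodeVcores) {
            if (requestedVcores <= vcores) {
                return true;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        int requested = 8;                 // vcores requested per container
        int localConfigured = 4;           // assumed value on the client host
        List<Integer> workers = Arrays.asList(16, 16, 16); // assumed worker capacities

        System.out.println("local check passes:   " + fitsLocalConfig(requested, localConfigured));
        System.out.println("cluster check passes: " + fitsOnSomeNode(requested, workers));
    }
}
```

With these assumed numbers the local check rejects a request that every worker 
node could satisfy, which is the situation described for the EMR master node 
above.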



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
