[
https://issues.apache.org/jira/browse/FLINK-5542?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16641087#comment-16641087
]
ASF GitHub Bot commented on FLINK-5542:
---------------------------------------
leanken commented on issue #6775: [FLINK-5542] use YarnCluster vcores setting
to do MaxVCore validation
URL: https://github.com/apache/flink/pull/6775#issuecomment-427659457
Thanks for your review, @tillrohrmann.
I have resolved your comment. Looking forward to further work with the Flink
community.
Regarding your proposal to run an exhaustive check on job submission, I will
go through more submission scenarios to see what we can do to achieve that
goal, and then create a follow-up issue.
Thanks.
> YARN client incorrectly uses local YARN config to check vcore capacity
> ----------------------------------------------------------------------
>
> Key: FLINK-5542
> URL: https://issues.apache.org/jira/browse/FLINK-5542
> Project: Flink
> Issue Type: Bug
> Components: YARN
> Affects Versions: 1.1.4, 1.5.3, 1.6.0, 1.7.0
> Reporter: Shannon Carey
> Priority: Major
> Labels: pull-request-available
> Fix For: 1.7.0
>
>
> See
> http://apache-flink-user-mailing-list-archive.2336050.n4.nabble.com/1-1-4-on-YARN-vcores-change-td11016.html
> When using bin/yarn-session.sh, AbstractYarnClusterDescriptor line 271 in
> 1.1.4 is comparing the user's selected number of vcores to the vcores
> configured in the local node's YARN config (from YarnConfiguration, e.g.
> yarn-site.xml and yarn-default.xml). It incorrectly prevents Flink from
> launching even if there is sufficient vcore capacity on the cluster.
> That is not correct, because the application will not necessarily run on the
> local node. For example, if running the yarn-session.sh client from the AWS
> EMR master node, the vcore count there may be different from the vcore count
> on the core nodes where Flink will actually run.
> A reasonable way to fix this would probably be to reuse the logic from
> "yarn-session.sh -q" (FlinkYarnSessionCli line 550) which knows how to get
> vcore information from the real worker nodes. Alternatively, perhaps we
> could remove the check entirely and rely on YARN's Scheduler to determine
> whether sufficient resources exist.
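
The first option described above (validating against the vcores reported by the real worker nodes instead of the local YARN config) can be sketched roughly as follows. This is only an illustration, not the code from the pull request: the class VcoreCheck and its methods are hypothetical names, and the per-node capacities are passed in as a plain list. In a real client they could presumably be collected from YarnClient#getNodeReports(NodeState.RUNNING) via NodeReport#getCapability().getVirtualCores(). The sketch also assumes that a container must fit on a single node, so the request is compared against the largest per-node capacity.

```java
import java.util.List;

// Hypothetical helper for illustration; not part of Flink or Hadoop.
final class VcoreCheck {

    private VcoreCheck() {}

    /** Returns the largest vcore capacity among the given nodes, or 0 if the list is empty. */
    static int maxVcoresPerNode(List<Integer> nodeVcores) {
        int max = 0;
        for (int vcores : nodeVcores) {
            max = Math.max(max, vcores);
        }
        return max;
    }

    /**
     * Checks the requested vcores per container against the capacities reported
     * by the cluster's worker nodes rather than against the local yarn-site.xml.
     * Assumption: a container must fit on one node, so the largest node decides.
     */
    static void validate(int requestedVcores, List<Integer> nodeVcores) {
        int max = maxVcoresPerNode(nodeVcores);
        if (requestedVcores > max) {
            throw new IllegalArgumentException(
                "Requested " + requestedVcores + " vcores per container, but the largest "
                    + "worker node reported by the cluster offers only " + max + ".");
        }
    }
}
```

With this shape, a client started on a small master node (for example 2 vcores locally) would still accept a request for 8 vcores as long as at least one worker node reports 8 or more.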