[
https://issues.apache.org/jira/browse/FLINK-14968?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16983349#comment-16983349
]
Yang Wang commented on FLINK-14968:
-----------------------------------
[~gjy] [~pnowojski] [~aljoscha]
I think the Flink job should not fail with not enough slots. We start 3
TaskManagers with 1 slot each, so each slot could run a complete pipeline. I
have tested on a real YARN cluster, and it always works as expected. Only 3
TaskManagers are started and the job finishes successfully.
{code:java}
./bin/flink run -d -p 3 -m yarn-cluster examples/streaming/WordCount.jar
--input dummy://localhost/words --input anotherDummy://localhost/words
{code}
I have gone over the logs and found that the `SlotPoolImpl` allocates 4 slots. It
should be only 3. Some bug may exist in the Scheduler internals.
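For reference, with Flink's default slot sharing, a streaming job needs as many slots as its maximum operator parallelism, not one slot per operator, which is why 3 slots should suffice here. A minimal sketch of that arithmetic (the class and helper names are mine, not Flink API):

{code:java}
// Sketch: under default slot sharing, one whole pipeline
// (source -> ... -> sink) fits into a single slot, so the
// required slot count is the maximum operator parallelism.
public class SlotMath {
    // Hypothetical helper, not a Flink API.
    static int requiredSlots(int[] operatorParallelisms) {
        int max = 0;
        for (int p : operatorParallelisms) {
            max = Math.max(max, p);
        }
        return max;
    }

    public static void main(String[] args) {
        // Two sources plus downstream operators, all at parallelism 3.
        int[] parallelisms = {3, 3, 3, 3};
        // 3 TaskManagers x 1 slot each = 3 slots available.
        System.out.println(requiredSlots(parallelisms)); // prints 3
    }
}
{code}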
> Kerberized YARN on Docker test (custom fs plugin) fails on Travis
> -----------------------------------------------------------------
>
> Key: FLINK-14968
> URL: https://issues.apache.org/jira/browse/FLINK-14968
> Project: Flink
> Issue Type: Bug
> Components: FileSystems, Tests
> Affects Versions: 1.10.0
> Reporter: Gary Yao
> Priority: Blocker
> Labels: test-stability
> Fix For: 1.10.0
>
>
> This change made the test flaky:
> https://github.com/apache/flink/commit/749965348170e4608ff2a23c9617f67b8c341df5.
> It changes the job to have two sources instead of one, which, in this
> constrained setup, requires too many slots to run, and therefore the job fails.
> The setup of this test is very intricate: we configure YARN to have two
> NodeManagers with 2500mb of memory each:
> https://github.com/apache/flink/blob/413a77157caf25dbbfb8b0caaf2c9e12c7374d98/flink-end-to-end-tests/test-scripts/docker-hadoop-secure-cluster/config/yarn-site.xml#L39.
> We run the job with parallelism 3 and configure Flink to use 1000mb of
> TaskManager memory and 1000mb of JobManager memory. This means that the job
> fits into the YARN memory budget, but more TaskManagers would not fit. We also
> don't simply increase the YARN resources, because we want the Flink job to use
> TMs on different NMs: we once had a bug where Kerberos config file shipping
> was not working correctly, but the bug did not materialise if all TMs were
> on the same NM.
> https://api.travis-ci.org/v3/job/612782888/log.txt
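For context, the memory budget described above works out as follows. This is a rough sketch only; YARN's actual container sizing also involves the scheduler's minimum allocation and rounding, which this ignores:

{code:java}
public class YarnBudget {
    // Hypothetical helper: how many containers of the given size
    // fit across the NodeManagers, ignoring YARN rounding rules.
    static int maxContainers(int nodeManagers, int memPerNmMb, int containerMb) {
        return nodeManagers * (memPerNmMb / containerMb);
    }

    public static void main(String[] args) {
        // 2 NMs with 2500mb each; JM and TM containers are 1000mb.
        // Each NM hosts floor(2500 / 1000) = 2 containers, 4 in total.
        int total = maxContainers(2, 2500, 1000);

        int needed = 1 /* JobManager */ + 3 /* TaskManagers */;
        System.out.println(total >= needed);     // prints true: the job fits
        System.out.println(total >= needed + 1); // prints false: a 4th TM would not
    }
}
{code}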
--
This message was sent by Atlassian Jira
(v8.3.4#803005)