[
https://issues.apache.org/jira/browse/FLINK-4296?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15402308#comment-15402308
]
ASF GitHub Bot commented on FLINK-4296:
---------------------------------------
GitHub user tillrohrmann opened a pull request:
https://github.com/apache/flink/pull/2321
[FLINK-4296] Fixes failure reporting of consumer task scheduling when
producer has already finished
This PR changes the failure behaviour such that the consumer task is failed
instead of the producer task. Failing the producer is problematic, since a
finished producer task will simply swallow any scheduling exception that
originates from scheduling the consumer task.
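The difference between the two reporting targets can be illustrated with a minimal sketch. It is not the code from this PR: `FailureReportingSketch`, `Task`, `scheduleConsumer` and `scheduleAndReport` are hypothetical names, and the state machine is reduced to the assumption that a task in a terminal state ignores later failures.

```java
public class FailureReportingSketch {

    /** Simplified task states; an assumption for this sketch only. */
    enum State { CREATED, FINISHED, FAILED }

    /** Hypothetical stand-in for a task execution; not a Flink class. */
    static final class Task {
        private final String name;
        private State state = State.CREATED;
        private Throwable failureCause;

        Task(String name) {
            this.name = name;
        }

        void markFinished() {
            state = State.FINISHED;
        }

        /** A task that is already in a terminal state ignores the failure. */
        void fail(Throwable cause) {
            if (state == State.FINISHED || state == State.FAILED) {
                return; // swallowed: nothing is recorded or reported
            }
            state = State.FAILED;
            failureCause = cause;
        }

        State getState() {
            return state;
        }

        Throwable getFailureCause() {
            return failureCause;
        }
    }

    /** Stand-in for a scheduling attempt that cannot find a free slot. */
    static void scheduleConsumer(Task consumer) {
        throw new IllegalStateException("no free slot for " + consumer.name);
    }

    /** Tries to schedule the consumer and reports a failure to the given task. */
    static void scheduleAndReport(Task consumer, Task reportTo) {
        try {
            scheduleConsumer(consumer);
        } catch (RuntimeException e) {
            reportTo.fail(e);
        }
    }

    public static void main(String[] args) {
        Task producer = new Task("producer");
        Task consumer = new Task("consumer");
        producer.markFinished();

        // Reporting to the finished producer: the exception disappears.
        scheduleAndReport(consumer, producer);
        System.out.println(producer.getState() + " / " + consumer.getState());
        // prints "FINISHED / CREATED"

        // Reporting to the consumer: the failure becomes visible.
        scheduleAndReport(consumer, consumer);
        System.out.println(producer.getState() + " / " + consumer.getState());
        // prints "FINISHED / FAILED"
    }
}
```

In the first call the consumer stays in `CREATED` with no recorded cause, which corresponds to the lingering tasks described in the issue; in the second call the failure is attached to the task that actually could not be scheduled.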
This PR should also be merged into the release-1.1.0 branch.
R @mxm.
You can merge this pull request into a Git repository by running:
$ git pull https://github.com/tillrohrmann/flink fixBatchScheduling
Alternatively you can review and apply these changes as the patch at:
https://github.com/apache/flink/pull/2321.patch
To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:
This closes #2321
----
commit 6eaddb1d562117124e17d244aca69cd591bc9c54
Author: Till Rohrmann <[email protected]>
Date: 2016-08-01T16:05:14Z
[FLINK-4296] Fixes failure reporting of consumer task scheduling when
producer has already finished
This PR changes the failure behaviour such that the consumer task is failed
instead of the producer task. Failing the producer is problematic, since a
finished producer task will simply swallow any scheduling exception that
originates from scheduling the consumer task.
----
> Scheduler accepts more tasks than it has task slots available
> -------------------------------------------------------------
>
> Key: FLINK-4296
> URL: https://issues.apache.org/jira/browse/FLINK-4296
> Project: Flink
> Issue Type: Bug
> Components: JobManager, TaskManager
> Affects Versions: 1.1.0
> Reporter: Maximilian Michels
> Assignee: Till Rohrmann
> Priority: Critical
> Fix For: 1.1.0, 1.2.0
>
>
> Flink's scheduler doesn't support queued scheduling but expects to find all
> necessary task slots upon scheduling. If it does not, it throws an error. Due
> to some changes in the latest master, this seems to be broken.
> Flink accepts jobs with {{parallelism > total number of task slots}},
> schedules and deploys tasks in all available task slots, and leaves the
> remaining tasks lingering forever.
> Easy to reproduce:
> {code}
> ./bin/flink run -p TASK_SLOTS+n
> {code}
> where {{TASK_SLOTS}} is the total number of task slots in the cluster and
> {{n>=1}}.
> Here, {{p=11}}, {{TASK_SLOTS=10}}:
> {{bin/flink run -p 11 examples/batch/EnumTriangles.jar}}
> {noformat}
> Cluster configuration: Standalone cluster with JobManager at
> localhost/127.0.0.1:6123
> Using address localhost:6123 to connect to JobManager.
> JobManager web interface address http://localhost:8081
> Starting execution of program
> Executing EnumTriangles example with default edges data set.
> Use --edges to specify file input.
> Printing result to stdout. Use --output to specify output path.
> Submitting job with JobID: cd0c0b4cbe25643d8d92558168cfc045. Waiting for job
> completion.
> 08/01/2016 12:12:12 Job execution switched to status RUNNING.
> 08/01/2016 12:12:12 CHAIN DataSource (at
> getDefaultEdgeDataSet(EnumTrianglesData.java:57)
> (org.apache.flink.api.java.io.CollectionInputFormat)) -> Map (Map at
> main(EnumTriangles.java:108))(1/1) switched to SCHEDULED
> 08/01/2016 12:12:12 CHAIN DataSource (at
> getDefaultEdgeDataSet(EnumTrianglesData.java:57)
> (org.apache.flink.api.java.io.CollectionInputFormat)) -> Map (Map at
> main(EnumTriangles.java:108))(1/1) switched to DEPLOYING
> 08/01/2016 12:12:12 CHAIN DataSource (at
> getDefaultEdgeDataSet(EnumTrianglesData.java:57)
> (org.apache.flink.api.java.io.CollectionInputFormat)) -> Map (Map at
> main(EnumTriangles.java:108))(1/1) switched to RUNNING
> 08/01/2016 12:12:12 CHAIN DataSource (at
> getDefaultEdgeDataSet(EnumTrianglesData.java:57)
> (org.apache.flink.api.java.io.CollectionInputFormat)) -> Map (Map at
> main(EnumTriangles.java:108))(1/1) switched to FINISHED
> 08/01/2016 12:12:12 GroupReduce (GroupReduce at
> main(EnumTriangles.java:112))(1/11) switched to SCHEDULED
> 08/01/2016 12:12:12 GroupReduce (GroupReduce at
> main(EnumTriangles.java:112))(3/11) switched to SCHEDULED
> 08/01/2016 12:12:12 GroupReduce (GroupReduce at
> main(EnumTriangles.java:112))(2/11) switched to SCHEDULED
> 08/01/2016 12:12:12 GroupReduce (GroupReduce at
> main(EnumTriangles.java:112))(7/11) switched to SCHEDULED
> 08/01/2016 12:12:12 GroupReduce (GroupReduce at
> main(EnumTriangles.java:112))(7/11) switched to DEPLOYING
> 08/01/2016 12:12:12 GroupReduce (GroupReduce at
> main(EnumTriangles.java:112))(6/11) switched to SCHEDULED
> 08/01/2016 12:12:12 GroupReduce (GroupReduce at
> main(EnumTriangles.java:112))(4/11) switched to SCHEDULED
> 08/01/2016 12:12:12 GroupReduce (GroupReduce at
> main(EnumTriangles.java:112))(5/11) switched to SCHEDULED
> 08/01/2016 12:12:12 GroupReduce (GroupReduce at
> main(EnumTriangles.java:112))(4/11) switched to DEPLOYING
> 08/01/2016 12:12:12 GroupReduce (GroupReduce at
> main(EnumTriangles.java:112))(3/11) switched to DEPLOYING
> 08/01/2016 12:12:12 GroupReduce (GroupReduce at
> main(EnumTriangles.java:112))(9/11) switched to SCHEDULED
> 08/01/2016 12:12:12 GroupReduce (GroupReduce at
> main(EnumTriangles.java:112))(9/11) switched to DEPLOYING
> 08/01/2016 12:12:12 GroupReduce (GroupReduce at
> main(EnumTriangles.java:112))(5/11) switched to DEPLOYING
> 08/01/2016 12:12:12 GroupReduce (GroupReduce at
> main(EnumTriangles.java:112))(1/11) switched to DEPLOYING
> 08/01/2016 12:12:12 Join(Join at main(EnumTriangles.java:114))(1/11)
> switched to SCHEDULED
> 08/01/2016 12:12:12 Join(Join at main(EnumTriangles.java:114))(1/11)
> switched to DEPLOYING
> 08/01/2016 12:12:12 Join(Join at main(EnumTriangles.java:114))(2/11)
> switched to SCHEDULED
> 08/01/2016 12:12:12 Join(Join at main(EnumTriangles.java:114))(2/11)
> switched to DEPLOYING
> 08/01/2016 12:12:12 Join(Join at main(EnumTriangles.java:114))(3/11)
> switched to SCHEDULED
> 08/01/2016 12:12:12 Join(Join at main(EnumTriangles.java:114))(3/11)
> switched to DEPLOYING
> 08/01/2016 12:12:12 Join(Join at main(EnumTriangles.java:114))(4/11)
> switched to SCHEDULED
> 08/01/2016 12:12:12 Join(Join at main(EnumTriangles.java:114))(4/11)
> switched to DEPLOYING
> 08/01/2016 12:12:12 Join(Join at main(EnumTriangles.java:114))(5/11)
> switched to SCHEDULED
> 08/01/2016 12:12:12 Join(Join at main(EnumTriangles.java:114))(5/11)
> switched to DEPLOYING
> 08/01/2016 12:12:12 Join(Join at main(EnumTriangles.java:114))(6/11)
> switched to SCHEDULED
> 08/01/2016 12:12:12 Join(Join at main(EnumTriangles.java:114))(6/11)
> switched to DEPLOYING
> 08/01/2016 12:12:12 Join(Join at main(EnumTriangles.java:114))(7/11)
> switched to SCHEDULED
> 08/01/2016 12:12:12 Join(Join at main(EnumTriangles.java:114))(7/11)
> switched to DEPLOYING
> 08/01/2016 12:12:12 Join(Join at main(EnumTriangles.java:114))(8/11)
> switched to SCHEDULED
> 08/01/2016 12:12:12 Join(Join at main(EnumTriangles.java:114))(8/11)
> switched to DEPLOYING
> 08/01/2016 12:12:12 Join(Join at main(EnumTriangles.java:114))(9/11)
> switched to SCHEDULED
> 08/01/2016 12:12:12 Join(Join at main(EnumTriangles.java:114))(9/11)
> switched to DEPLOYING
> 08/01/2016 12:12:12 Join(Join at main(EnumTriangles.java:114))(10/11)
> switched to SCHEDULED
> 08/01/2016 12:12:12 Join(Join at main(EnumTriangles.java:114))(10/11)
> switched to DEPLOYING
> 08/01/2016 12:12:12 GroupReduce (GroupReduce at
> main(EnumTriangles.java:112))(11/11) switched to SCHEDULED
> 08/01/2016 12:12:12 GroupReduce (GroupReduce at
> main(EnumTriangles.java:112))(10/11) switched to SCHEDULED
> 08/01/2016 12:12:12 GroupReduce (GroupReduce at
> main(EnumTriangles.java:112))(11/11) switched to DEPLOYING
> 08/01/2016 12:12:12 GroupReduce (GroupReduce at
> main(EnumTriangles.java:112))(10/11) switched to DEPLOYING
> 08/01/2016 12:12:12 GroupReduce (GroupReduce at
> main(EnumTriangles.java:112))(8/11) switched to SCHEDULED
> 08/01/2016 12:12:12 GroupReduce (GroupReduce at
> main(EnumTriangles.java:112))(6/11) switched to DEPLOYING
> 08/01/2016 12:12:12 GroupReduce (GroupReduce at
> main(EnumTriangles.java:112))(2/11) switched to DEPLOYING
> 08/01/2016 12:12:12 GroupReduce (GroupReduce at
> main(EnumTriangles.java:112))(3/11) switched to RUNNING
> 08/01/2016 12:12:12 Join(Join at main(EnumTriangles.java:114))(11/11)
> switched to SCHEDULED
> 08/01/2016 12:12:12 GroupReduce (GroupReduce at
> main(EnumTriangles.java:112))(1/11) switched to RUNNING
> 08/01/2016 12:12:12 Join(Join at main(EnumTriangles.java:114))(1/11)
> switched to RUNNING
> 08/01/2016 12:12:12 Join(Join at main(EnumTriangles.java:114))(2/11)
> switched to RUNNING
> 08/01/2016 12:12:12 Join(Join at main(EnumTriangles.java:114))(3/11)
> switched to RUNNING
> 08/01/2016 12:12:12 GroupReduce (GroupReduce at
> main(EnumTriangles.java:112))(9/11) switched to RUNNING
> 08/01/2016 12:12:12 GroupReduce (GroupReduce at
> main(EnumTriangles.java:112))(4/11) switched to RUNNING
> 08/01/2016 12:12:12 GroupReduce (GroupReduce at
> main(EnumTriangles.java:112))(5/11) switched to RUNNING
> 08/01/2016 12:12:12 Join(Join at main(EnumTriangles.java:114))(7/11)
> switched to RUNNING
> 08/01/2016 12:12:12 Join(Join at main(EnumTriangles.java:114))(6/11)
> switched to RUNNING
> 08/01/2016 12:12:12 Join(Join at main(EnumTriangles.java:114))(8/11)
> switched to RUNNING
> 08/01/2016 12:12:12 Join(Join at main(EnumTriangles.java:114))(9/11)
> switched to RUNNING
> 08/01/2016 12:12:12 Join(Join at main(EnumTriangles.java:114))(10/11)
> switched to RUNNING
> 08/01/2016 12:12:12 GroupReduce (GroupReduce at
> main(EnumTriangles.java:112))(10/11) switched to RUNNING
> 08/01/2016 12:12:12 GroupReduce (GroupReduce at
> main(EnumTriangles.java:112))(11/11) switched to RUNNING
> 08/01/2016 12:12:12 Join(Join at main(EnumTriangles.java:114))(4/11)
> switched to RUNNING
> 08/01/2016 12:12:12 Join(Join at main(EnumTriangles.java:114))(5/11)
> switched to RUNNING
> 08/01/2016 12:12:12 GroupReduce (GroupReduce at
> main(EnumTriangles.java:112))(7/11) switched to RUNNING
> 08/01/2016 12:12:12 GroupReduce (GroupReduce at
> main(EnumTriangles.java:112))(2/11) switched to RUNNING
> 08/01/2016 12:12:12 GroupReduce (GroupReduce at
> main(EnumTriangles.java:112))(6/11) switched to RUNNING
> 08/01/2016 12:12:13 GroupReduce (GroupReduce at
> main(EnumTriangles.java:112))(1/11) switched to FINISHED
> 08/01/2016 12:12:13 GroupReduce (GroupReduce at
> main(EnumTriangles.java:112))(2/11) switched to FINISHED
> 08/01/2016 12:12:13 GroupReduce (GroupReduce at
> main(EnumTriangles.java:112))(7/11) switched to FINISHED
> 08/01/2016 12:12:13 GroupReduce (GroupReduce at
> main(EnumTriangles.java:112))(6/11) switched to FINISHED
> 08/01/2016 12:12:13 GroupReduce (GroupReduce at
> main(EnumTriangles.java:112))(3/11) switched to FINISHED
> 08/01/2016 12:12:13 GroupReduce (GroupReduce at
> main(EnumTriangles.java:112))(9/11) switched to FINISHED
> 08/01/2016 12:12:13 GroupReduce (GroupReduce at
> main(EnumTriangles.java:112))(11/11) switched to FINISHED
> 08/01/2016 12:12:13 GroupReduce (GroupReduce at
> main(EnumTriangles.java:112))(5/11) switched to FINISHED
> 08/01/2016 12:12:13 GroupReduce (GroupReduce at
> main(EnumTriangles.java:112))(10/11) switched to FINISHED
> 08/01/2016 12:12:13 GroupReduce (GroupReduce at
> main(EnumTriangles.java:112))(4/11) switched to FINISHED
> {noformat}
> For {{8/11}}, the {{Join}} task switches to RUNNING, but the {{GroupReduce}}
> does not:
> {noformat}
> 08/01/2016 12:12:12 Join(Join at main(EnumTriangles.java:114))(8/11)
> switched to SCHEDULED
> 08/01/2016 12:12:12 Join(Join at main(EnumTriangles.java:114))(8/11)
> switched to DEPLOYING
> ....
> 08/01/2016 12:12:12 GroupReduce (GroupReduce at
> main(EnumTriangles.java:112))(8/11) switched to SCHEDULED
> ....
> 08/01/2016 12:12:12 Join(Join at main(EnumTriangles.java:114))(8/11)
> switched to RUNNING
> {noformat}
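
The expectation stated in the quoted issue (no queued scheduling, an error as soon as a slot cannot be found) can be sketched as follows. This is an illustration under assumptions, not Flink's scheduler: `SlotAllocationSketch`, `SlotPool`, `NoFreeSlotException` and `scheduleAll` are hypothetical names, and slot sharing is reduced to one slot per parallel subtask.

```java
public class SlotAllocationSketch {

    /** Hypothetical exception for this sketch; not a Flink class. */
    static final class NoFreeSlotException extends RuntimeException {
        NoFreeSlotException(String message) {
            super(message);
        }
    }

    /** Hands out a fixed number of slots and never queues a request. */
    static final class SlotPool {
        private int freeSlots;

        SlotPool(int totalSlots) {
            this.freeSlots = totalSlots;
        }

        /** Takes one slot, or fails immediately if none is left. */
        void allocate(int subtaskIndex) {
            if (freeSlots == 0) {
                throw new NoFreeSlotException("no free slot for subtask " + subtaskIndex);
            }
            freeSlots--;
        }
    }

    /** Requests one slot per subtask; returns the number of subtasks placed. */
    static int scheduleAll(int parallelism, int totalSlots) {
        SlotPool pool = new SlotPool(totalSlots);
        for (int i = 1; i <= parallelism; i++) {
            pool.allocate(i);
        }
        return parallelism;
    }

    public static void main(String[] args) {
        System.out.println(scheduleAll(10, 10)); // prints 10

        try {
            scheduleAll(11, 10); // p=11, TASK_SLOTS=10 as in the report
        } catch (NoFreeSlotException e) {
            System.out.println(e.getMessage()); // prints "no free slot for subtask 11"
        }
    }
}
```

With `p=11` and `TASK_SLOTS=10`, the eleventh request fails right away instead of waiting, which is the error the reporter expected to see in place of a task that stays in SCHEDULED.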