[
https://issues.apache.org/jira/browse/FLINK-10945?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16717254#comment-16717254
]
ASF GitHub Bot commented on FLINK-10945:
----------------------------------------
zhuzhurk commented on a change in pull request #7255: [FLINK-10945] Use
InputDependencyConstraint to avoid resource dead…
URL: https://github.com/apache/flink/pull/7255#discussion_r240629262
##########
File path:
flink-runtime/src/main/java/org/apache/flink/runtime/executiongraph/Execution.java
##########
@@ -747,31 +747,34 @@ else if (numConsumers == 0) {
consumerVertex.cachePartitionInfo(PartialInputChannelDeploymentDescriptor.fromEdge(
partition, partitionExecution));
- // When deploying a consuming task, its task deployment descriptor will contain all
- // deployment information available at the respective time. It is possible that some
- // of the partitions to be consumed have not been created yet. These are updated
- // runtime via the update messages.
- //
- // TODO The current approach may send many update messages even though the consuming
- // task has already been deployed with all necessary information. We have to check
- // whether this is a problem and fix it, if it is.
- CompletableFuture.supplyAsync(
- () -> {
- try {
- final ExecutionGraph executionGraph = consumerVertex.getExecutionGraph();
- consumerVertex.scheduleForExecution(
- executionGraph.getSlotProvider(),
- executionGraph.isQueuedSchedulingAllowed(),
- LocationPreferenceConstraint.ANY, // there must be at least one known location
- Collections.emptySet());
- } catch (Throwable t) {
- consumerVertex.fail(new IllegalStateException("Could not schedule consumer " +
+ // Schedule the consumer vertex if its inputs constraint is satisfied, otherwise postpone the scheduling
+ if (consumerVertex.checkInputDependencyConstraints()) {
+ // When deploying a consuming task, its task deployment descriptor will contain all
+ // deployment information available at the respective time. It is possible that some
+ // of the partitions to be consumed have not been created yet. These are updated
+ // runtime via the update messages.
+ //
+ // TODO The current approach may send many update messages even though the consuming
+ // task has already been deployed with all necessary information. We have to check
+ // whether this is a problem and fix it, if it is.
+ CompletableFuture.supplyAsync(
Review comment:
Sure.
> Avoid resource deadlocks for finite stream jobs when resources are limited
> --------------------------------------------------------------------------
>
> Key: FLINK-10945
> URL: https://issues.apache.org/jira/browse/FLINK-10945
> Project: Flink
> Issue Type: Improvement
> Components: Distributed Coordination
> Affects Versions: 1.7.1
> Reporter: Zhu Zhu
> Assignee: Zhu Zhu
> Priority: Major
> Labels: pull-request-available
>
> Currently, *resource deadlocks* can happen to finite stream jobs (or batch
> jobs) when resources are limited, in the two cases below:
> # Task Y is a pipelined downstream task of task X. When X takes all
> resources (slots), Y cannot acquire slots to start, so back pressure
> prevents X from finishing.
> # Task Y is an upstream task of task X. When X takes all resources (slots) and
> Y cannot start, X cannot finish because some of its inputs are not finished.
>
> We can avoid case 1 by setting all edges to BLOCKING, which removes pipelined
> back pressure. However, case 2 cannot be avoided this way, because X (the
> downstream task) is launched as soon as any of its input results is ready.
> In detail, say task X has BLOCKING upstream tasks Y and Z. X can be
> launched when Z finishes, even though task Y has not been launched yet. This
> pre-launch behaviour can be beneficial when there are plenty of resources,
> because X can start processing data from Z before Y finishes its data
> processing. However, resource deadlocks may happen when resources are limited,
> e.g. in small sessions.
>
> I propose introducing a constraint named *InputDependencyConstraint* to
> control the scheduling of vertices. It has two values:
> # *ANY*. The vertex can be scheduled when any of its inputs is consumable.
> # *ALL*. The vertex can be scheduled when all of its inputs are consumable.
>
> The design doc is here.
> [https://docs.google.com/document/d/1jpqC7OW_nLOSVOg06_QCWelicVtV6Au0Krg5m_S4kjY/edit?usp=sharing]
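
The two proposed values can be illustrated with a minimal Java sketch. It is not the code from the pull request: only the names InputDependencyConstraint, ANY and ALL come from the description above, while the class InputConstraintSketch, the helper canSchedule and the list of booleans standing in for "input is consumable" are assumptions made for illustration. The sketch only considers vertices that have at least one input.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical sketch for illustration; not the actual Flink implementation.
class InputConstraintSketch {

    /** The two values proposed in the issue description. */
    enum InputDependencyConstraint { ANY, ALL }

    /**
     * Hypothetical helper: decides whether a vertex may be scheduled.
     * Each list entry states whether the corresponding input is consumable.
     */
    static boolean canSchedule(InputDependencyConstraint constraint, List<Boolean> inputsConsumable) {
        switch (constraint) {
            case ANY:
                // Schedulable as soon as one input is consumable.
                return inputsConsumable.stream().anyMatch(Boolean::booleanValue);
            case ALL:
                // Schedulable only once every input is consumable.
                return inputsConsumable.stream().allMatch(Boolean::booleanValue);
            default:
                throw new IllegalArgumentException("Unknown constraint: " + constraint);
        }
    }

    public static void main(String[] args) {
        // Task X has two BLOCKING inputs: Z has finished, Y has not started yet.
        List<Boolean> inputsOfX = Arrays.asList(true, false);
        System.out.println(canSchedule(InputDependencyConstraint.ANY, inputsOfX)); // prints true
        System.out.println(canSchedule(InputDependencyConstraint.ALL, inputsOfX)); // prints false
    }
}
```

With ANY, X is launched while Y is still waiting for a slot, which is the situation described in case 2. With ALL, X is postponed until Y has also produced its result, so X does not occupy the slot that Y needs.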