[ 
https://issues.apache.org/jira/browse/FLINK-10945?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16715277#comment-16715277
 ] 

ASF GitHub Bot commented on FLINK-10945:
----------------------------------------

azagrebin commented on a change in pull request #7255: [FLINK-10945] Use 
InputDependencyConstraint to avoid resource dead…
URL: https://github.com/apache/flink/pull/7255#discussion_r240310923
 
 

 ##########
 File path: flink-runtime/src/main/java/org/apache/flink/runtime/executiongraph/Execution.java
 ##########
 @@ -747,31 +747,34 @@ else if (numConsumers == 0) {
                                consumerVertex.cachePartitionInfo(PartialInputChannelDeploymentDescriptor.fromEdge(
                                                partition, partitionExecution));
 
-                               // When deploying a consuming task, its task deployment descriptor will contain all
-                               // deployment information available at the respective time. It is possible that some
-                               // of the partitions to be consumed have not been created yet. These are updated
-                               // runtime via the update messages.
-                               //
-                               // TODO The current approach may send many update messages even though the consuming
-                               // task has already been deployed with all necessary information. We have to check
-                               // whether this is a problem and fix it, if it is.
-                               CompletableFuture.supplyAsync(
-                                       () -> {
-                                               try {
-                                                       final ExecutionGraph executionGraph = consumerVertex.getExecutionGraph();
-                                                       consumerVertex.scheduleForExecution(
-                                                               executionGraph.getSlotProvider(),
-                                                               executionGraph.isQueuedSchedulingAllowed(),
-                                                               LocationPreferenceConstraint.ANY, // there must be at least one known location
-                                                               Collections.emptySet());
-                                               } catch (Throwable t) {
-                                                       consumerVertex.fail(new IllegalStateException("Could not schedule consumer " +
+                               // Schedule the consumer vertex if its inputs constraint is satisfied, otherwise postpone the scheduling
+                               if (consumerVertex.checkInputDependencyConstraints()) {
+                                       // When deploying a consuming task, its task deployment descriptor will contain all
+                                       // deployment information available at the respective time. It is possible that some
+                                       // of the partitions to be consumed have not been created yet. These are updated
+                                       // runtime via the update messages.
+                                       //
+                                       // TODO The current approach may send many update messages even though the consuming
+                                       // task has already been deployed with all necessary information. We have to check
+                                       // whether this is a problem and fix it, if it is.
+                                       CompletableFuture.supplyAsync(
 
 Review comment:
   The body of the introduced `if` is quite big. Could we move it into a 
separate method, such as `scheduleConsumer`?
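
One possible shape for the suggested extraction is sketched below. This is only an illustration under stated assumptions: `ConsumerVertex` is a hypothetical stand-in interface (not Flink's `ExecutionVertex`), `scheduleForExecution()` is reduced to a no-argument call, the executor parameter and the failure message are invented for the example, and `runAsync` is used in place of the `supplyAsync` call from the diff. Only the names `checkInputDependencyConstraints` and `scheduleConsumer` come from the discussion above.

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.Executor;

public class ScheduleConsumerSketch {

    /** Hypothetical stand-in for the consumer vertex; not the real Flink type. */
    public interface ConsumerVertex {
        boolean checkInputDependencyConstraints();

        void scheduleForExecution() throws Exception;

        void fail(Throwable cause);
    }

    /** Caller side: the branch shrinks to a guard plus a single call. */
    public static void scheduleOrPostpone(ConsumerVertex consumerVertex, Executor executor) {
        if (consumerVertex.checkInputDependencyConstraints()) {
            scheduleConsumer(consumerVertex, executor);
        }
        // Otherwise nothing happens here: scheduling is postponed.
    }

    /** Extracted body: schedule asynchronously and fail the vertex on any error. */
    public static CompletableFuture<Void> scheduleConsumer(ConsumerVertex consumerVertex, Executor executor) {
        return CompletableFuture.runAsync(
            () -> {
                try {
                    consumerVertex.scheduleForExecution();
                } catch (Throwable t) {
                    // Illustrative message only; the original message is truncated in the diff.
                    consumerVertex.fail(new IllegalStateException("Could not schedule consumer vertex", t));
                }
            },
            executor);
    }

    public static void main(String[] args) {
        ConsumerVertex vertex = new ConsumerVertex() {
            @Override
            public boolean checkInputDependencyConstraints() {
                return true;
            }

            @Override
            public void scheduleForExecution() {
                System.out.println("consumer scheduled");
            }

            @Override
            public void fail(Throwable cause) {
                System.out.println("consumer failed: " + cause.getMessage());
            }
        };
        // Runnable::run executes the task on the calling thread, which keeps the demo deterministic.
        scheduleOrPostpone(vertex, Runnable::run);
    }
}
```

With the body extracted, the comment block about update messages could move to the new method, leaving only the constraint check at the call site.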

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
[email protected]


> Avoid resource deadlocks for finite stream jobs when resources are limited
> --------------------------------------------------------------------------
>
>                 Key: FLINK-10945
>                 URL: https://issues.apache.org/jira/browse/FLINK-10945
>             Project: Flink
>          Issue Type: Improvement
>          Components: Distributed Coordination
>    Affects Versions: 1.7.1
>            Reporter: Zhu Zhu
>            Assignee: Zhu Zhu
>            Priority: Major
>              Labels: pull-request-available
>
> Currently, *resource deadlocks* can happen to finite stream jobs (or batch 
> jobs) when resources are limited, in the following 2 cases:
>  # Task Y is a pipelined downstream task of task X. When X takes all 
> resources (slots), Y cannot acquire slots to start, so the back pressure 
> blocks X from finishing.
>  # Task Y is an upstream task of task X. When X takes all resources (slots) 
> and Y cannot start, X cannot finish because some of its inputs are not 
> finished.
>  
> We can avoid case 1 by setting all edges to BLOCKING to avoid pipeline 
> back pressure. However, case 2 cannot be avoided, as X (the downstream task) 
> will be launched when any of its input results is ready.
> In detail, say task X has BLOCKING upstream tasks Y and Z. X can be 
> launched when Z finishes, even though task Y has not been launched yet. This 
> pre-launch behaviour can be beneficial when there are plenty of resources, 
> since X can process data from Z before Y finishes its data processing. 
> However, resource deadlocks may happen when resources are limited, e.g. in 
> small sessions.
>  
> I’d propose introducing a constraint named *InputDependencyConstraint* to 
> control the scheduling of vertices. It has 2 values:
>  # *ANY*. The vertex can be scheduled when any of its inputs is consumable.
>  # *ALL*. The vertex can be scheduled when all of its inputs are consumable.
>  
> The design doc is here. 
> [https://docs.google.com/document/d/1jpqC7OW_nLOSVOg06_QCWelicVtV6Au0Krg5m_S4kjY/edit?usp=sharing]
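
The two constraint values proposed in the issue description can be illustrated with a minimal sketch. Everything here apart from the enum name and its `ANY`/`ALL` values is an assumption made for the example: the `canSchedule` helper and the list-of-booleans representation of input state are hypothetical and are not taken from the pull request or the design doc.

```java
import java.util.Arrays;
import java.util.List;

public class InputDependencyConstraintSketch {

    /** The two values proposed in the issue. */
    public enum InputDependencyConstraint { ANY, ALL }

    /**
     * Hypothetical check: inputsConsumable.get(i) tells whether input i is consumable.
     * Vertices without inputs are outside what this sketch models.
     */
    public static boolean canSchedule(InputDependencyConstraint constraint, List<Boolean> inputsConsumable) {
        switch (constraint) {
            case ANY:
                // Schedulable as soon as at least one input is consumable.
                return inputsConsumable.stream().anyMatch(Boolean::booleanValue);
            case ALL:
                // Schedulable only once every input is consumable.
                return inputsConsumable.stream().allMatch(Boolean::booleanValue);
            default:
                throw new IllegalArgumentException("Unknown constraint: " + constraint);
        }
    }

    public static void main(String[] args) {
        // Case 2 from the issue: X consumes BLOCKING results from Y and Z.
        // Z has finished (consumable), Y has not been launched yet (not consumable).
        List<Boolean> inputsOfX = Arrays.asList(false, true);
        System.out.println(canSchedule(InputDependencyConstraint.ANY, inputsOfX)); // prints "true"
        System.out.println(canSchedule(InputDependencyConstraint.ALL, inputsOfX)); // prints "false"
    }
}
```

Under `ALL`, X would not occupy slots until Y has also produced its result, which is how the constraint would sidestep the deadlock in case 2.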



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
