zhuzhurk commented on a change in pull request #9902: [FLINK-14363][runtime] 
Prevent vertex from being affected by outdated deployment
URL: https://github.com/apache/flink/pull/9902#discussion_r338371293
 
 

 ##########
 File path: flink-runtime/src/main/java/org/apache/flink/runtime/scheduler/DefaultScheduler.java
 ##########
 @@ -412,13 +412,17 @@ private static Throwable maybeWrapWithNoResourceAvailableException(final Throwab
                };
        }
 
-       private void stopDeployment(final DeploymentHandle deploymentHandle) {
-               cancelExecutionVertex(deploymentHandle.getExecutionVertexId());
+       private void releaseUnassignedSlotIfPresent(final DeploymentHandle deploymentHandle) {
                // Canceling the vertex normally releases the slot. However, we might not have assigned
                // the slot to the vertex yet.
+               // Only release unassigned slot to guarantee no vertex state change happens here.
                deploymentHandle
                        .getLogicalSlot()
-                       .ifPresent(logicalSlot -> logicalSlot.releaseSlot(null));
+                       .ifPresent(logicalSlot -> {
+                               if (logicalSlot.getPayload() != null) {
 
 Review comment:
   I gave this some more thought and think we can even remove this release logic, since an unassigned slot will never get released here.
   - if a slot is assigned a payload (iff it is assigned to an execution or is released), there's no need to release it here
   - if a slot is not assigned and needs a release, that would have to happen in one of two places:
    - a) in `assignResourceOrHandleError`'s outdated deployment handling block. This cannot happen, because the vertex is outdated iff it has been restarted, and the slot request is canceled at that point
    - b) in `deployOrHandleError`'s outdated deployment handling block. This happens iff the preceding `assignResourceOrHandleError` of the same vertex finished without a successful assignment, which means either
      - b.1) case *a)* happened, which is not possible
      - b.2) an unexpected error happened in `assignResourceOrHandleError`. But `deployOrHandleError` cannot be invoked in this case, since `deployAll` will propagate the unexpected error to force a JM restart (ignoring `deployIndividually`, which would be removed in FLINK-14162, see discussion [here](https://github.com/apache/flink/pull/9860#discussion_r334367314))
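
   To make case *a)* concrete, here is a minimal, self-contained sketch of the argument. It is not the scheduler's actual code: `OutdatedDeploymentSketch`, `Handle`, `Vertex`, and the `String` slot are hypothetical stand-ins. It assumes that a restart bumps a per-vertex version and cancels the pending slot request, and it only models a restart that happens before the request is fulfilled.

```java
import java.util.Optional;
import java.util.concurrent.CompletableFuture;

/** Hypothetical model of the "outdated deployment" argument; not Flink's classes. */
final class OutdatedDeploymentSketch {

    /** Remembers the vertex version at request time and the pending slot request. */
    static final class Handle {
        final long requiredVersion;
        final CompletableFuture<String> slotFuture = new CompletableFuture<>();

        Handle(long requiredVersion) {
            this.requiredVersion = requiredVersion;
        }

        /** A slot is visible only if the request completed normally (not canceled, not failed). */
        Optional<String> getSlot() {
            if (slotFuture.isDone() && !slotFuture.isCompletedExceptionally()) {
                return Optional.of(slotFuture.join());
            }
            return Optional.empty();
        }
    }

    /** A vertex with a version counter; restarting it invalidates earlier handles. */
    static final class Vertex {
        private long version;
        private Handle pending;

        Handle requestSlot() {
            pending = new Handle(version);
            return pending;
        }

        /** Assumption: a restart bumps the version and cancels the pending slot request. */
        void restart() {
            version++;
            if (pending != null) {
                pending.slotFuture.cancel(false);
                pending = null;
            }
        }

        boolean isOutdated(Handle handle) {
            return handle.requiredVersion != version;
        }
    }
}
```

   Under these assumptions, a handle that turns out to be outdated has a canceled slot request, so `getSlot()` is empty and there is nothing left to release in the outdated-deployment branch.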
   
