zhuzhurk commented on a change in pull request #9902: [FLINK-14363][runtime] Prevent vertex from being affected by outdated deployment
URL: https://github.com/apache/flink/pull/9902#discussion_r338371293
##########
File path: flink-runtime/src/main/java/org/apache/flink/runtime/scheduler/DefaultScheduler.java
##########
@@ -412,13 +412,17 @@ private static Throwable maybeWrapWithNoResourceAvailableException(final Throwab
         };
     }
 
-    private void stopDeployment(final DeploymentHandle deploymentHandle) {
-        cancelExecutionVertex(deploymentHandle.getExecutionVertexId());
+    private void releaseUnassignedSlotIfPresent(final DeploymentHandle deploymentHandle) {
         // Canceling the vertex normally releases the slot. However, we might not have assigned
         // the slot to the vertex yet.
+        // Only release unassigned slot to guarantee no vertex state change happens here.
         deploymentHandle
             .getLogicalSlot()
-            .ifPresent(logicalSlot -> logicalSlot.releaseSlot(null));
+            .ifPresent(logicalSlot -> {
+                if (logicalSlot.getPayload() != null) {

Review comment:
I gave this some more thought and think we can even remove this release logic, since an unassigned slot will never get released here.
- If a slot has a payload assigned (iff it is assigned to an execution or has been released), there is no need to release it here.
- If a slot is not assigned and needs a release, there are two places to consider:
  - a) In `assignResourceOrHandleError`'s outdated deployment handling block. This cannot happen, because the vertex is outdated iff it has been restarted, and the slot request is canceled at that point.
  - b) In `deployOrHandleError`'s outdated deployment handling block. This happens iff the preceding `assignResourceOrHandleError` of the same vertex completed without a successful assignment, which means either
    - b.1) case *a)* happened, which is not possible, or
    - b.2) an unexpected error happened in `assignResourceOrHandleError`. But `deployOrHandleError` cannot be invoked in this case, since `deployAll` will propagate the unexpected error to force a JM restart (ignoring `deployIndividually`, which would be removed in FLINK-14162; see the discussion [here](https://github.com/apache/flink/pull/9860#discussion_r334367314)).
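To make the case analysis above concrete, here is a minimal sketch of the guard under discussion. It is not the code from the PR: `SketchSlot` and `OutdatedDeploymentSketch` are hypothetical stand-ins rather than Flink classes, and the sketch assumes that "unassigned" means the slot's payload is `null`.

```java
import java.util.Optional;

/** Hypothetical, simplified model of the guard; none of these types are Flink's. */
final class OutdatedDeploymentSketch {

    /** Stand-in for a logical slot: it holds at most one payload and can be released. */
    static final class SketchSlot {
        private Object payload;
        private boolean released;

        Object getPayload() {
            return payload;
        }

        /** Assigns a payload unless one is already present; returns whether it succeeded. */
        boolean tryAssignPayload(Object newPayload) {
            if (payload != null) {
                return false;
            }
            payload = newPayload;
            return true;
        }

        void releaseSlot() {
            released = true;
        }

        boolean isReleased() {
            return released;
        }
    }

    /**
     * Releases the slot only if it exists and has no payload (assumption: that is what
     * "unassigned" means). Returns true if this call released the slot.
     */
    static boolean releaseUnassignedSlotIfPresent(Optional<SketchSlot> slot) {
        if (slot.isPresent() && slot.get().getPayload() == null) {
            slot.get().releaseSlot();
            return true;
        }
        return false;
    }

    private OutdatedDeploymentSketch() {
    }
}
```

In this model, an empty `Optional` (the slot request was canceled) and a slot that already carries a payload are both left untouched. Only the third state, a present slot without a payload, triggers a release, and that is exactly the state the comment argues can never reach this code path.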