[GitHub] [flink] akalash commented on a change in pull request #16885: [FLINK-23862][runtime] Cleanup StreamTask if cancelled after restore …

GitBox Mon, 23 Aug 2021 04:22:17 -0700


akalash commented on a change in pull request #16885:
URL: https://github.com/apache/flink/pull/16885#discussion_r693877990




##########
File path: 
flink-runtime/src/main/java/org/apache/flink/runtime/taskmanager/Task.java
##########
@@ -760,33 +763,26 @@ private void doRun() {
             // make sure the user code classloader is accessible thread-locally
             
executingThread.setContextClassLoader(userCodeClassLoader.asClassLoader());
 
-            FlinkSecurityManager.monitorUserSystemExitForCurrentThread();
+            TaskInvokable finalInvokable = invokable;
             try {
-                // Restore invokable data to the last valid state
-                invokable.restore();
-            } finally {
-                FlinkSecurityManager.unmonitorUserSystemExitForCurrentThread();
-            }
+                runWithSystemExitMonitoring(finalInvokable::restore);
 
-            if (!transitionState(ExecutionState.INITIALIZING, 
ExecutionState.RUNNING)) {
-                throw new CancelTaskException();
-            }
+                if (!transitionState(ExecutionState.INITIALIZING, 
ExecutionState.RUNNING)) {
+                    throw new CancelTaskException();
+                }
 
-            // notify everyone that we switched to running
-            taskManagerActions.updateTaskExecutionState(
-                    new TaskExecutionState(executionId, 
ExecutionState.RUNNING));
+                // notify everyone that we switched to running
+                taskManagerActions.updateTaskExecutionState(
+                        new TaskExecutionState(executionId, 
ExecutionState.RUNNING));
 
-            // Monitor user codes from exiting JVM covering user function 
invocation. This can be
-            // done in a finer-grained way like enclosing user callback 
functions individually,
-            // but as exit triggered by framework is not performed and 
expected in this invoke
-            // function anyhow, we can monitor exiting JVM for entire scope.
-            FlinkSecurityManager.monitorUserSystemExitForCurrentThread();
-            try {
-                // run the invokable
-                invokable.invoke();
-            } finally {
-                FlinkSecurityManager.unmonitorUserSystemExitForCurrentThread();
+                runWithSystemExitMonitoring(finalInvokable::invoke);
+            } catch (Throwable throwable) {
+                if (!(throwable instanceof CancelTaskException)) {

Review comment:
       I don't sure about this condition. According to this code if 
CancelTaskException happens we just ignore it and continue the execution but 
the old implementation always throws exceptions.
   Right now, the logic is pretty confusing - if the problem happens inside of 
`restore` or `invoke` we cancel the task immediately in catch block but if it 
happens between these two calls(during the state transit) we will cancel the 
task a little later in the different catch block. I think it won't be wrong to 
unify this logic and cancel the task here whatever exception happened. WDYT?

##########
File path: 
flink-runtime/src/main/java/org/apache/flink/runtime/taskmanager/Task.java
##########
@@ -1378,11 +1389,11 @@ private void declineCheckpoint(
     }
 
     public void notifyCheckpointComplete(final long checkpointID) {
-        final AbstractInvokable invokable = this.invokable;
+        final TaskInvokable invokable = this.invokable;
 
-        if (executionState == ExecutionState.RUNNING && invokable != null) {
+        if (executionState == ExecutionState.RUNNING && invokable instanceof 
CheckpointableTask) {

Review comment:
       as I understand, it is impossible/unexpected that `invokable instanceof 
CheckpointableTask` would be false. So maybe is it better to replace the `if 
condition` with `checkArgument`?(The same question to others `instanceof`)




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

[GitHub] [flink] akalash commented on a change in pull request #16885: [FLINK-23862][runtime] Cleanup StreamTask if cancelled after restore …

Reply via email to