rkhachatryan commented on code in PR #21923:
URL: https://github.com/apache/flink/pull/21923#discussion_r1124206563
##########
flink-streaming-java/src/main/java/org/apache/flink/streaming/runtime/tasks/StreamTask.java:
##########
@@ -733,8 +764,7 @@ void restoreInternal() throws Exception {
// needed
channelIOExecutor.shutdown();
- isRunning = true;
- isRestoring = false;
+ taskState.status = TaskState.Status.RUNNING;
Review Comment:
Hey Panos,
> the [invokable
method](https://github.com/apache/flink/pull/21923/files#diff-0c5fe245445b932fa83fdaf5c4802dbe671b73e8d76f39058c6eaaaffd9639faL982-L983)
that simultaneously marks the task as not Running and Canceled
Yes, but after that, the Task Thread might update `isRunning`, but not
`cancelled`.
1. Task Thread enters `restoreInternal` and passes `ensureNotCanceled`
2. Canceller Thread calls `invokable.cancel` and sets `isRunning = false`,
`canceled = true`
3. Task Thread continues and sets `isRunning = true`, `isRestoring = false`
4. Legacy Source Thread gets interrupted and checks `isCancelled` - it's
`true`, so `CancelTaskException` is thrown, which won't fail the job
With this PR, all the flags are combined and updated unconditionally:
1. Task Thread enters `restoreInternal` and passes `ensureNotCanceled`
2. Canceller Thread calls `invokable.cancel` and sets `taskState.status =
TaskState.Status.CANCELED`
3. Task Thread continues and sets `taskState.status =
TaskState.Status.RUNNING`
4. Legacy Source Thread gets interrupted and checks `isCancelled`, i.e.
`taskState.status == TaskState.Status.CANCELED`; which is `false`, so some
other `Throwable` is thrown, which **does** fail the job
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]