XComp commented on pull request #14798:
URL: https://github.com/apache/flink/pull/14798#issuecomment-779102405


   I looked through the code (supported by @AHeise): The race condition really 
only kicks in when cancelling/failing the task because that's when the 
`failureCause` becomes relevant. So, instead of synchronizing the [state 
transition](https://github.com/XComp/flink/blob/5781449f38c1e36c1a2952518f9e30761d915f04/flink-runtime/src/main/java/org/apache/flink/runtime/taskmanager/Task.java#L1052)
 we could add `synchronized` blocks for the [cancellation of a 
task](https://github.com/XComp/flink/blob/5781449f38c1e36c1a2952518f9e30761d915f04/flink-runtime/src/main/java/org/apache/flink/runtime/taskmanager/Task.java#L1130)
 and for the [failure handling during the normal Invokable 
execution](https://github.com/XComp/flink/blob/5781449f38c1e36c1a2952518f9e30761d915f04/flink-runtime/src/main/java/org/apache/flink/runtime/taskmanager/Task.java#L828).
 The synchronization would only cover the state transition and the setting of 
`failureCause`. The actual cancellation of the corresponding task would be 
 moved out of the synchronized block.
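
To make the proposal concrete, here is a minimal sketch of the locking scope I have in mind. It is not the actual `Task.java` code: the class `TaskStateSketch`, the reduced `State` enum, the `lock` field and the `cancelInvokable` callback are all hypothetical names used for illustration only. The point is that the state transition and the `failureCause` assignment happen atomically under one lock, while the potentially slow cancellation runs after the lock has been released.

```java
// Hypothetical, simplified stand-in for the relevant parts of a task.
// None of these names are taken from Flink's Task.java.
class TaskStateSketch {

    enum State { RUNNING, CANCELING, FAILED }

    private final Object lock = new Object();
    private final Runnable cancelInvokable; // assumed callback that cancels the user code

    private State state = State.RUNNING;
    private Throwable failureCause;

    TaskStateSketch(Runnable cancelInvokable) {
        this.cancelInvokable = cancelInvokable;
    }

    /**
     * Moves the task out of RUNNING and records the cause.
     * Returns true only for the caller that actually performed the transition.
     */
    boolean cancelOrFail(State targetState, Throwable cause) {
        synchronized (lock) {
            if (state != State.RUNNING) {
                // Another thread already cancelled or failed the task;
                // keep its state and failureCause untouched.
                return false;
            }
            // State and failureCause change together, so no reader can
            // observe the new state paired with a stale cause.
            state = targetState;
            failureCause = cause;
        }

        // Deliberately outside the synchronized block: cancelling may take
        // a while and must not hold up other threads waiting for the lock.
        cancelInvokable.run();
        return true;
    }

    State getState() {
        synchronized (lock) {
            return state;
        }
    }

    Throwable getFailureCause() {
        synchronized (lock) {
            return failureCause;
        }
    }
}
```

With this shape, a concurrent cancel and failure can no longer interleave between the state change and the `failureCause` write: whichever thread enters the block first wins, and the loser returns without touching either field.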


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
[email protected]