StephanEwen commented on pull request #14259: URL: https://github.com/apache/flink/pull/14259#issuecomment-735877076
@tillrohrmann That is a very good observation about the checkpoint-versus-savepoint issue. I think what we need to do is the following: whenever we go back to something other than the latest checkpoint (which would be the case if the latest snapshot is a savepoint and we restore from a checkpoint), we need to trigger a global failover. That way all tasks go back equally, rather than only some of them going back (regional failover) while others stay ahead. In that case we should also be consistent here, because all tasks and the coordinator then reset to the same state of the earlier checkpoint, where the split is still on the coordinator. The problem only arises if the tasks go back to an earlier state (split not yet assigned) but the coordinator does not go back (it retains the assumption that the split was assigned).
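To make the argument concrete, here is a minimal, hypothetical model of the scenario. It is not Flink's API. `Snapshot`, `pick_restore_target`, `needs_global_failover`, `recover` and `is_consistent` are illustrative names. The model assumes that failover only ever restores from checkpoints, never from savepoints. Split "A" sits on the coordinator in checkpoint 1 and is assigned to the task before savepoint 2. The model then compares a regional restore, where only the task goes back, with a global restore, where the task and the coordinator go back together.

```python
from dataclasses import dataclass

ALL_SPLITS = frozenset({"A"})

@dataclass(frozen=True)
class Snapshot:
    """Hypothetical completed snapshot: which side owns each split."""
    snapshot_id: int
    is_savepoint: bool
    coordinator_splits: frozenset  # splits still held (unassigned) by the coordinator
    task_splits: frozenset         # splits already assigned to the source task

def pick_restore_target(snapshots):
    # Model assumption: failover restores from the latest checkpoint, skipping savepoints.
    return max((s for s in snapshots if not s.is_savepoint), key=lambda s: s.snapshot_id)

def needs_global_failover(snapshots):
    # Going back to anything other than the latest snapshot requires
    # all tasks and the coordinator to reset together.
    latest = max(snapshots, key=lambda s: s.snapshot_id)
    return pick_restore_target(snapshots).snapshot_id != latest.snapshot_id

def recover(snapshots, live_coordinator_splits, global_failover):
    target = pick_restore_target(snapshots)
    task = target.task_splits  # the failed task always restores from the target
    # Regional failover leaves the live coordinator untouched; global resets it too.
    coordinator = target.coordinator_splits if global_failover else live_coordinator_splits
    return coordinator, task

def is_consistent(coordinator, task):
    # Every split must be owned by either the coordinator or the task.
    return (coordinator | task) == ALL_SPLITS

snapshots = [
    Snapshot(1, False, frozenset({"A"}), frozenset()),  # checkpoint: "A" not yet assigned
    Snapshot(2, True, frozenset(), frozenset({"A"})),   # savepoint: "A" already assigned
]
live_coordinator = frozenset()  # the running coordinator has already handed out "A"

for global_failover in (False, True):
    coord, task = recover(snapshots, live_coordinator, global_failover)
    label = "global" if global_failover else "regional"
    print(label, "consistent:", is_consistent(coord, task))
```

In the regional case split "A" is lost. The task rewound to a state without it, while the coordinator still believes it was handed out. In the global case both sides agree that "A" is still on the coordinator, so it can be assigned again.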
