StephanEwen commented on pull request #14259:
URL: https://github.com/apache/flink/pull/14259#issuecomment-735877076


   @tillrohrmann That is a very good observation about the checkpoint versus 
savepoint issue.
   
   I think what we need to do is the following: Whenever we go back to 
something other than the latest snapshot (which would be the case if the latest 
one is a savepoint and we go back to an earlier checkpoint), we need to trigger 
a global failover, so that all tasks go back to the same state, rather than some 
of them going back (regional failover) while others stay ahead.
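
   As a rough sketch of that rule (all names here are hypothetical and not 
Flink APIs; the only assumption is that we can compare the ID of the snapshot 
we restore from with the ID of the latest completed one):

```java
// Hypothetical sketch of the proposed rule; none of these names exist in Flink.
final class RestoreScopeSketch {

    enum FailoverScope { REGIONAL, GLOBAL }

    // If the restore point is not the latest completed snapshot, every task
    // (and the coordinator) has to go back together, so escalate to GLOBAL.
    static FailoverScope chooseScope(long restoredSnapshotId, long latestCompletedSnapshotId) {
        return restoredSnapshotId == latestCompletedSnapshotId
                ? FailoverScope.REGIONAL
                : FailoverScope.GLOBAL;
    }
}
```

   For example, if the latest completed snapshot is savepoint 7 and we restore 
from checkpoint 6, `chooseScope(6, 7)` would ask for a global failover.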
   
   In that case we should be consistent here as well, because all tasks and 
the coordinator then reset to the same state of the earlier checkpoint, in 
which the split is still on the coordinator. The problem arises only if the 
tasks go back to an earlier state (split not yet assigned) but the coordinator 
does not go back (and still assumes the split was assigned).
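
   To make the failure mode concrete, here is a toy model (hypothetical, 
heavily simplified, and not the actual coordinator or source reader code) in 
which a split is "lost" once neither the coordinator nor the task holds it:

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical toy model of one coordinator and one reader task; not Flink code.
final class SplitResetSketch {
    // Splits the coordinator still holds (not yet handed out).
    final List<String> coordinatorPending = new ArrayList<>();
    // Splits the reader task believes it owns.
    final List<String> readerAssigned = new ArrayList<>();

    // State captured at the earlier checkpoint.
    private List<String> snapCoordinator = new ArrayList<>();
    private List<String> snapReader = new ArrayList<>();

    void checkpoint() {
        snapCoordinator = new ArrayList<>(coordinatorPending);
        snapReader = new ArrayList<>(readerAssigned);
    }

    // Hand the next pending split from the coordinator to the reader.
    void assignNext() {
        if (!coordinatorPending.isEmpty()) {
            readerAssigned.add(coordinatorPending.remove(0));
        }
    }

    // Only the task goes back; the coordinator keeps its newer view.
    void resetTaskOnly() {
        readerAssigned.clear();
        readerAssigned.addAll(snapReader);
    }

    // Task and coordinator go back to the same checkpoint together.
    void resetGlobally() {
        resetTaskOnly();
        coordinatorPending.clear();
        coordinatorPending.addAll(snapCoordinator);
    }

    // A split is lost if nobody holds it anymore.
    boolean isLost(String split) {
        return !coordinatorPending.contains(split) && !readerAssigned.contains(split);
    }
}
```

   If a split is assigned after `checkpoint()` and only the task is reset, 
`isLost` returns true for it; after `resetGlobally()` the split is back on the 
coordinator and can be assigned again.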
   


