GitHub user senorcarbone commented on the pull request: https://github.com/apache/flink/pull/1537#issuecomment-174500162

Looks cool. Just so I understand exactly: what would be wrong with the Coordinator simply aborting expired checkpoint attempts? Wouldn't the protocol be the same, with fewer messages? If a task is not ready, it can simply discard the checkpoint request, which will eventually time out at the Coordinator. The Coordinator's attempts might keep timing out for a while, but eventually there will be a complete snapshot once all tasks are ready.
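To make the proposed alternative concrete, here is a minimal Python sketch of abort-on-timeout under a simplified round-based model. It assumes each round is one checkpoint attempt, a task that is not ready drops the request without replying, and the attempt expires at the end of its round. `Task`, `Coordinator` and `ready_at` are hypothetical names for illustration only; this is not Flink's `CheckpointCoordinator` API.

```python
from dataclasses import dataclass, field


@dataclass
class Task:
    """Hypothetical task that can only take a snapshot once it is 'ready'."""
    name: str
    ready_at: int  # hypothetical: first attempt id at which the task can snapshot

    def on_checkpoint_request(self, checkpoint_id: int) -> bool:
        # A task that is not ready silently discards the request:
        # no decline message is sent, it simply never acknowledges.
        return checkpoint_id >= self.ready_at


@dataclass
class Coordinator:
    tasks: list
    max_attempts: int = 100
    aborted: list = field(default_factory=list)

    def run(self):
        """Trigger attempts one after another. An attempt that has not
        collected an ack from every task by its (simulated) timeout is
        aborted. Returns the id of the first complete checkpoint, or None."""
        for checkpoint_id in range(self.max_attempts):
            acks = {t.name for t in self.tasks if t.on_checkpoint_request(checkpoint_id)}
            if len(acks) == len(self.tasks):
                return checkpoint_id  # complete snapshot: every task acknowledged
            # Missing acks: the attempt expires and the coordinator aborts it.
            self.aborted.append(checkpoint_id)
        return None


coord = Coordinator([Task("source", 0), Task("map", 2), Task("sink", 3)])
print(coord.run(), coord.aborted)
```

With tasks that become ready at attempts 0, 2 and 3, attempts 0, 1 and 2 are aborted on expiry and attempt 3 completes, without any task ever sending an explicit "not ready" message.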