Stephan Ewen created FLINK-1953:
-----------------------------------
Summary: Rework Checkpoint Coordinator
Key: FLINK-1953
URL: https://issues.apache.org/jira/browse/FLINK-1953
Project: Flink
Issue Type: Bug
Components: Streaming
Affects Versions: 0.9
Reporter: Stephan Ewen
Assignee: Stephan Ewen
Fix For: 0.9
The checkpoint coordinator currently has no tests and is vulnerable in a
variety of situations. In particular, I propose to add:
- Better configurability of which tasks receive the trigger checkpoint messages,
which tasks need to acknowledge the checkpoint, and which tasks need to receive
confirmation messages
- Checkpoint timeouts, such that incomplete checkpoints are guaranteed to be
cleaned up after a while, regardless of whether other checkpoints succeed
- Better sanity checking of messages and fields, to properly handle/ignore
messages for old/expired checkpoints, or invalidly routed messages
- Better handling of checkpoint attempts at points where the execution has
just failed or is currently being canceled
- A good set of tests
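
To make the points above a bit more concrete, here is a minimal sketch of how
separate task sets, timeouts, and message sanity checks could fit together.
This is only an illustration, not the actual or planned implementation: the
class name CheckpointCoordinatorSketch and the methods trigger, acknowledge,
and expire are hypothetical, tasks are identified by plain strings, and the
current time is passed in explicitly so the logic is easy to test.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.Iterator;
import java.util.Map;
import java.util.Set;

/** Hypothetical sketch only; names and structure are assumptions, not Flink's API. */
final class CheckpointCoordinatorSketch {

    /** A checkpoint that was triggered but not yet acknowledged by all required tasks. */
    private static final class Pending {
        final long triggerTimeMillis;
        final Set<String> notYetAcknowledged;

        Pending(long triggerTimeMillis, Set<String> tasksToAck) {
            this.triggerTimeMillis = triggerTimeMillis;
            this.notYetAcknowledged = new HashSet<>(tasksToAck);
        }
    }

    // The three roles are configured independently of each other.
    private final Set<String> tasksToTrigger;
    private final Set<String> tasksToAck;
    private final Set<String> tasksToConfirm;
    private final long timeoutMillis;

    private final Map<Long, Pending> pending = new HashMap<>();
    private long nextCheckpointId = 1;

    CheckpointCoordinatorSketch(Set<String> tasksToTrigger, Set<String> tasksToAck,
                                Set<String> tasksToConfirm, long timeoutMillis) {
        this.tasksToTrigger = new HashSet<>(tasksToTrigger);
        this.tasksToAck = new HashSet<>(tasksToAck);
        this.tasksToConfirm = new HashSet<>(tasksToConfirm);
        this.timeoutMillis = timeoutMillis;
    }

    /** Tasks that would receive the trigger message for each checkpoint. */
    Set<String> tasksToTrigger() { return tasksToTrigger; }

    /** Tasks that would receive the confirmation once a checkpoint completes. */
    Set<String> tasksToConfirm() { return tasksToConfirm; }

    /** Registers a new pending checkpoint and returns its id. */
    synchronized long trigger(long nowMillis) {
        long id = nextCheckpointId++;
        pending.put(id, new Pending(nowMillis, tasksToAck));
        return id;
    }

    /**
     * Handles an acknowledge message. Returns true only if this message
     * completed the checkpoint; all invalid messages are ignored.
     */
    synchronized boolean acknowledge(long checkpointId, String taskId) {
        Pending p = pending.get(checkpointId);
        if (p == null) {
            return false; // old, expired, or never-triggered checkpoint
        }
        if (!p.notYetAcknowledged.remove(taskId)) {
            return false; // invalidly routed or duplicate message
        }
        if (p.notYetAcknowledged.isEmpty()) {
            pending.remove(checkpointId);
            return true; // confirmations to tasksToConfirm would be sent here
        }
        return false;
    }

    /** Drops every pending checkpoint older than the timeout; returns how many were dropped. */
    synchronized int expire(long nowMillis) {
        int removed = 0;
        Iterator<Pending> it = pending.values().iterator();
        while (it.hasNext()) {
            if (nowMillis - it.next().triggerTimeMillis >= timeoutMillis) {
                it.remove();
                removed++;
            }
        }
        return removed;
    }

    synchronized int numPending() { return pending.size(); }
}
```

In this sketch, expire would be called periodically (for example from a
timer), so an incomplete checkpoint is cleaned up even if no other checkpoint
ever completes. An acknowledge that arrives after its checkpoint has expired
simply returns false and changes no state.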
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)