[ https://issues.apache.org/jira/browse/FLINK-1953?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14527536#comment-14527536 ]
ASF GitHub Bot commented on FLINK-1953:
---------------------------------------
GitHub user StephanEwen opened a pull request:
https://github.com/apache/flink/pull/651
[FLINK-1953] [runtime] Integrate new snapshot checkpoint coordinator with
jobgraph and execution graph
The core commit is
https://github.com/apache/flink/commit/abd5ac7d78c5231e95bbbaaf15dad8f8c83221f9.
This builds on top of the reworked Task from #648.
It also adds a bunch of unit tests.
You can merge this pull request into a Git repository by running:
$ git pull https://github.com/StephanEwen/incubator-flink checkpointing
Alternatively you can review and apply these changes as the patch at:
https://github.com/apache/flink/pull/651.patch
To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:
This closes #651
----
commit 60d1e141d625f4e431e9cda7a2fc246a25d8816a
Author: Stephan Ewen
Date: 2015-04-30T20:05:27Z
[streaming] New Source and state checkpointing interfaces that allow
operations to interact with the state checkpointing in a more precise manner.
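The commit message does not describe the new interfaces, so the following is only a minimal Java sketch of the general idea: an operation that can snapshot and restore its own state, plus a lock that keeps record emission and state updates atomic with respect to snapshots. `CheckpointedOperation`, `CountingSource`, and `checkpointLock` are hypothetical names chosen for this example and are not Flink's API.

```java
import java.io.Serializable;
import java.util.List;

/** Hypothetical interface: an operation that can snapshot and restore its state. */
interface CheckpointedOperation<T extends Serializable> {
    /** Called when checkpoint `checkpointId` is triggered; returns a copy of the state. */
    T snapshotState(long checkpointId, long checkpointTimestamp) throws Exception;

    /** Called on recovery with the state of the last completed checkpoint. */
    void restoreState(T state);
}

/** Hypothetical source whose checkpointed state is the offset of the next record. */
class CountingSource implements CheckpointedOperation<Long> {
    // Shared with the checkpointing path so a snapshot never sees a half-done update.
    private final Object checkpointLock = new Object();
    private long offset;

    /** Emits `count` records; each emit plus offset update happens under the lock. */
    void emit(int count, List<Long> out) {
        for (int i = 0; i < count; i++) {
            synchronized (checkpointLock) {
                out.add(offset);
                offset++;
            }
        }
    }

    @Override
    public Long snapshotState(long checkpointId, long checkpointTimestamp) {
        synchronized (checkpointLock) {
            return offset;
        }
    }

    @Override
    public void restoreState(Long state) {
        synchronized (checkpointLock) {
            offset = state;
        }
    }
}
```

A restored source continues from the snapshotted offset, so records emitted after the snapshot are replayed rather than lost.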
commit 04969b380cdafaa1d63dbb2740b6092d237dbaab
Author: Stephan Ewen
Date: 2015-05-02T23:15:39Z
[FLINK-1968] [runtime] Clean up and improve the distributed cache.
- Gives a proper exception when a non-cached file is accessed
- Forwards I/O exceptions that happen during file transfer, rather than
only returning null when transfer failed
- Consistently keeps reference counts and copies only when needed
- Properly removes all files on shutdown
- Uses a shutdown hook to remove files when the process is killed
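The cache behavior listed above (copy on first use, reference counting, a clear error for uncached files, propagated I/O errors, and cleanup through a shutdown hook) can be sketched with plain JDK classes. `RefCountedFileCache` and its methods are hypothetical names. This is an illustration of the listed properties, not the code from the commit.

```java
import java.io.FileNotFoundException;
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.HashMap;
import java.util.Map;

/** Hypothetical reference-counted local file cache. */
class RefCountedFileCache {
    private static final class Entry {
        final Path localCopy;
        int refCount;
        Entry(Path localCopy) { this.localCopy = localCopy; }
    }

    private final Path cacheDir;
    private final Map<String, Entry> entries = new HashMap<>();
    private boolean shutDown;

    RefCountedFileCache(Path cacheDir) {
        this.cacheDir = cacheDir;
        // Remove cached files even if the JVM is terminated (for example by SIGTERM).
        Runtime.getRuntime().addShutdownHook(new Thread(this::shutdown));
    }

    /** Copies the file only on first use; later calls just increment the count. */
    synchronized Path acquire(String name, Path source) throws IOException {
        if (shutDown) throw new IllegalStateException("cache is shut down");
        Entry e = entries.get(name);
        if (e == null) {
            // I/O errors propagate to the caller instead of turning into null.
            Path copy = Files.copy(source, cacheDir.resolve(name));
            e = new Entry(copy);
            entries.put(name, e);
        }
        e.refCount++;
        return e.localCopy;
    }

    /** Fails with a proper exception when a file was never cached. */
    synchronized Path get(String name) throws FileNotFoundException {
        Entry e = entries.get(name);
        if (e == null) throw new FileNotFoundException("File '" + name + "' is not in the cache");
        return e.localCopy;
    }

    /** Deletes the local copy once the last user releases it. */
    synchronized void release(String name) throws IOException {
        Entry e = entries.get(name);
        if (e != null && --e.refCount == 0) {
            entries.remove(name);
            Files.deleteIfExists(e.localCopy);
        }
    }

    /** Removes all cached files; safe to call more than once. */
    synchronized void shutdown() {
        if (shutDown) return;
        shutDown = true;
        for (Entry e : entries.values()) {
            try { Files.deleteIfExists(e.localCopy); } catch (IOException ignored) { }
        }
        entries.clear();
    }
}
```

Because copying happens only inside `acquire` when no entry exists yet, a file is transferred once per cache entry no matter how many tasks use it.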
commit bba8504c125c1f81c448cb2d4a6fbad7e79f4e7e
Author: Stephan Ewen
Date: 2015-05-02T23:57:37Z
[runtime] Fix TaskExecutionState against non-serializable exceptions.
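One common way to make a state message robust against exceptions that cannot be serialized is to probe serializability and fall back to a stringified copy. The sketch assumes that technique; the commit may do something different. `TaskStateMessage` and `StringifiedException` are hypothetical names.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectOutputStream;
import java.io.PrintWriter;
import java.io.Serializable;
import java.io.StringWriter;

/** Hypothetical serializable stand-in carrying the class name, message and stack trace. */
class StringifiedException extends Exception {
    private static final long serialVersionUID = 1L;

    StringifiedException(Throwable original) {
        super(original.getClass().getName() + ": " + original.getMessage()
                + "\n" + stackTraceOf(original));
    }

    private static String stackTraceOf(Throwable t) {
        StringWriter sw = new StringWriter();
        t.printStackTrace(new PrintWriter(sw));
        return sw.toString();
    }
}

/** Hypothetical task state message; only the exception handling is sketched. */
class TaskStateMessage implements Serializable {
    private static final long serialVersionUID = 1L;
    final String state;
    final Throwable error;

    TaskStateMessage(String state, Throwable error) {
        this.state = state;
        // Keep the original if it serializes; otherwise replace it so the message can always be sent.
        this.error = (error == null || isSerializable(error)) ? error : new StringifiedException(error);
    }

    /** Tries to serialize the throwable; any I/O failure means it cannot be sent as-is. */
    static boolean isSerializable(Throwable t) {
        try (ObjectOutputStream out = new ObjectOutputStream(new ByteArrayOutputStream())) {
            out.writeObject(t);
            return true;
        } catch (IOException e) {
            return false;
        }
    }
}
```

The fallback deliberately does not keep the original exception as its cause. Keeping it would reintroduce the non-serializable object.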
commit 9b9594a7569ed01dcfc97a82880938c033171bec
Author: Stephan Ewen
Date: 2015-05-03T02:41:03Z
[FLINK-1672] [runtime] Unify Task and RuntimeEnvironment into one class.
- This simplifies and hardens the failure handling during task startup
- Guarantees that no actor system threads are blocked by task bootstrap
or task canceling
- Corrects some previously erroneous corner case state transitions
- Adds simple and robust tests
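A minimal sketch of the two ideas in the bullets. First, bootstrap and user code run on the task's own thread, so the calling (actor) thread only spawns it, and `cancel()` only flips state and interrupts, without waiting. Second, every state transition is a compare-and-set, so a concurrent cancel cannot be lost. `SketchTask` and its states are hypothetical and are loosely modelled on the commit description, not on the actual `Task` class.

```java
import java.util.concurrent.atomic.AtomicReference;

/** Hypothetical task with CAS-based state transitions. */
class SketchTask implements Runnable {
    enum State { CREATED, DEPLOYING, RUNNING, FINISHED, CANCELING, CANCELED, FAILED }

    private final AtomicReference<State> state = new AtomicReference<>(State.CREATED);
    private final Runnable invokable;
    private volatile Thread executingThread;

    SketchTask(Runnable invokable) { this.invokable = invokable; }

    State getState() { return state.get(); }

    /** Called from the actor thread: only starts a thread, never bootstraps inline. */
    Thread startTaskThread() {
        Thread t = new Thread(this, "task-thread");
        executingThread = t;
        t.start();
        return t;
    }

    @Override
    public void run() {
        if (!state.compareAndSet(State.CREATED, State.DEPLOYING)) {
            return; // already canceled before the thread started
        }
        // ... bootstrap (class loading, network setup) would happen here, on this thread ...
        if (!state.compareAndSet(State.DEPLOYING, State.RUNNING)) {
            state.compareAndSet(State.CANCELING, State.CANCELED);
            return;
        }
        try {
            invokable.run();
            if (!state.compareAndSet(State.RUNNING, State.FINISHED)) {
                state.compareAndSet(State.CANCELING, State.CANCELED);
            }
        } catch (Throwable t) {
            // A failure while canceling counts as canceled; otherwise the task failed.
            if (!state.compareAndSet(State.RUNNING, State.FAILED)) {
                state.compareAndSet(State.CANCELING, State.CANCELED);
            }
        }
    }

    /** Non-blocking cancel: changes the state and interrupts, but never waits. */
    void cancel() {
        while (true) {
            State current = state.get();
            if (current == State.FINISHED || current == State.CANCELED
                    || current == State.FAILED || current == State.CANCELING) {
                return; // terminal or already canceling: nothing to do
            }
            if (current == State.CREATED) {
                if (state.compareAndSet(current, State.CANCELED)) return;
            } else if (state.compareAndSet(current, State.CANCELING)) {
                Thread t = executingThread;
                if (t != null) t.interrupt();
                return;
            }
        }
    }
}
```

The retry loop in `cancel()` re-reads the state whenever a compare-and-set loses a race, which is how corner cases such as canceling during deployment end in a defined state.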
commit 3e4ed4e9e6492fa2d06892dc42c491125f32ad98
Author: Stephan Ewen
Date: 2015-05-03T12:10:35Z
[FLINK-1969] [runtime] Remove deprecated profiler code
commit 5da3a5d5b19414ef794e8e6f8e6a3c77c613ffce
Author: Stephan Ewen
Date: 2015-05-03T12:10:58Z
Update build target path in README.md
commit abd5ac7d78c5231e95bbbaaf15dad8f8c83221f9
Author: Stephan Ewen
Date: 2015-04-30T17:59:36Z
[FLINK-1953] [runtime] Integrate new snapshot checkpoint coordinator with
jobgraph and execution graph
commit ef3fd5de4fa414d41e451892219a6716ada3c036
Author: Stephan Ewen
Date: 2015-05-04T22:26:05Z
[FLINK-1973] [jobmanager] Task execution state messages are logged on INFO
level, rather than on DEBUG level
----
> Rework Checkpoint Coordinator
> -----------------------------
>
> Key: FLINK-1953
> URL: https://issues.apache.org/jira/browse/FLINK-1953
> Project: Flink
> Issue Type: Bug
> Components: Streaming
> Affects Versions: 0.9
> Reporter: Stephan Ewen
> Assignee: Stephan Ewen
> Fix For: 0.9
>
>
> The checkpoint coordinator currently has no tests and is not robust against a
> variety of corner cases. In particular, I propose to add:
> - Better configurability of which tasks receive the trigger-checkpoint
> messages, which tasks need to acknowledge the checkpoint, and which tasks
> need to receive confirmation messages.
> - Checkpoint timeouts, so that incomplete checkpoints are guaranteed to be
> cleaned up after a while, regardless of whether any later checkpoint succeeds.
> - Better sanity checking of messages and fields, to properly handle or ignore
> messages for old/expired checkpoints and misrouted messages.
> - Better handling of checkpoint attempts at points where the execution has
> just failed or is currently being canceled.
> - A good set of tests.
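The proposed features can be sketched together in a small single-threaded Java class. It has separate sets of tasks to trigger, to await acknowledgements from, and to confirm. It also has timeouts that discard pending checkpoints independently of later successes, silently ignores stale, duplicate, or misrouted acknowledgements, and refuses to trigger while the trigger tasks are not all running. `CheckpointCoordinatorSketch` and its methods are hypothetical. Time is passed in explicitly, and "sent" messages are recorded in a list instead of going over RPC, so the example stays deterministic. It is not the coordinator implemented in the pull request.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.Iterator;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;

/** Hypothetical coordinator sketch. */
class CheckpointCoordinatorSketch {
    private final Set<String> tasksToTrigger;   // receive "trigger checkpoint"
    private final Set<String> tasksToAck;       // must acknowledge before completion
    private final Set<String> tasksToConfirm;   // receive "checkpoint complete"
    private final long timeoutMillis;

    /** Checkpoint id -> tasks that still have to acknowledge. */
    private final Map<Long, Set<String>> pending = new LinkedHashMap<>();
    private final Map<Long, Long> startTimes = new LinkedHashMap<>();
    private long nextId = 1;
    /** Messages that would be sent; stands in for the RPC layer. */
    final List<String> sent = new ArrayList<>();

    CheckpointCoordinatorSketch(Set<String> trigger, Set<String> ack,
                                Set<String> confirm, long timeoutMillis) {
        this.tasksToTrigger = trigger;
        this.tasksToAck = ack;
        this.tasksToConfirm = confirm;
        this.timeoutMillis = timeoutMillis;
    }

    /** Returns the new checkpoint id, or -1 if some trigger task is not running. */
    long triggerCheckpoint(long now, Set<String> runningTasks) {
        if (!runningTasks.containsAll(tasksToTrigger)) {
            return -1; // e.g. the job has just failed or is being canceled
        }
        long id = nextId++;
        pending.put(id, new HashSet<>(tasksToAck));
        startTimes.put(id, now);
        for (String task : tasksToTrigger) sent.add("trigger " + id + " -> " + task);
        return id;
    }

    /** Returns true if this ack completed the checkpoint; invalid acks are ignored. */
    boolean receiveAck(long checkpointId, String task) {
        Set<String> waiting = pending.get(checkpointId);
        if (waiting == null || !waiting.remove(task)) {
            return false; // unknown/expired checkpoint, duplicate ack, or unexpected task
        }
        if (!waiting.isEmpty()) return false;
        pending.remove(checkpointId);
        startTimes.remove(checkpointId);
        for (String t : tasksToConfirm) sent.add("confirm " + checkpointId + " -> " + t);
        return true;
    }

    /** Discards pending checkpoints older than the timeout; returns how many. */
    int expireTimedOut(long now) {
        int expired = 0;
        Iterator<Map.Entry<Long, Long>> it = startTimes.entrySet().iterator();
        while (it.hasNext()) {
            Map.Entry<Long, Long> e = it.next();
            if (now - e.getValue() >= timeoutMillis) {
                pending.remove(e.getKey());
                it.remove();
                expired++;
            }
        }
        return expired;
    }
}
```

In a real coordinator, `expireTimedOut` would run from a timer and the methods would need synchronization. Both are left out here to keep the sketch small.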
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)